
alignmentforum.org

AI Alignment Forum


Latest posts


Defining Monitorable and Useful Goals

Published on July 15, 2025 11:06 PM GMT. In my most recent post...

Principles for Picking Practical Interpretability Projects

Published on July 15, 2025 5:38 PM GMT. Thanks to Neel Nanda and...

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Published on July 15, 2025 4:23 PM GMT. Twitter | Paper PDF. Seven years...

Recent Redwood Research project proposals

Published on July 14, 2025 10:27 PM GMT. Previously, we've shared a few...

Narrow Misalignment is Hard, Emergent Misalignment is Easy

Published on July 14, 2025 9:05 PM GMT. Anna and Ed are co-first...

Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance

Published on July 14, 2025 2:52 PM GMT. This is a write-up of...

Do LLMs know what they're capable of? Why this matters for AI safety, and initial findings

Published on July 13, 2025 7:54 PM GMT. This post is a companion...

Linkpost: Guide to Redwood's writing

Published on July 10, 2025 6:39 PM GMT. I wrote a guide to...

The bitter lesson of misuse detection

Published on July 10, 2025 2:50 PM GMT. TL;DR: We wanted to benchmark...

Evaluating and monitoring for AI scheming

Published on July 10, 2025 2:24 PM GMT. As AI models become more...

White Box Control at UK AISI - Update on Sandbagging Investigations

Published on July 10, 2025 1:37 PM GMT. Jordan Taylor*, Connor Kissane*, Sid...

What's worse, spies or schemers?

Published on July 9, 2025 2:37 PM GMT. Here are two problems you’ll...