alignmentforum.org
Get the latest updates from AI Alignment Forum directly as they happen.
Last updated 1 day ago
4 days ago
ARC has teamed up with AIcrowd to launch the ARC White-Box Estimation...
8 days ago
As AI models become increasingly capable and autonomous, keeping them safely aligned...
9 days ago
We’d like to develop training techniques that work when applied to future...
10 days ago
Behavioral evaluations may become worthless, which we think would be a disaster...
10 days ago
This is a somewhat technical note. By "software-only singularity", I mean that,...
15 days ago
I am going to talk about my experience in the Jane Street...
17 days ago
Most evaluations of AI systems focus on their capabilities: how good they...
22 days ago
Risk reports commonly use pre-deployment alignment assessments to measure misalignment risk from...
22 days ago
We have developed some relatively general methods for mechanistic estimation competitive with...
23 days ago
1) The safe-to-dangerous shift is a fundamental problem for eval realism. Suppose we...
26 days ago
1.1 Tl;dr: Alignment is often conceptualized as AIs helping humans achieve their goals...