Go back to feed

Chain of Thought Monitorability

Good read on model safety, but doesn't feel easy when you put it side by side with Anthropic’s report on CoT faithfulness —not just because CoT monitorability is fragile, but also because efforts to make CoT more faithful didn’t really move the needle. And then there’s Coconut (continuous latent space reasoning), which doesn’t give human-readable CoT at all. Seems like some reductionist approaches—like the deeper behavioral analysis Goodfire does—are still essential

Featured Mini-post

[
]

The Future of Energy Production

Our research

Cloudflare: A New Social Contract for the Web?

Our research

What OpenAI Wants and Where Things Went Wrong

Our research

How AI Conquered the US Economy

Our research

Simulate the world - Deepmind Researchers unveil Genie3

Our research

Foundation models can't really learn continuously - or can they?

Our research
See all mini-posts