Open-world evaluations for measuring frontier AI capabilities
Signal: 65
Hype: 35
In three lines: CRUX is a new evaluation project for measuring frontier AI capabilities on long, messy open-world tasks, moving beyond traditional benchmarks.
Summary generated by Claude — human-verified