CVE-Bench: testing LLM agents on real-world vulnerability patches
Signal: 65
Hype: 15
In three lines
CVE-Bench is a benchmark for evaluating LLM agents on real-world vulnerability patches. The study tests models' ability to identify and fix security flaws in existing code.
Summary generated by Claude — human-verified