arXiv cs.LG·26 May 2026

PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection

In three lines

PromptAudit evaluates how prompting strategies affect LLM-based vulnerability detection. Across 5 open-weight models and 1,000 CVEs (6,074 samples), standard chain-of-thought achieves the strongest performance, while few-shot prompting provides model-dependent gains. Adaptive chain-of-thought suppresses recall, and self-consistency induces excessive abstention.

Prompt engineering, Evals, AI safety, Code generation

Summary generated by Claude — human-verified
