FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
Signal: 82
Hype: 18
In three lines: FinAuditing is a financial auditing benchmark built from 1,102 real XBRL instances (33k tokens on average). It evaluates 13 LLMs on three tasks: Financial Semantic Matching, Financial Relationship Extraction, and Financial Mathematical Reasoning. Results reveal substantial gaps in concept retrieval and cross-document reasoning.
Summary generated by Claude — human-verified