arXiv cs.CL

FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs

Signal 82 · Hype 18
In three lines: FinAuditing is a financial auditing benchmark built from 1,102 real XBRL instances, averaging 33k tokens each. It evaluates 13 LLMs on three tasks: Financial Semantic Matching, Financial Relationship Extraction, and Financial Mathematical Reasoning. The results reveal substantial gaps in the models' concept retrieval and cross-document reasoning.
Benchmarks · Reasoning · Evals

Summary generated by Claude — human-verified