arXiv cs.CL

How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence

Signal: 72 · Hype: 18
In three lines:
A robustness study of Document Layout Analysis (DLA) pipelines used in retrieval-augmented generation (RAG) and long-document QA.
The authors identify footprint bias and propose a lightweight auditing framework that measures block-level structural loss (B-SLR).
On 1,000 pages processed with MinerU and PP-StructureV3, B-SLR correlates more strongly with OCR instability (R² = 0.727 and 0.916, respectively) than area-based metrics do (R² = 0.384 and 0.110).
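The R² values quoted in the summary describe how much of the variance in OCR instability a linear fit on each structural metric explains. The sketch below shows that computation, assuming an ordinary least-squares fit of instability on a single metric; the summary does not say which regression the authors used. The function name `r_squared` and all numbers are hypothetical toy values, not data from the paper.

```python
from statistics import mean


def r_squared(x: list[float], y: list[float]) -> float:
    """R² of an ordinary least-squares fit y ≈ a + b*x (single predictor)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    b = sxy / sxx          # fitted slope
    a = my - b * mx        # fitted intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / syy  # share of variance in y explained by the fit


# Hypothetical per-page scores (invented for illustration only).
ocr_instability = [0.1, 0.2, 0.3, 0.4, 0.5]
block_metric = [0.12, 0.19, 0.33, 0.38, 0.52]  # stand-in for a block-level loss score
area_metric = [0.3, 0.1, 0.4, 0.2, 0.5]        # stand-in for an area-based score

if __name__ == "__main__":
    print("block-level R²:", round(r_squared(block_metric, ocr_instability), 3))
    print("area-based R²:", round(r_squared(area_metric, ocr_instability), 3))
```

With a single predictor this equals the squared Pearson correlation, so on Python 3.10+ `statistics.correlation(x, y) ** 2` gives the same value.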
Tags: Papers, Evals, RAG, Benchmarks

Summary generated by Claude (human-verified)