arXiv cs.CL

HalluScore: Large Language Model Hallucination Question Answering Benchmark

Signal: 72
Hype: 18
In three lines: HalluScore is an Arabic question-answering (QA) benchmark of 827 curated questions for evaluating LLM hallucinations. An empirical analysis of 17 Arabic and multilingual models shows that hallucinations extend beyond factual errors to failures of cultural understanding, linguistic reasoning, and logical consistency.
Benchmarks · Evals · AI safety · Alignment

Summary generated by Claude — human-verified