Infini-News: Efficiently Queryable Access to 1.3 Billion Processed Common Crawl News Articles
Signal: 75
Hype: 15
In three lines
Infini-News indexes 1.35B CC-News articles (August 2016–present) with metadata extraction, language detection (GlotLID, lingua, CommonLingua), and geographic attribution (83.4% coverage). Infini-gram suffix-array indexes enable sub-second full-text pattern search across the entire archive.
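To illustrate why a suffix-array index makes pattern search fast, here is a minimal toy sketch in Python. It is not the Infini-News or infini-gram API: the function names (`build_suffix_array`, `count_occurrences`) and the sample corpus are hypothetical, it works on characters in memory, and the construction is deliberately naive. The point it shows is that once all suffixes are sorted, every occurrence of a pattern sits in one contiguous range that two binary searches can locate.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start offsets of all suffixes, sorted lexicographically.

    Naive construction (slow for large inputs); fine for a toy corpus only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text: str, sa: list[int], pattern: str, upper: bool) -> int:
    """Binary search for the first suffix whose prefix is >= pattern
    (upper=False) or > pattern (upper=True)."""
    lo, hi = 0, len(sa)
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count_occurrences(text: str, sa: list[int], pattern: str) -> int:
    """Count matches of pattern using two binary searches over the suffix array."""
    if not pattern:
        return 0
    return _bound(text, sa, pattern, upper=True) - _bound(text, sa, pattern, upper=False)


# Hypothetical sample corpus, for illustration only.
corpus = "the news is news to the newsroom"
sa = build_suffix_array(corpus)
print(count_occurrences(corpus, sa, "news"))
```

Each query costs a number of comparisons logarithmic in the corpus length, which is the property that lets this style of index answer pattern queries quickly even when the indexed text is very large.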
Summary generated by Claude — human-verified