AI Snake Oil

AI coding agents do not replace software engineers. Despite expectations, these tools remain auxiliary technologies, limited by integration challenges, reliability issues, and the difficulty of handling complex context.

Code generation AI Agents

AI Snake Oil·Apr 16

Open-world evaluations for measuring frontier AI capabilities

CRUX is a new evaluation project for measuring frontier AI capabilities on long, messy open-world tasks, moving beyond traditional benchmarks.

Evals Benchmarks

AI Snake Oil·Feb 24

New Paper: Towards a science of AI agent reliability

A new paper investigates AI agent reliability by quantifying the gap between claimed capabilities and actual performance. The study proposes methods to measure this divergence and improve the robustness of agent systems.

AI Agents Evals AI safety

AI Snake Oil·Feb 12

AI Won’t Automatically Make Legal Services Cheaper

AI will not automatically reduce the cost of legal services. The article applies the 'AI as Normal Technology' framework to the legal sector, questioning the assumption that AI automation will systematically drive down prices.

Business Regulation

AI Snake Oil·Jan 29

Fact checking Moravec's paradox

A critique of Moravec's paradox, the famous claim that tasks easy for humans are hard for AI and vice versa. The article questions the validity and usefulness of this principle in the current context.

Reasoning Evals

AI Snake Oil·Sep 9

A guide to understanding AI as normal technology

An article positioning AI as a normal technology rather than a revolutionary one. It challenges the dominant hype discourse and proposes a more nuanced perspective on the actual capabilities and limitations of current systems.

AI safety Alignment

AI Snake Oil·Jul 16

Could AI slow science?

An article asking whether AI could slow science by creating a production-progress paradox: publication volume increases without a proportional improvement in quality or genuine scientific understanding.

Papers Evals AI safety

AI Snake Oil·May 1

AGI is not a milestone

The article challenges the notion that AGI represents a capability threshold triggering sudden impacts. It questions the staged progression model toward general intelligence.

Reasoning Alignment AI safety

AI Snake Oil·Dec 18

Is AI progress slowing down?

An analysis of recent AI trends to assess whether progress is slowing, examining current claims and their empirical basis.

Benchmarks Evals

AI Snake Oil·Dec 13

We Looked at 78 Election Deepfakes. Political Misinformation is not an AI Problem.

An analysis of 78 election deepfakes concludes that political misinformation is not primarily an AI problem. Electoral manipulation predates the technology and cannot be solved by technical means alone.

AI safety Regulation Video generation

AI Snake Oil·Nov 11

Does the UK’s liver transplant matching algorithm systematically exclude younger patients?

The UK's liver transplant matching algorithm may systematically exclude younger patients. Seemingly minor technical decisions can have life-or-death effects.

Alignment AI safety Regulation

AI Snake Oil·Sep 18

Can AI automate computational reproducibility?

A new benchmark measures AI's ability to automate computational reproducibility in science. The study assesses whether AI models can improve the practice of reproducing published scientific results.

Benchmarks Papers Evals

AI Snake Oil·Aug 19

AI companies are pivoting from creating gods to building products. Good.

AI companies are shifting from AGI rhetoric to building concrete products. The article identifies five major challenges in this transition: monetization, user integration, inference costs, technical differentiation, and regulatory compliance.

Business Regulation

AI Snake Oil·Jul 26

AI existential risk probabilities are too unreliable to inform policy

A critique of AI existential risk probability estimates presented as if they were quantified. The article argues that these estimates lack solid empirical grounding and that speculation is being laundered through pseudo-quantification to influence policy.

AI safety Alignment Regulation

AI Snake Oil·Jul 3

New paper: AI agents that matter

A critical look at AI agent evaluation. The article questions current benchmarking methods and proposes rethinking what makes an AI agent meaningful.

AI Agents Evals Benchmarks

AI Snake Oil·Jun 27

AI scaling myths

The article challenges myths about AI scaling, arguing that model growth will hit limits, although the timing of this saturation remains uncertain.

Reasoning Benchmarks

AI Snake Oil·Jun 3

Scientists should use AI as a tool, not an oracle

Scientists must treat AI as a tool, not an infallible oracle. AI hype leads to flawed research that fuels more hype, creating a vicious cycle.

AI safety Alignment Evals

AI Snake Oil·Apr 30

AI leaderboards are no longer useful. It's time to switch to Pareto curves.

Traditional AI leaderboards are becoming obsolete as cost-performance trade-offs grow more complex. The article advocates replacing leaderboards with Pareto curves when evaluating AI agents, showing how $2,000 in spending exposes the real trade-offs between performance and resource use.

Evals AI Agents Benchmarks
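
A leaderboard ranks agents on a single number, whereas a Pareto curve keeps every agent that no other agent beats on both cost and accuracy at once. The minimal Python sketch below computes the points on such a frontier, assuming each agent has already been reduced to one cost figure and one accuracy score. The `AgentResult` type, the `pareto_frontier` function, and all agent names and numbers are hypothetical illustrations, not taken from the article.

```python
from typing import NamedTuple


class AgentResult(NamedTuple):
    name: str
    cost_usd: float  # hypothetical: total cost of running the evaluation
    accuracy: float  # hypothetical: fraction of tasks solved


def pareto_frontier(results: list[AgentResult]) -> list[AgentResult]:
    """Return the agents not dominated on (lower cost, higher accuracy)."""
    # Sort by cost ascending; among equal costs, the most accurate comes first.
    ordered = sorted(results, key=lambda r: (r.cost_usd, -r.accuracy))
    frontier: list[AgentResult] = []
    best_accuracy = float("-inf")
    for r in ordered:
        # Keep a point only if it is more accurate than every cheaper
        # (or equally cheap) point seen so far; otherwise it is dominated.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Made-up results, for illustration only.
results = [
    AgentResult("agent_a", 12.0, 0.61),
    AgentResult("agent_b", 3.0, 0.58),
    AgentResult("agent_c", 40.0, 0.60),
    AgentResult("agent_d", 0.5, 0.40),
]

print([r.name for r in pareto_frontier(results)])
# ['agent_d', 'agent_b', 'agent_a']
```

Here `agent_c` drops out because `agent_a` is both cheaper and more accurate; a single-number leaderboard sorted by accuracy would hide that `agent_b` gets close to the top score at a quarter of `agent_a`'s cost.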

AI Snake Oil — AI feed · Signal IA