arXiv cs.LG · 2 June 2026

BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding

In three lines: BudgetDraft trains a sparse drafter for speculative decoding in long-context inference (4K-16K tokens). The method exposes the model to multiple key-value (KV) cache budgets during training and aligns each sparse view with a shared full-cache teacher target. Results: speedups of 6.55x, 4.46x, and 2.10x over autoregressive decoding at 4K, 8K, and 16K tokens, respectively.
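To make the multi-budget alignment idea concrete, here is a minimal sketch, not the paper's implementation. It assumes the alignment is a KL divergence from a full-cache teacher distribution to each sparse view, averaged over budgets; the summary does not specify the loss. The function names, the budget sizes (256 and 1024), and the toy logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_view_loss(teacher_logits, sparse_logits_by_budget):
    """Average the KL from one shared teacher to every sparse-budget view.

    ASSUMPTION: KL and a plain mean are illustrative choices only.
    """
    teacher = softmax(teacher_logits)
    per_budget = {
        budget: kl_divergence(teacher, softmax(logits))
        for budget, logits in sparse_logits_by_budget.items()
    }
    return sum(per_budget.values()) / len(per_budget), per_budget


# Toy example: one next-token position, a vocabulary of three tokens.
teacher_logits = [2.0, 1.0, 0.1]   # drafter output with the full KV cache
views = {
    256: [1.5, 1.2, 0.3],          # hypothetical small KV budget
    1024: [2.0, 1.0, 0.1],         # hypothetical larger budget, matches teacher
}
total, per_budget = multi_view_loss(teacher_logits, views)
```

In this toy setup the view that reproduces the teacher's logits contributes zero loss, so the training signal comes only from budgets whose predictions drift from the full-cache target.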

Reasoning Benchmarks Infrastructure

Summary generated by Claude — human-verified
