arXiv cs.LG

BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding

Signal: 78 · Hype: 15
In three lines: BudgetDraft trains a sparse drafter for speculative decoding in long-context inference (4K-16K tokens). During training, the drafter is exposed to multiple KV-cache budgets, and each sparse view is aligned with a shared full-cache teacher target. Reported speedups over autoregressive decoding are 6.55x, 4.46x, and 2.10x at 4K, 8K, and 16K tokens, respectively.
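To make the multi-budget idea concrete, here is a minimal toy sketch of a loss that aligns several sparse-cache views with one full-cache target. It is not the paper's implementation. The single-head attention, the "keep the most recent b entries" selection policy, the KL objective, and the uniform averaging over budgets are all assumptions chosen for illustration, and every name (`attend`, `multi_view_loss`, `W_out`) is hypothetical. The sketch only computes the loss; it does not train anything.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax over a 1-D array."""
    z = x - np.max(x)
    return z - np.log(np.sum(np.exp(z)))


def attend(q, K, V, keep):
    """Single-head attention restricted to the cache positions in `keep`."""
    scores = K[keep] @ q / np.sqrt(q.shape[0])
    weights = np.exp(log_softmax(scores))
    return weights @ V[keep]


def multi_view_loss(q, K, V, W_out, budgets):
    """Average KL(teacher || student) over several KV budgets.

    The teacher distribution comes from the full cache and is shared by
    every sparse view. Assumption: a budget b keeps the b most recent
    cache entries; the source does not state the selection policy.
    """
    T = K.shape[0]
    teacher_logp = log_softmax(W_out @ attend(q, K, V, np.arange(T)))
    teacher_p = np.exp(teacher_logp)

    per_budget = []
    for b in budgets:
        keep = np.arange(max(T - b, 0), T)  # hypothetical sparse view
        student_logp = log_softmax(W_out @ attend(q, K, V, keep))
        kl = float(np.sum(teacher_p * (teacher_logp - student_logp)))
        per_budget.append(kl)
    return float(np.mean(per_budget)), per_budget


# Toy sizes and random tensors, purely illustrative.
rng = np.random.default_rng(0)
T, d, vocab = 64, 16, 32
q = rng.normal(size=d)
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))
W_out = rng.normal(size=(vocab, d))

loss, per_budget = multi_view_loss(q, K, V, W_out, budgets=[8, 16, 32, 64])
print(loss, per_budget)
```

In this toy setup the view whose budget equals the full context length reproduces the teacher, so its KL term vanishes. The smaller budgets are the ones that contribute to the loss.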
Reasoning · Benchmarks · Infrastructure

Summary generated by Claude — human-verified