BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding
Signal: 78
Hype: 15
In three lines
BudgetDraft trains a sparse drafter for speculative decoding in long-context inference (4K-16K tokens). The method exposes the drafter to multiple KV budgets during training and aligns each sparse view with a shared full-cache teacher target. Reported speedups over autoregressive decoding are 6.55x, 4.46x, and 2.10x at 4K, 8K, and 16K tokens, respectively.
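The multi-budget alignment idea can be illustrated with a toy sketch. This is not the paper's implementation: the recency-based `sparse_view` selection, the single-query attention, the KL objective, and every function name here are assumptions made for illustration. The summary does not say how KV entries are selected or which loss is used. The sketch also reuses the same weights for teacher and student, whereas real training would update a separate drafter against a fixed full-cache target.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def attend(query, keys, values):
    """Single-query dot-product attention over a KV cache."""
    scores = [sum(qi * ki for qi, ki in zip(query, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def sparse_view(keys, values, budget):
    """Hypothetical selection rule: keep only the most recent `budget` entries."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    return keys[-budget:], values[-budget:]

def next_token_probs(query, keys, values, out_proj):
    """Attend over the cache, then project the context to vocabulary probabilities."""
    ctx = attend(query, keys, values)
    logits = [sum(c * w for c, w in zip(ctx, row)) for row in out_proj]
    return softmax(logits)

def multi_view_loss(query, keys, values, out_proj, budgets):
    """Average KL between the full-cache target and each sparse-budget view."""
    teacher = next_token_probs(query, keys, values, out_proj)  # shared target
    per_budget = []
    for budget in budgets:
        k, v = sparse_view(keys, values, budget)
        student = next_token_probs(query, k, v, out_proj)
        per_budget.append(kl_divergence(teacher, student))
    return sum(per_budget) / len(per_budget), per_budget

# Toy cache with four entries and a three-token vocabulary.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 1.0]]
query = [1.0, 0.5]
out_proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

mean_loss, per_budget = multi_view_loss(query, keys, values, out_proj, [1, 2, 4])
print(mean_loss, per_budget)
```

With a budget equal to the full cache length, the sparse view matches the teacher and its KL term is zero. Smaller budgets produce a positive term, which is the signal a trainable drafter would minimize across all budgets at once.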
Summary generated by Claude — human-verified