arXiv cs.CL

Experience-Driven Dynamic Exits for LLMs with Reinforcement Learning

Signal: 78 · Hype: 22
In three lines: LEDE, an offline reinforcement learning framework, speeds up LLM inference by dynamically selecting the exit layer and speculation length based on local sequence context. On Llama-2 and Llama-3, it achieves a 2.0×–2.7× speedup over autoregressive decoding and a 17% gain over static speculative baselines.
Llama · Reinforcement learning · Code generation · Benchmarks

Summary generated by Claude — human-verified