arXiv cs.LG

Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training

Signal: 72 · Hype: 15
In three lines: A systematic study of HiF8 W8A8 quantization-aware training (QAT) on OpenPangu-Embedded-1B. It identifies two failure modes: amax saturation, where clipping silently corrupts values, and catastrophic forgetting, where an aggressive learning rate overwrites previously learned knowledge. The proposed fixes are a 64-step history window for DTS and a 500-step BF16 warmup, which hold the drop versus the baseline to 0.43% on MMLU, 0.58% on HellaSwag, and 0.22% on ARC-Challenge.
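To make the two fixes concrete, here is a minimal sketch of a max-over-history scale estimator and a warmup gate. It is not the paper's implementation. The class name `MaxWindowScale`, the helper `use_quantization`, the `fmt_max` parameter, and the choice to include the current step's amax in the window are all assumptions made for illustration. The value 448.0 is the FP8 E4M3 maximum, used only as a stand-in for the HiF8 limit.

```python
from collections import deque


class MaxWindowScale:
    """Hypothetical helper: keeps recent per-step amax values and
    derives a scale from the largest one in the window."""

    def __init__(self, fmt_max, window=64, eps=1e-12):
        self.fmt_max = float(fmt_max)        # largest magnitude the 8-bit format can hold (assumed input)
        self.history = deque(maxlen=window)  # oldest amax is dropped automatically
        self.eps = eps                       # guards against division by zero

    def update(self, values):
        """Record this step's amax and return the scale to multiply by before casting."""
        amax = max((abs(v) for v in values), default=0.0)
        self.history.append(amax)
        window_amax = max(self.history)      # max over the window, so a recent spike is not forgotten
        return self.fmt_max / max(window_amax, self.eps)


def use_quantization(step, warmup_steps=500):
    """Hypothetical gate: stay in BF16 until the warmup steps have passed."""
    return step >= warmup_steps


# Usage with a short window so the effect is easy to follow.
est = MaxWindowScale(fmt_max=448.0, window=3)
for step, tensor in enumerate([[1.0, -2.0], [0.5], [0.5], [0.5]]):
    scale = est.update(tensor)
    print(step, scale, use_quantization(step))
```

Taking the maximum over the window keeps the scale conservative for as long as a large amax remains in the history, which is one plausible way to avoid the clipping described above. A longer window trades some precision for that safety margin.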
Tags: Fine-tuning, Benchmarks, Papers

Summary generated by Claude — human-verified