Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training
In three lines:
A systematic study of HiF8 W8A8 QAT on OpenPangu-Embedded-1B identifies two failure modes: amax saturation, in which values outside the estimated range are clipped and silently corrupted, and catastrophic forgetting, in which an aggressive learning rate overwrites the model's existing knowledge.
The fixes are a 64-step history window for DTS scale estimation and a 500-step BF16 warmup.
Relative to the baseline, accuracy drops by only 0.43% on MMLU, 0.58% on HellaSwag, and 0.22% on ARC-Challenge.
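To illustrate why a max-window helps against amax saturation, here is a minimal sketch assuming a delayed-scaling scheme in which the scale for the current step comes from amax values recorded at earlier steps. The class `MaxWindowScale`, the helpers `count_saturated` and `run`, and the `FORMAT_MAX` constant are hypothetical illustrations, not the authors' code. `FORMAT_MAX` is a placeholder borrowed from FP8 E4M3 (448) and is NOT the HiF8 maximum. The sketch counts how many elements would be clipped when a large activation reappears after a few quiet steps: a 1-step history has already forgotten the earlier spike, while a 64-step window still remembers it.

```python
from collections import deque

# Placeholder largest representable magnitude (FP8 E4M3 value, used only for
# illustration; substitute the real HiF8 maximum).
FORMAT_MAX = 448.0


class MaxWindowScale:
    """Hypothetical per-tensor scale estimator: maps the largest |x| seen over
    the last `window` steps onto `format_max`."""

    def __init__(self, format_max: float, window: int = 64, eps: float = 1e-12):
        self.format_max = format_max
        self.history = deque(maxlen=window)  # oldest amax drops out automatically
        self.eps = eps

    def update(self, values) -> None:
        # Record this step's absolute maximum.
        self.history.append(max(abs(v) for v in values))

    def scale(self) -> float:
        # Max over the whole window, not just the latest step.
        return self.format_max / max(max(self.history), self.eps)


def count_saturated(values, scale: float, format_max: float) -> int:
    """Number of elements that would exceed the range after scaling (and be clipped)."""
    return sum(1 for v in values if abs(v) * scale > format_max)


def run(window: int, batches, format_max: float = FORMAT_MAX):
    """Delayed scaling: use the scale from past steps, then record the current amax."""
    est = MaxWindowScale(format_max, window)
    saturated = []
    for x in batches:
        if not est.history:  # first step: no history yet, seed with the current amax
            est.update(x)
        saturated.append(count_saturated(x, est.scale(), format_max))
        est.update(x)
    return saturated


# A spike of magnitude 8, two quiet steps, then the spike returns.
BATCHES = [[1.0, -8.0], [0.5, 1.0], [0.5, 1.0], [0.25, -8.0]]

if __name__ == "__main__":
    print("window=1: ", run(1, BATCHES))   # clips the returning spike
    print("window=64:", run(64, BATCHES))  # still covers it
```

With `window=1` the scale grows to 448 during the quiet steps, so the returning value of -8 overflows and would be clipped; with `window=64` the scale stays at 56 and nothing saturates. The trade-off is coarser resolution for small values while a past spike remains inside the window.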
Summary generated by Claude — human-verified