Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training
This work presents a systematic study of HiF8 W8A8 quantization-aware training (QAT) on OpenPangu-Embedded-1B. It identifies two failure modes: amax saturation, in which clipping silently corrupts values, and catastrophic forgetting, in which an aggressive learning rate overwrites pretrained knowledge. The proposed solutions are a 64-step history window for DTS and a 500-step BF16 warmup. Relative to the baseline, accuracy drops by 0.43% on MMLU, 0.58% on HellaSwag, and 0.22% on ARC-Challenge.
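To make the two remedies concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the scale is multiplied into a tensor before the 8-bit cast, that the scale is derived from the largest amax seen over the last 64 steps, and that "BF16 warmup" means quantization is simply disabled for the first 500 steps. The names `MaxWindowScale`, `saturates`, and `quantization_enabled` are hypothetical, and `fmt_max` (the largest finite value of the target format) is left as a caller-supplied parameter rather than hard-coded for HiF8.

```python
from collections import deque


class MaxWindowScale:
    """Hypothetical max-window scale estimator over a fixed amax history."""

    def __init__(self, fmt_max: float, window: int = 64, eps: float = 1e-12):
        self.fmt_max = fmt_max  # assumed: largest finite value of the 8-bit format
        self.eps = eps          # guards against division by zero for all-zero tensors
        self.history = deque(maxlen=window)  # oldest amax is dropped automatically

    def observe(self, values) -> None:
        """Record the absolute maximum of the tensor seen at this step."""
        self.history.append(max((abs(v) for v in values), default=0.0))

    def scale(self) -> float:
        """Scale computed from the window maximum; 1.0 before any observation."""
        if not self.history:
            return 1.0
        return self.fmt_max / max(max(self.history), self.eps)


def saturates(values, scale: float, fmt_max: float) -> bool:
    """True if any scaled value would exceed the format range and be clipped."""
    return any(abs(v) * scale > fmt_max for v in values)


def quantization_enabled(step: int, warmup_steps: int = 500) -> bool:
    """Assumed warmup rule: stay in BF16 until `warmup_steps` steps have run."""
    return step >= warmup_steps
```

In a training loop under these assumptions, each step would read `scale()` from the existing history, quantize with it once `quantization_enabled(step)` is true, and then call `observe` on the current tensor. Because the scale lags the data, a tensor whose amax exceeds everything in the window is clipped, which is what `saturates` detects; taking the maximum over a longer window makes the scale more conservative and such clipping less likely.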