PR #1681

open

GDN-Hybrid: Fix warmdown/SWA/QAT timing — 1.0208 BPB

val_bpb: 1.0208
Architecture: Hybrid
Optimizer:
Artifact Size: 15.03 MB

Training Techniques

LR Schedule
warmdown
parameters: {"iterations":2200,"warmdown_iters":400,"start_step":1800}
Weight Averaging
SWA
parameters: {"start_step":2100,"checkpoints":3}
Quantization
late QAT
bits: null
scope: null
GPTQ
bits: null
scope: null
Compression
custom
level: null
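
The warmdown parameters listed above are internally consistent: `start_step` equals `iterations` minus `warmdown_iters` (2200 - 400 = 1800). As a minimal sketch, assuming a linear decay to zero (the summary does not state the decay shape, and `lr_scale` is a hypothetical helper name, not a function from the PR), the learning-rate multiplier could be computed like this:

```python
def lr_scale(step: int, iterations: int = 2200, warmdown_iters: int = 400) -> float:
    """Return the LR multiplier at `step`, assuming a linear warmdown to zero."""
    # Warmdown begins this many steps into the run (1800 for the listed values).
    warmdown_start = iterations - warmdown_iters
    if step < warmdown_start:
        return 1.0  # full learning rate before warmdown
    # Linearly shrink the multiplier over the final `warmdown_iters` steps.
    remaining = max(iterations - step, 0)
    return remaining / warmdown_iters
```

Under this assumption, the SWA `start_step` of 2100 corresponds to a multiplier of 0.25, so averaging would begin in the last quarter of the warmdown.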

Novel Contributions

  • Fixes a timing mismatch that prevented warmdown, SWA, and late QAT from activating
  • Sets training to 2200 iterations with a 400-iteration warmdown (starting at step 1800) so that warmdown, SWA, and late QAT all trigger
  • Fixes an SWA device mismatch bug between CPU and CUDA tensors
  • Improves quantized BPB to 1.0208 by correcting the training pipeline
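
A timing mismatch of the kind described above is something a simple pre-flight check could catch. The sketch below is illustrative and not taken from the PR: `unreachable_phases` is a hypothetical helper, and the 1500-iteration run is an invented example of a too-short budget, not the PR's previous setting. The late-QAT start step is left out because the summary does not give it.

```python
from __future__ import annotations


def unreachable_phases(iterations: int, start_steps: dict[str, int]) -> list[str]:
    """Return the phases whose start step is never reached in `iterations` steps."""
    # Assumption: steps are numbered 0 .. iterations - 1.
    return sorted(name for name, start in start_steps.items() if start >= iterations)


# Start steps as listed in the Training Techniques section.
start_steps = {"warmdown": 1800, "swa": 2100}

print(unreachable_phases(2200, start_steps))  # []
print(unreachable_phases(1500, start_steps))  # ['swa', 'warmdown']
```

With the listed 2200 iterations, both start steps fall inside the run; with the hypothetical 1500-iteration budget, neither phase would ever activate.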