- val_bpb: 1.0208
- Architecture: Hybrid
- Optimizer: —
- Artifact Size: 15.03 MB
Training Techniques

- LR Schedule: warmdown (iterations: 2200, warmdown_iters: 400, start_step: 1800)
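Assuming "warmdown" means a linear decay to zero over the final warmdown_iters steps (consistent with the listed parameters, since 1800 + 400 = 2200 matches the iteration count), the schedule can be sketched as below. The function name and base_lr value are illustrative, not the project's actual API.

```python
def warmdown_lr(step, base_lr=0.01, warmdown_iters=400, start_step=1800):
    """Constant LR until start_step, then linear decay to zero.

    With the listed parameters the decay spans the last 400 of the
    2200 training iterations. base_lr is a hypothetical value.
    """
    if step < start_step:
        return base_lr
    # Fraction of the warmdown window remaining, clamped at zero.
    frac = max(0.0, 1.0 - (step - start_step) / warmdown_iters)
    return base_lr * frac
```

A schedule like this is typically wired into the optimizer each step (e.g. via a lambda-based LR scheduler) rather than called manually.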
- Weight Averaging: SWA (start_step: 2100, checkpoints: 3)
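A minimal sketch of what SWA-style averaging over the final saved checkpoints could look like; the function name and state-dict format are assumptions, not the project's code. Moving every tensor to a common device before averaging avoids the kind of CPU/CUDA mismatch called out under Novel Contributions.

```python
import torch

def average_checkpoints(state_dicts):
    """Average each parameter across a list of checkpoint state dicts.

    Illustrative SWA over saved checkpoints: every tensor is first
    moved to the CPU so checkpoints loaded onto different devices
    (CPU vs. CUDA) can still be stacked and averaged together.
    """
    avg = {}
    for key in state_dicts[0]:
        # Stack the same parameter from each checkpoint on one device.
        stacked = torch.stack([sd[key].float().cpu() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0)
    return avg
```

PyTorch's built-in `torch.optim.swa_utils.AveragedModel` is the standard alternative when averaging during training rather than from saved checkpoints.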
- Quantization: late QAT (bits: null, scope: null); GPTQ (bits: null, scope: null)
- Compression: custom (level: null)
Novel Contributions
- Fixes a timing mismatch that prevented warmdown, SWA, and late QAT from activating
- Adjusts training iterations and warmdown iterations so all three training systems trigger
- Fixes an SWA device mismatch bug between CPU and CUDA tensors
- Improves quantized BPB to 1.0208 by correcting the training pipeline
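The first two fixes above can be illustrated with a hypothetical sketch: step-gated systems only ever activate if training runs past their start_step, so the total iteration count must exceed every gate. The late-QAT start step below is an assumed value for illustration only; warmdown and SWA starts are taken from the parameters listed above.

```python
def active_features(iterations, starts):
    """Return which step-gated systems ever activate in a run.

    A feature gated on `step >= start` fires only if the run's total
    iteration count goes beyond its start step.
    """
    return [name for name, start in starts.items() if iterations > start]

# warmdown and swa starts come from the listed parameters;
# the late_qat start step is a hypothetical placeholder.
starts = {"warmdown": 1800, "swa": 2100, "late_qat": 2050}

active_features(2000, starts)  # a too-short run triggers only warmdown
active_features(2200, starts)  # the corrected run triggers all three
```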