val_bpb: 1.0614
Architecture: Transformer
Optimizer: —
Artifact Size: 15.91 MB
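
val_bpb above is bits per byte: the validation cross-entropy re-expressed per byte of raw text, which makes runs with different tokenizers comparable. Below is a minimal sketch of the conversion, assuming the loss is measured in nats per token; the exact token and byte accounting behind the 1.0614 figure is not stated in this record.

```python
import math

def bits_per_byte(mean_nll_nats_per_token: float, n_tokens: int, n_bytes: int) -> float:
    """Convert token-level cross-entropy (nats/token) to bits per byte of raw text."""
    total_bits = mean_nll_nats_per_token * n_tokens / math.log(2)  # nats -> bits
    return total_bits / n_bytes
```
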
Training Techniques
Quantization: GPTQ (bits: 4, scope: model weights)
Weight Averaging: EMA + SWA (swa_enabled: true, swa_window_frac: 0.05, swa_interval: 50, swa_blend: 0.5)
Compression: per-group (level: null)
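
The Compression entry presumably refers to per-group quantization scales, matching the GPTQ entry above. Below is a simplified round-to-nearest sketch of 4-bit, per-group weight quantization; real GPTQ additionally compensates rounding error with second-order (Hessian) information, which this stand-in omits, and the group size of 128 is an assumption, not a value from the record.

```python
import torch

def quantize_per_group_4bit(weight: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with one scale per group of
    `group_size` consecutive input weights (a stand-in for GPTQ packing)."""
    out_features, in_features = weight.shape
    assert in_features % group_size == 0
    w = weight.reshape(out_features, in_features // group_size, group_size)
    # Symmetric 4-bit range: integer codes in [-8, 7], one scale per group.
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7)
    dequant = (q * scale).reshape(out_features, in_features)
    return q.to(torch.int8), scale, dequant

# Example: quantize one linear layer's weight and measure reconstruction error.
w = torch.randn(256, 1024)
q, scale, w_hat = quantize_per_group_4bit(w)
print((w - w_hat).abs().mean().item())
```
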
Test-Time Training: full TTT (parameters: null)
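
The record marks test-time training as "full TTT" with no parameters. A minimal sketch of the general pattern (adapt all weights on the evaluation stream with the same next-token objective before scoring it) is below; the learning rate, step count, and weight-restore behaviour are assumptions, not values from this record.

```python
import copy
import torch
import torch.nn.functional as F

def evaluate_with_ttt(model, eval_batches, lr=1e-5, steps_per_batch=1):
    """Full test-time training: before scoring each eval batch, take a few
    gradient steps on that batch's own next-token loss, then restore weights."""
    base_state = copy.deepcopy(model.state_dict())
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    total_loss, total_tokens = 0.0, 0
    for x, y in eval_batches:              # x: input ids, y: shifted targets
        model.train()
        for _ in range(steps_per_batch):
            opt.zero_grad()
            loss = F.cross_entropy(model(x).flatten(0, 1), y.flatten())
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            loss = F.cross_entropy(model(x).flatten(0, 1), y.flatten())
        total_loss += loss.item() * y.numel()
        total_tokens += y.numel()
        model.load_state_dict(base_state)  # reset so batches stay independent
    return total_loss / total_tokens
```

A careful setup would adapt only on a context prefix and score held-out continuation tokens; this sketch glosses over that split to keep the mechanics visible.
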
Novel Contributions
- Blends SWA into the EMA shadow before GPTQ
- Shows SWA composes with PR #1855's LQER + per-group pipeline
- Measures a hardware-calibration shift on the target Runpod host
- Provides a 3-seed reproducibility check with per-seed variance (a minimal sketch of such a check follows this list)
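
A minimal sketch of the kind of per-seed aggregation the last bullet refers to; `run_training` and the specific seeds are placeholders, not taken from the record.

```python
import statistics

def reproducibility_check(run_training, seeds=(0, 1, 2)):
    """Run the pipeline once per seed and report per-seed val_bpb plus spread.
    `run_training(seed)` stands in for the full train/average/quantize/eval run."""
    results = {seed: run_training(seed) for seed in seeds}
    mean = statistics.mean(results.values())
    var = statistics.pvariance(results.values())  # population variance over seeds
    return {"per_seed": results, "mean_bpb": mean, "variance": var}
```
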