PR #2049

open

[Non-record] PR #1855 + SWA blend — val_bpb 1.06140 (3-seed)

by IshanG97
val_bpb
1.0614
Architecture
Transformer
Optimizer
Artifact Size
15.91 MB

Training Techniques

Quantization
GPTQ
bits: 4
scope: model weights
Weight Averaging
EMA + SWA
parameters: {"swa_enabled":true,"swa_window_frac":0.05,"swa_interval":50,"swa_blend":0.5}
Compression
pergroup
level: null
Test-Time Training
full TTT
parameters: null

Novel Contributions

  • Blends SWA into the EMA shadow before GPTQ
  • Shows SWA composes with PR #1855's LQER + per-group pipeline
  • Measures a hardware-calibration shift on the target Runpod host
  • Provides a 3-seed reproducibility check with per-seed variance
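
The first contribution, blending SWA into the EMA shadow before GPTQ, can be illustrated with a minimal sketch. It is not the PR's actual code, and the reading of the parameters is an assumption: `swa_window_frac` is taken to be the final fraction of training steps, `swa_interval` the number of steps between snapshots inside that window, and `swa_blend` the weight given to the SWA average in a linear interpolation with the EMA weights. The function names and the dict-of-lists parameter layout are hypothetical, chosen only to keep the example dependency-free.

```python
from typing import Dict, List

# Hypothetical parameter container: name -> flat list of weights.
Params = Dict[str, List[float]]


def swa_snapshot_steps(total_steps: int, window_frac: float, interval: int) -> List[int]:
    """Return the 1-indexed steps at which a snapshot is taken.

    Assumption: snapshots are taken every `interval` steps, but only
    within the final `window_frac` fraction of training.
    """
    window_start = total_steps - round(total_steps * window_frac)
    return [s for s in range(window_start + 1, total_steps + 1) if s % interval == 0]


def average_snapshots(snapshots: List[Params]) -> Params:
    """Uniform (SWA-style) average of the collected weight snapshots."""
    n = len(snapshots)
    return {
        name: [sum(vals) / n for vals in zip(*(snap[name] for snap in snapshots))]
        for name in snapshots[0]
    }


def blend_into_ema(ema: Params, swa: Params, blend: float) -> Params:
    """Linear interpolation: (1 - blend) * EMA + blend * SWA.

    Assumption: this is what `swa_blend` controls; with blend=0.5 the
    two averages contribute equally.
    """
    return {
        name: [(1.0 - blend) * e + blend * s for e, s in zip(ema[name], swa[name])]
        for name in ema
    }


if __name__ == "__main__":
    # Toy values only; the step count and weights are made up for illustration.
    steps = swa_snapshot_steps(total_steps=10_000, window_frac=0.05, interval=50)
    swa = average_snapshots([{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}])
    blended = blend_into_ema({"w": [0.0, 0.0]}, swa, blend=0.5)
    print(len(steps), swa, blended)
```

Under these assumptions the blended weights would replace the EMA shadow as the input to quantization, so GPTQ sees a single averaged checkpoint and the rest of the pipeline is unchanged.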