PR #416

open

Add non-record 11L XSA4 EMA run (val_bpb 1.12296, over 16MB)

by kshitizz36
val_bpb: 1.1230
Architecture: 11L XSA4
Optimizer: (none listed)
Artifact Size: 20,906,280 bytes

Training Techniques

Architecture
XSA4
Uses an XSA4 model variant with 11 layers.
parameters: {"layers":11}
Weight Averaging
EMA
parameters: null
Quantization
int6
bits: 6
scope: all
Evaluation
sliding window eval
parameters: {"stride":64}
Compression
zstd
level: null
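
For readers unfamiliar with sliding-window evaluation, here is a minimal sketch of how a stride of 64 could partition a token stream so that every token is scored exactly once while later windows still see earlier context. This is not the PR's evaluation code: the function name sliding_windows, the context_len parameter, and the example lengths (300 tokens, context 128) are hypothetical and chosen only for illustration. Only the stride of 64 comes from the summary above.

```python
def sliding_windows(n_tokens, context_len, stride=64):
    """Yield (start, end, n_scored) for each evaluation window.

    Each window covers tokens[start:end]; only its last n_scored tokens
    contribute to the loss, so no token is counted twice.
    Hypothetical helper, not taken from the PR.
    """
    if n_tokens < 0 or context_len <= 0 or not 0 < stride <= context_len:
        raise ValueError("need n_tokens >= 0 and 0 < stride <= context_len")
    prev_end = 0  # index just past the last token already scored
    for start in range(0, n_tokens, stride):
        end = min(start + context_len, n_tokens)
        n_scored = end - prev_end  # new tokens not scored by earlier windows
        yield start, end, n_scored
        prev_end = end
        if end == n_tokens:
            break


# Illustrative lengths only (assumed, not from the PR).
for start, end, n_scored in sliding_windows(300, context_len=128, stride=64):
    print(f"window [{start}:{end}) scores its last {n_scored} tokens")
```

A smaller stride gives each scored token more preceding context, at the cost of more forward passes over the same data.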

Novel Contributions

  • 11-layer XSA4 model run
  • EMA-weighted checkpoint
  • Int6 quantized submission
  • Sliding-window exact evaluation with stride 64
  • Non-record run that exceeds the 16MB submission limit
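
To put the last point in numbers, the sketch below computes how far the 20,906,280-byte artifact exceeds a 16MB limit. The summary does not say whether "16MB" means 16,000,000 bytes or 16 * 2**20 bytes, so both readings are shown as assumptions. The helper name overage is hypothetical.

```python
ARTIFACT_BYTES = 20_906_280  # artifact size reported in the PR summary

# Both interpretations of "16MB" are assumptions; the summary does not specify.
LIMITS = {
    "decimal (16 * 10**6)": 16 * 10**6,
    "binary (16 * 2**20)": 16 * 2**20,
}


def overage(artifact_bytes, limit_bytes):
    """Return how many bytes the artifact exceeds the limit by (0 if it fits)."""
    return max(0, artifact_bytes - limit_bytes)


for label, limit in LIMITS.items():
    extra = overage(ARTIFACT_BYTES, limit)
    print(f"{label}: over by {extra:,} bytes")
```

Under either reading the artifact is more than 4 million bytes over the limit.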