PR #602

open

Add non-record 4xH100 10L Int5-MLP submission

by ReNothinggView on GitHub
val_bpb: 1.1422
Architecture: Transformer
Optimizer:
Artifact Size: 15.8 MB

Training Techniques

  • Quantization: int5 (bits: 5, scope: MLP)
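The submission does not spell out its quantization scheme beyond "int5, scope: MLP". A minimal sketch of one plausible reading, symmetric int5 with a per-tensor scale (the signed range [-16, 15] and the rounding rule are assumptions, not taken from the PR):

```python
# Hedged sketch: symmetric int5 quantization for MLP weights.
# int5 has 32 levels; a symmetric signed layout gives [-16, 15].

def quantize_int5(weights):
    """Quantize a list of floats to int5 codes plus a per-tensor scale."""
    qmin, qmax = -16, 15
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / qmax  # map the largest-magnitude weight to qmax
    codes = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int5(codes, scale):
    """Reconstruct approximate float weights from int5 codes."""
    return [c * scale for c in codes]

weights = [0.31, -0.07, 0.92, -0.55, 0.0]
codes, scale = quantize_int5(weights)
restored = dequantize_int5(codes, scale)
assert all(-16 <= c <= 15 for c in codes)
```

Per-tensor scaling is the simplest choice; a real recipe might use per-channel or per-group scales for better accuracy at 5 bits.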
  • Architecture: MLP, Int5-MLP recipe with 10 layers (parameters: {"layers": 10})
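The recipe's parameters blob only pins the layer count. A hypothetical sketch of expanding it into a per-layer config (the `d_model` and `d_hidden` widths here are invented for illustration and do not come from the submission):

```python
import json

# The only parameter the PR records is the layer count.
params = json.loads('{"layers": 10}')

def build_mlp_config(layers, d_model=768, d_hidden=3072):
    """Expand a layer count into per-layer MLP shapes (widths are assumed)."""
    # one d_model -> d_hidden -> d_model block per layer
    return [{"layer": i, "d_in": d_model, "d_hidden": d_hidden}
            for i in range(layers)]

config = build_mlp_config(params["layers"])
assert len(config) == 10
```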
  • Evaluation: sliding window eval (parameters: {"stride": 64})
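Only the stride (64) is given; the window length and the convention of scoring just the newly uncovered tokens are assumptions. A sketch of how such a sliding-window pass could tile a token stream so every token is scored exactly once:

```python
# Hedged sketch of sliding-window evaluation spans. Only stride=64 comes
# from the submission; window=256 and the scoring rule are assumed.

def sliding_windows(n_tokens, window=256, stride=64):
    """Yield (start, end, score_from) triples covering n_tokens.

    Each window sees `window` tokens of context but only the last
    `stride` tokens (all tokens, for the first window) are scored,
    so no token is counted twice.
    """
    spans = []
    pos = 0
    while pos < n_tokens:
        end = min(pos + window, n_tokens)
        score_from = pos if pos == 0 else max(pos + window - stride, pos)
        if score_from < end:
            spans.append((pos, end, score_from))
        if end == n_tokens:
            break
        pos += stride
    return spans

spans = sliding_windows(400, window=256, stride=64)
# scored token counts across spans sum to exactly n_tokens
assert sum(end - sf for (_, end, sf) in spans) == 400
```

A smaller stride trades compute for more context per scored token, which typically lowers measured bpb slightly.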
  • Compression: zstd (level: null)
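The artifact is zstd-compressed (level unspecified). Since zstd bindings are third-party in Python, this sketch uses stdlib zlib purely as a stand-in to illustrate the compress-then-report-size step; the payload is synthetic, not the 15.8 MB artifact:

```python
import zlib

# Stand-in payload: 256 KB of a repeating byte pattern (not real weights).
payload = bytes(range(256)) * 1024

# zlib used here only to illustrate the step; the submission uses zstd.
compressed = zlib.compress(payload, level=9)
ratio = len(compressed) / len(payload)
assert ratio < 1.0  # highly regular data compresses well
```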

Novel Contributions

  • Rerun of the 10L Int5-MLP recipe on 4xH100 GPUs with built-in gradient accumulation
  • Use of a 1200-second wallclock cap instead of the standard 10-minute run
  • Final artifact compressed with zstd
  • Submission targeted at the non-record track, not the main 8xH100 leaderboard
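The two run-control knobs above can be sketched together: a wallclock cap that ends training after 1200 seconds, and gradient accumulation so 4 GPUs can emulate a larger effective batch. Only the 1200-second figure comes from the PR; the accumulation count and the training step are hypothetical stand-ins:

```python
import time

ACCUM_STEPS = 2         # assumption: micro-batches per update to mimic 8xH100
WALLCLOCK_CAP = 1200.0  # seconds, from the submission

def train(step_fn, cap=WALLCLOCK_CAP, accum=ACCUM_STEPS):
    """Run accumulate-then-update cycles until the wallclock cap expires."""
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < cap:
        for _ in range(accum):  # accumulate gradients over micro-batches
            step_fn()
        steps += 1              # one optimizer update per accumulation cycle
    return steps
```

Checking the cap only at cycle boundaries means the last update always completes, so the real runtime can slightly exceed the cap by one cycle's duration.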