val_bpb: 1.1422 (validation bits per byte)
Architecture: Transformer
Optimizer: —
Artifact Size: 15.8 MB
Training Techniques
Quantization: int5 (bits: 5, scope: MLP)
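The entry gives no code for the 5-bit scheme; below is a minimal sketch of symmetric per-tensor quantization to the int5 range [-16, 15], assuming PyTorch. The function names `quantize_int5` and `dequantize_int5` are hypothetical, not from the submission.

```python
import torch

def quantize_int5(w: torch.Tensor):
    """Map a float weight tensor onto the 5-bit symmetric range [-16, 15]."""
    qmax = 2 ** (5 - 1) - 1                       # 15
    scale = w.abs().max() / qmax                  # largest weight -> qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale                               # int8 storage, 5-bit values

def dequantize_int5(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover approximate float weights from the quantized pair."""
    return q.float() * scale
```

Note that this sketch stores 5-bit values in int8 containers; actual bit-packing (e.g. eight weights into five bytes) would be needed to realize an artifact size like the 15.8 MB reported above.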
Architecture: MLP; Int5-MLP recipe with 10 layers (parameters: {"layers":10})
Evaluation: sliding window eval (parameters: {"stride":64})
Compression: zstd (level: null)
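Packaging the final weights is a short step with the `zstandard` bindings; since the metadata records level as null, the sketch below leaves the library default in place. File names are placeholders.

```python
import zstandard as zstd                 # pip install zstandard

def compress_artifact(src: str = "model.bin", dst: str = "model.bin.zst"):
    """Stream-compress the serialized weights; compression level is left
    at the zstd default because the submission records level: null."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        zstd.ZstdCompressor().copy_stream(fin, fout)
```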
Novel Contributions
- Rerun of the 10-layer Int5-MLP recipe on 4xH100 GPUs with built-in gradient accumulation (sketched after this list)
- Use of a 1200-second wallclock cap instead of the standard 10-minute run
- Final artifact compressed with zstd
- Submission targeted at the non-record track rather than the main 8xH100 leaderboard
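Neither the accumulation factor nor the training loop is given. Below is a minimal sketch combining gradient accumulation with a hard 1200-second wallclock cap, assuming PyTorch; only the cap value comes from the entry, while `accum_steps`, the model, and the data pipeline are placeholders, and the distributed 4-GPU setup is omitted.

```python
import time
import torch
import torch.nn.functional as F

def train_capped(model, opt, batches, accum_steps: int = 8, cap_s: int = 1200):
    """Accumulate gradients over `accum_steps` micro-batches per optimizer
    step and stop once the 1200 s wallclock budget is spent."""
    start = time.time()
    model.train()
    for i, (x, y) in enumerate(batches):
        loss = F.cross_entropy(model(x), y)
        (loss / accum_steps).backward()          # scale so grads average
        if (i + 1) % accum_steps == 0:
            opt.step()
            opt.zero_grad(set_to_none=True)
        if time.time() - start >= cap_s:         # hard wallclock cap
            break
```

Accumulating over micro-batches lets a 4-GPU run emulate the effective batch size of a larger machine at the cost of more steps per update, which is presumably why the entry pairs it with the longer 1200-second budget.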