| Field | Value |
| --- | --- |
| val_bpb | 1.1098 |
| Architecture | Transformer |
| Optimizer | — |
| Artifact Size | — |
| Training Techniques | — |
| Evaluation | sliding window eval (parameters: null) |
| Quantization | GPTQ (bits: null; scope: final artifact) |
| Compression | zstd (level: null) |
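Since val_bpb is reported from a sliding window eval, the connection between the two is the conversion from summed token negative log-likelihood to bits per byte. The sketch below is illustrative only, not the submission's harness: the `score_fn` interface, the window and stride defaults, and the uniform-model sanity check are all assumptions.

```python
import math

def sliding_window_bpb(score_fn, tokens, n_bytes, window=2048, stride=512):
    """Hypothetical sliding-window eval loop (assumed, not the submitted code).

    score_fn(context, n_new) is assumed to return the summed negative
    log-likelihood, in nats, of the last n_new tokens of `context`
    given the tokens preceding them.
    """
    total_nats = 0.0
    pos = 0
    while pos < len(tokens):
        end = min(pos + stride, len(tokens))  # score `stride` new tokens...
        start = max(0, end - window)          # ...with up to `window` left context
        total_nats += score_fn(tokens[start:end], end - pos)
        pos = end
    # nats -> bits, normalized by the raw byte count of the eval data
    return total_nats / (math.log(2) * n_bytes)

# Sanity check: a uniform byte-level "model" must score log2(256) = 8 bits/byte.
data = list(b"hello world" * 100)
uniform = lambda ctx, n_new: n_new * math.log(256)
print(sliding_window_bpb(uniform, data, n_bytes=len(data)))  # -> 8.0
```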
Novel Contributions
- Independent rerun of PR #1120 on 8×H100 SXM
- Reports the rerun's metrics against the published seed-300 result
- Clarifies that the submitted train_gpt.py computes final_sliding_window_exact on the unquantized model, i.e. before any quantization is applied
- Notes that the int6+zstd quantization and the final_int6_roundtrip metrics appear to come from an external runner (a sketch of such a roundtrip follows below)
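Because the int6+zstd step runs outside the submitted script, the following is only a guess at its general shape: symmetric 6-bit quantization followed by lossless zstd compression, with a roundtrip check that decompression recovers the quantized tensor exactly. The `zstandard` package, compression level 3, one-byte-per-value storage (rather than true 6-bit packing), and the error metric are all assumptions; the external runner's actual final_int6_roundtrip procedure is not published here.

```python
import numpy as np
import zstandard  # pip install zstandard

def int6_zstd_roundtrip(w: np.ndarray):
    """Hypothetical int6 + zstd roundtrip check (assumed procedure)."""
    # Symmetric per-tensor 6-bit quantization: integer levels in [-31, 31].
    scale = max(float(np.abs(w).max()) / 31, 1e-12)
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    # Compress the quantized values (stored one per byte for simplicity;
    # a real artifact would likely bit-pack four 6-bit values into 3 bytes).
    blob = zstandard.ZstdCompressor(level=3).compress(q.tobytes())
    # Roundtrip: zstd is lossless, so the quantized ints must match exactly.
    q2 = np.frombuffer(zstandard.ZstdDecompressor().decompress(blob),
                       dtype=np.int8).reshape(w.shape)
    assert np.array_equal(q, q2)
    # Any remaining error comes from quantization, not from compression.
    w_hat = q2.astype(w.dtype) * scale
    return len(blob), float(np.abs(w - w_hat).max())

w = np.random.randn(256, 256).astype(np.float32)
compressed_size, max_abs_err = int6_zstd_roundtrip(w)
```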