PR #1177 (open)

review: Rerun of PR #1120 (Rascal) on 8xH100 SXM

by dexhunter
val_bpb: 1.1098

Architecture
  Transformer
Optimizer
Artifact Size
Training Techniques
Evaluation
  sliding window eval (parameters: null)
Quantization
  GPTQ (bits: null; scope: final artifact)
Compression
  zstd (level: null)
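As a rough illustration of the sliding window eval listed above, here is a minimal sketch assuming the common stride-based scheme, where each window advances by `stride` tokens and only the tokens not already covered by a previous window are scored. The function and parameter names are hypothetical and not taken from the PR's train_gpt.py.

```python
import math

def sliding_window_bpb(logprob_fn, tokens, n_bytes, window=1024, stride=512):
    """Exact sliding-window bits-per-byte (hypothetical sketch).

    Windows advance by `stride`; within each window, only tokens not
    already scored by a previous window are scored, each with up to
    `window - 1` tokens of left context. The first token is never
    scored (it has no context). `logprob_fn(context, token)` returns
    the model's natural-log probability of `token` given `context`.
    """
    total_nll = 0.0
    prev_end = 0
    for begin in range(0, len(tokens), stride):
        end = min(begin + window, len(tokens))
        ctx = tokens[begin:end]
        first = prev_end - begin  # positions before this are context only
        for i in range(max(first, 1), len(ctx)):
            total_nll += -logprob_fn(ctx[:i], ctx[i])
        prev_end = end
        if end == len(tokens):
            break
    # nats -> bits, normalized by the byte length of the eval text
    return total_nll / math.log(2) / n_bytes
```

With a uniform model over a 4-token vocabulary and 10 tokens spanning 10 bytes, every scored token costs exactly 2 bits, so the result is 9 scored tokens x 2 bits / 10 bytes = 1.8 bpb.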

Novel Contributions

  • Independent rerun of PR #1120 on 8xH100 SXM
  • Reports the rerun's metrics against the published seed 300 result
  • Clarifies that the submitted train_gpt.py computes final_sliding_window_exact on the unquantized model, i.e., before quantization
  • Notes that the int6+zstd quantization and final_int6_roundtrip metrics appear to come from an external runner
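The int6 roundtrip mentioned in the last bullet can be illustrated with a minimal sketch. Assumptions made here: symmetric per-tensor quantization to the int6 range [-31, 31] (the PR's actual scheme, e.g. a GPTQ-style one, is not shown), and zlib from the Python standard library as a stand-in for zstd, which is not in the stdlib. All names are hypothetical.

```python
import zlib  # stand-in for zstd, which has no stdlib binding

def int6_roundtrip(weights):
    """Quantize to symmetric int6 ([-31, 31]) and dequantize back.

    A hypothetical sketch of the quantize->dequantize roundtrip behind
    a metric like final_int6_roundtrip; not the PR's actual code.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 31 if max_abs > 0 else 1.0
    q = [max(-31, min(31, round(w / scale))) for w in weights]
    deq = [qi * scale for qi in q]
    return q, deq, scale

def compressed_size(q, level=6):
    # One int6 value per byte for simplicity (a real artifact would
    # bit-pack 6 bits per value), then compress.
    raw = bytes(qi + 32 for qi in q)
    return len(zlib.compress(raw, level))
```

Evaluating the model on `deq` rather than `weights` would give the roundtrip metric, while `compressed_size` stands in for the compressed artifact size.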