val_bpb
1.1194
Architecture
Transformer
Optimizer
Parallel Muon
Artifact Size
15,990,006 bytes
Training Techniques
Optimizer
Parallel Muon
weight_decay: null
momentum: null
other_params: null
Test-Time Training
full TTT
parameters: {"enabled":true}
Evaluation
stride-based eval
parameters: {"stride":64}
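A minimal sketch of how a stride-based evaluation with `stride = 64` might walk a token stream, and how a summed loss converts to bits per byte (the `val_bpb` unit). The window length is an assumption, since `eval_length` is not recorded here, and the function names `stride_windows` and `bits_per_byte` are hypothetical rather than taken from the submission script.

```python
import math
from typing import Iterator, Tuple


def stride_windows(n_tokens: int, window: int, stride: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (begin, end, score_from) triples.

    The model would see tokens [begin, end) as context, but only tokens
    [score_from, end) contribute to the loss, so every token is scored
    exactly once while later windows keep earlier tokens as context.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("need 0 < stride <= window")
    scored_upto = 0
    begin = 0
    while scored_upto < n_tokens:
        end = min(begin + window, n_tokens)
        yield begin, end, scored_upto
        scored_upto = end
        begin += stride


def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)


# Assumed window of 256 tokens; only the stride (64) comes from the card.
for begin, end, score_from in stride_windows(n_tokens=400, window=256, stride=64):
    print(begin, end, score_from)
```

A smaller stride gives each scored token more left context at the cost of more forward passes, which is the usual trade-off this setting controls.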
Sequence Length
sequence_length
train_length: null
eval_length: null
Compression
lzma
level: null
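As a rough illustration of how an artifact size like the one reported above could be measured, the sketch below compresses a byte payload with Python's standard `lzma` module and returns the compressed length. The preset value is an assumption, since the level is not recorded, and `lzma_artifact_size` is a hypothetical helper, not a function from the submission.

```python
import lzma


def lzma_artifact_size(payload: bytes, preset: int = 9) -> int:
    """Return the size in bytes of `payload` after LZMA (xz) compression.

    `preset` is an assumed compression level; the card leaves it unspecified.
    """
    compressed = lzma.compress(payload, preset=preset)
    # Round-trip check so that a reported size always refers to recoverable data.
    if lzma.decompress(compressed) != payload:
        raise RuntimeError("LZMA round trip did not reproduce the payload")
    return len(compressed)


# Placeholder payload; a real run would pass the serialized artifact bytes.
example = b"\x00" * 100_000
print(lzma_artifact_size(example))
```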
Architecture
weight tying
The promoted script is byte-identical to a proven record script; no architecture change beyond the referenced submission's setup is described.
parameters: null
Novel Contributions
- Promoted reviewer-ready submission folder with audited evidence logs
- Legal TTT submission package with canonical metric legal_ttt
- Parallel Muon optimizer usage
- Byte-identical promoted train_gpt.py inherited from a proven prior record
- Included train.log alias plus three audited seed logs
- Submission metadata generated from audit payload
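One way a reviewer might confirm the byte-identical claim for `train_gpt.py` is to compare SHA-256 digests of the promoted script and the prior record script. This is a sketch only: the helper names and the file paths in the usage comment are hypothetical, and the audit procedure actually used is not described in this card.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def byte_identical(path_a: str, path_b: str) -> bool:
    """True if both files have the same SHA-256 digest, compared on raw bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return sha256_hex(fa.read()) == sha256_hex(fb.read())


# Hypothetical usage; both paths are placeholders:
# byte_identical("promoted/train_gpt.py", "prior_record/train_gpt.py")
```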