PR #901
openrecord: 10L d496 WarmDown3500 SWA — val_bpb 1.1590 (1xH100 proxy)
by Hilo-HiloView on GitHub
val_bpb
1.1590
Architecture
Transformer
Optimizer
—
Artifact Size
15.94 MB
Training Techniques
Weight Averaging
SWA
parameters: {"start_frac":0.4,"every":50}
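The SWA entry lists `start_frac: 0.4` and `every: 50`. A minimal sketch of how such a schedule might gate a running weight average (function names and the plain-list parameter representation are illustrative, not taken from the stock script):

```python
def should_average(step, total_steps, start_frac=0.4, every=50):
    """Start averaging after start_frac of training, then snapshot every `every` steps."""
    return step >= int(start_frac * total_steps) and step % every == 0

def swa_update(avg_params, params, n_avg):
    """Fold one parameter snapshot into the running average.

    avg_params / params are flat lists of floats here for illustration;
    a real implementation would operate on parameter tensors.
    """
    new_avg = [(a * n_avg + p) / (n_avg + 1) for a, p in zip(avg_params, params)]
    return new_avg, n_avg + 1
```

With these parameters, averaging begins at 40% of total steps and then incorporates a snapshot every 50 steps.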
LR Schedule
warmdown
parameters: {"warmdown_steps":3500}
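The schedule here holds the learning rate constant and then decays it over the final `warmdown_steps` iterations. A hedged sketch of one common form, a linear warmdown to zero (the exact decay shape in the stock script may differ):

```python
def lr_scale(step, total_steps, warmdown_steps=3500):
    """Multiplier on the base LR: 1.0 until the warmdown window,
    then linear decay to 0.0 over the final warmdown_steps."""
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0
    return max(0.0, (total_steps - step) / warmdown_steps)
```

Extending `warmdown_steps` to 3500 lengthens the decay tail, which pairs naturally with SWA snapshots taken late in training.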
Evaluation
stride-based eval
parameters: {"stride":64}
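Stride-based eval with `stride: 64` typically means overlapping context windows that advance by the stride, scoring only the newly revealed tokens so every token is evaluated exactly once. A minimal sketch of the window bookkeeping (the helper name and tuple layout are illustrative):

```python
def eval_windows(n_tokens, context, stride=64):
    """Yield (begin, end, n_scored) for overlapping eval windows.

    Windows advance by `stride`; only the last n_scored tokens of each
    window contribute to the loss, so scored counts sum to n_tokens.
    """
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + context, n_tokens)
        yield begin, end, end - prev_end
        prev_end = end
        if end == n_tokens:
            break
```

A smaller stride gives each scored token more left context at the cost of more forward passes.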
Test-Time Training
TTT
parameters: null
Quantization
int6
bits: 6
scope: model
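For `int6` with model-wide scope, one plausible reading is symmetric quantization of weights to 6-bit integers with a single scale. A hedged sketch on plain float lists (the clamp range and rounding rule are assumptions, not confirmed by the record):

```python
def quantize_int6(values):
    """Symmetric quantization to 6-bit signed integers in [-31, 31].

    Uses one per-tensor scale derived from the max absolute value;
    a scale of 1.0 is substituted for all-zero inputs.
    """
    scale = max(abs(v) for v in values) / 31 or 1.0
    q = [max(-31, min(31, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map 6-bit integers back to approximate float values."""
    return [x * scale for x in q]
```

At 6 bits per weight (before entropy coding), this is a large part of how a d496 model fits under the 16MB artifact limit.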
Compression
zlib
level: null
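The compression step presumably runs zlib over the serialized quantized checkpoint before measuring artifact size. A minimal sketch using the standard-library `zlib` module (the `level=9` default is an assumption, since the record lists `level: null`):

```python
import zlib

def compress_artifact(raw: bytes, level: int = 9) -> bytes:
    """Compress serialized checkpoint bytes; artifact size is measured
    on the compressed output."""
    return zlib.compress(raw, level)

def decompress_artifact(blob: bytes) -> bytes:
    """Recover the original serialized bytes for loading."""
    return zlib.decompress(blob)
```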
Sequence Length
sequence_length
train_length: null
eval_length: null
Novel Contributions
- Environment-variable-only tuning of the stock train_gpt.py, with no code changes
- Reduced model dimension to 496 to fit under the 16MB artifact limit
- Extended warmdown schedule to 3500 iterations
- Used SWA with a 0.4 start fraction and 50-step averaging interval
- Disabled TTT to keep evaluation fast
- Reported a 1xH100 proxy result for an unverified 8xH100 configuration