PR #1713
Non-record submission: baseline_sp1024, val_bpb=1.3479 (on single H100)
by AbhiShet108
val_bpb
1.3479
Architecture
Transformer
Optimizer
—
Artifact Size
14,672,726 bytes
Training Techniques
Other
Increased MATRIX_LR from 0.04 to 0.08 to test learning-rate sensitivity when training on a single H100.
parameters: {"MATRIX_LR":0.08}
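A minimal sketch of what the parameter override above amounts to: a base config's MATRIX_LR is doubled from 0.04 to 0.08 before the run. The names `base_config` and `apply_overrides` are illustrative assumptions, not identifiers from the submission's codebase.

```python
# Hypothetical config-override sketch; names are illustrative only.
base_config = {"MATRIX_LR": 0.04}  # assumed default from the baseline

def apply_overrides(config, overrides):
    """Return a new config dict with the given overrides applied."""
    merged = dict(config)
    merged.update(overrides)
    return merged

# The submission's override, doubling the matrix learning rate.
run_config = apply_overrides(base_config, {"MATRIX_LR": 0.08})
print(run_config["MATRIX_LR"])  # 0.08
```

Doubling a single learning rate while holding everything else fixed is a standard one-factor sensitivity probe: any change in val_bpb relative to the baseline can then be attributed to MATRIX_LR alone.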
Novel Contributions
- Learning-rate sensitivity experiment with MATRIX_LR doubled from 0.04 to 0.08
- Single-H100 non-record baseline run for the 10min/16MB track
- Documentation of compute and storage constraints affecting training