val_bpb: 1.2240
Architecture: Transformer
Optimizer: —
Artifact Size: 15844118 bytes (≈15.1 MiB)
Training Techniques

- Architecture (tied embeddings): Input and output embeddings are tied, with the tied embedding kept at higher precision in the record snapshot. Parameters: null.
- Other: Reduced MLP hidden size to 960 to stay under the 16MB cap. Parameters: {"mlp_hidden": 960}.
- Compression: zlib (level: null).
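The tied-embedding technique above can be sketched in a few lines. This is a toy illustration with made-up sizes, not the record's code; in PyTorch the same effect is typically achieved by aliasing the output head's weight to the embedding table.

```python
# Toy sketch of weight tying: a single table W serves both as the input
# embedding (row lookup) and the output projection (dot product against
# every row). Sizes and values here are illustrative only.
vocab, hidden = 8, 4
W = [[0.01 * (i * hidden + j) for j in range(hidden)] for i in range(vocab)]

def embed(token_id):
    # Input side: look up one row of the shared table.
    return W[token_id]

def logits(h):
    # Output side: score the hidden state against the same rows.
    return [sum(hi * wi for hi, wi in zip(h, row)) for row in W]

h = embed(3)
scores = logits(h)  # one score per vocabulary entry
```

Because both directions read the same storage, the snapshot needs only one copy of the table, which is presumably why keeping just this matrix at higher precision is relatively cheap.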
Novel Contributions
- Uses an 8xH100 Modal single-node torchrun setup with a 600s wallclock cap
- Keeps tied embeddings at higher precision in the record snapshot
- Reduces MLP hidden size to 960 to fit under the 16MB submission cap
- Stores the final artifact as an int8-quantized, zlib-compressed model plus the code
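The int8-plus-zlib storage scheme in the last bullet can be sketched as follows. The helper names and the per-tensor symmetric quantization are assumptions for illustration, not the record's actual code.

```python
import zlib

CAP_BYTES = 16 * 1024 * 1024  # the 16MB submission cap from the record

def quantize_int8(weights):
    # Per-tensor symmetric quantization (assumed scheme): one float scale,
    # one signed byte per weight, stored as two's-complement byte values.
    scale = (max(abs(w) for w in weights) / 127.0) or 1.0
    codes = bytes(round(w / scale) % 256 for w in weights)
    return codes, scale

def dequantize_int8(codes, scale):
    # Reinterpret each byte as a signed int8, then rescale.
    return [(b - 256 if b >= 128 else b) * scale for b in codes]

def pack_artifact(codes):
    # zlib at maximum level, mirroring the "int8 + zlib" storage scheme;
    # the cap check stands in for the leaderboard's size limit.
    blob = zlib.compress(codes, 9)
    assert len(blob) <= CAP_BYTES, "artifact exceeds the 16MB cap"
    return blob

weights = [0.5, -0.25, 0.0, 0.1]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(zlib.decompress(pack_artifact(codes)), scale)
```

Each restored weight differs from the original by at most one quantization step (the scale), which is the usual trade made to fit a model under a hard byte budget.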