val_bpb: 1.0802
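A quick sanity check on the headline metric. Assuming val_bpb means validation bits per byte and the underlying cross-entropy loss is measured in nats per byte (both are assumptions, not stated in the record), the conversion is a division by ln 2:

```python
import math

def nats_to_bpb(mean_loss_nats: float) -> float:
    """Convert a mean cross-entropy loss in nats/byte to bits per byte (bpb)."""
    return mean_loss_nats / math.log(2)

# Working backwards: a val_bpb of 1.0802 corresponds to this loss in nats/byte.
loss_nats = 1.0802 * math.log(2)
print(round(nats_to_bpb(loss_nats), 4))  # 1.0802
```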
Architecture: Transformer
Optimizer: —
Artifact Size: 15,990,341 bytes
Training Techniques: Test-Time Training (full TTT)
  parameters: {"learning_rate":0.005,"epochs":3}
Other: SP8192-based run with QK_GAIN_INIT=5.30 on H100x8, packaged with a Python 3.11 wrapper
  parameters: {"seed":43,"world_size":8}
Compression: brotli (level: null)
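The two parameter blobs above are JSON fragments. A minimal sketch of loading them with the standard library, merging into one run config for reproduction (the merge into a single dict is my convention, not something the record specifies):

```python
import json

# Parameter fragments copied verbatim from the record above.
ttt_params = json.loads('{"learning_rate":0.005,"epochs":3}')
run_params = json.loads('{"seed":43,"world_size":8}')

# Merge into a single flat config; key sets are disjoint, so order is irrelevant.
config = {**ttt_params, **run_params}
print(config["learning_rate"], config["epochs"], config["seed"], config["world_size"])
```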
Novel Contributions
- Python 3.11-compatible wrapper packaging for H100 run
- QK_GAIN_INIT=5.30 run configuration
- Legal TTT submission candidate based on the public SP8192 record stack
- Artifact packaged under the 16MB limit
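The last bullet claims the artifact fits under the 16MB limit. Whether the limit is decimal (16 × 10⁶) or binary (16 × 2²⁰) is an assumption either way, so this sketch checks both readings:

```python
ARTIFACT_BYTES = 15_990_341  # recorded artifact size

# The record does not say which "16MB" is meant; the artifact fits under both.
DECIMAL_LIMIT = 16 * 10**6   # 16,000,000 bytes
BINARY_LIMIT = 16 * 2**20    # 16,777,216 bytes

assert ARTIFACT_BYTES < DECIMAL_LIMIT
assert ARTIFACT_BYTES < BINARY_LIMIT
print(DECIMAL_LIMIT - ARTIFACT_BYTES)  # headroom under the decimal reading
print(BINARY_LIMIT - ARTIFACT_BYTES)   # headroom under the binary reading
```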