PR #2035

open

Add SP8192 QK-Gain 5.30 Python 3.11 H100 TTT submission

by claramuse
val_bpb: 1.0802
Architecture: Transformer
Optimizer:
Artifact Size: 15,990,341 bytes
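
The PR does not say how val_bpb is computed. As a minimal sketch, assuming it means validation bits per byte (the summed token negative log-likelihood, converted from nats to bits and divided by the number of bytes in the validation text), the conversion could look like this. The function name and arguments are hypothetical and not taken from the submission.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood in nats into bits per byte.

    Assumption: total_nll_nats is summed over all validation tokens and
    total_bytes is the byte length of the same validation text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes rather than tokens keeps the score comparable across tokenizers with different vocabulary sizes.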

Training Techniques

Test-Time Training: full TTT
parameters: {"learning_rate":0.005,"epochs":3}
Other: SP8192-based run with QK_GAIN_INIT=5.30 on H100x8 using a Python 3.11 wrapper
parameters: {"seed":43,"world_size":8}
Compression: brotli
level: null
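
The run settings listed above can be collected in one place. This is a sketch only: it assumes the values are passed as environment variables, which the all-caps QK_GAIN_INIT suggests but the PR does not confirm. `RunConfig`, `load_run_config`, and the `SEED` variable name are hypothetical; `WORLD_SIZE` is the variable that `torchrun` sets for distributed jobs.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Settings reported in the PR; defaults mirror the listed values."""
    qk_gain_init: float = 5.30
    seed: int = 43
    world_size: int = 8  # H100x8


def load_run_config(env=None) -> RunConfig:
    """Read overrides from an environment mapping (hypothetical variable names)."""
    env = os.environ if env is None else env
    return RunConfig(
        qk_gain_init=float(env.get("QK_GAIN_INIT", 5.30)),
        seed=int(env.get("SEED", 43)),
        world_size=int(env.get("WORLD_SIZE", 8)),
    )
```

Passing an explicit mapping, for example `load_run_config({"QK_GAIN_INIT": "5.25"})`, makes the override path easy to test without touching the real environment.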

Novel Contributions

  • Python 3.11-compatible wrapper packaging for H100 run
  • QK_GAIN_INIT=5.30 run configuration
  • Legal TTT submission candidate based on the public SP8192 record stack
  • Artifact packaged under the 16 MB limit (15,990,341 bytes)
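
The size claim can be checked with simple arithmetic. The PR does not say whether "16MB" means 16,000,000 bytes or 16 MiB (16,777,216 bytes), so this sketch computes the headroom under both readings. The constant and function names are illustrative.

```python
ARTIFACT_BYTES = 15_990_341          # artifact size reported in the PR
LIMIT_DECIMAL = 16_000_000           # 16 MB, decimal interpretation (assumption)
LIMIT_BINARY = 16 * 1024 * 1024      # 16 MiB, binary interpretation (assumption)


def headroom(artifact_bytes: int, limit_bytes: int) -> int:
    """Return bytes remaining under the limit; negative means over the limit."""
    return limit_bytes - artifact_bytes


if __name__ == "__main__":
    print("decimal headroom:", headroom(ARTIFACT_BYTES, LIMIT_DECIMAL))
    print("binary headroom:", headroom(ARTIFACT_BYTES, LIMIT_BINARY))
```

Under the stricter decimal reading the artifact fits with 9,659 bytes to spare, so the claim holds under either interpretation.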