PR #2052

open

Non-record: TTT-disabled SP1024 GPT control (1xH100, val_bpb 1.4442)

by tenet-diver
val_bpb: 1.4442
Architecture: Transformer
Optimizer: (not listed)
Artifact Size: 10,200,510 bytes

Training Techniques

Test-Time Training
score-first TTT
parameters: {"enabled":false}
Initialization
OrthoInit
QK gain initialization set to 5.25
Quantization
int8
bits: 8
scope: model artifact
Compression
zlib
level: null
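
The Quantization and Compression entries above can be illustrated with a minimal sketch. It is not the PR's packaging code: it assumes symmetric per-tensor int8 quantization and reads `level: null` as "use zlib's default level". The function names (`quantize_int8`, `pack`, `unpack`) are hypothetical and the weights are a plain list of floats so the example needs only the standard library.

```python
import zlib
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (an assumed scheme).

    Returns the raw int8 payload and the float scale needed to undo it.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q.tobytes(), scale


def dequantize_int8(payload, scale):
    """Map the int8 payload back to approximate float weights."""
    q = array("b")
    q.frombytes(payload)
    return [v * scale for v in q]


def pack(weights, level=-1):
    """Quantize, then zlib-compress. level=-1 is zlib's default level."""
    payload, scale = quantize_int8(weights)
    return zlib.compress(payload, level), scale


def unpack(blob, scale):
    """Inverse of pack: decompress, then dequantize."""
    return dequantize_int8(zlib.decompress(blob), scale)


weights = [0.5, -1.0, 0.25, 0.0]
blob, scale = pack(weights)
restored = unpack(blob, scale)
artifact_bytes = len(blob)  # the size one would compare against an artifact cap
```

The round trip is lossy: each restored weight can differ from the original by up to half a quantization step (`scale / 2`). The compressed length, not the raw int8 length, is the number to check against a size cap.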

Novel Contributions

  • TTT-disabled runtime control for a score-first autoregressive GPT
  • Reproducible non-record comparison point under the artifact cap
  • Int8 + zlib packaging of the final model artifact
  • QK gain initialization set to 5.25
  • Single-H100 packaged run with one seed
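
For readers comparing this control against other runs, the headline metric can be sketched as follows. This assumes `val_bpb` denotes validation bits per byte, computed from the summed token-level negative log-likelihood in nats; the PR summary does not spell out the definition, and the function name `bits_per_byte` is hypothetical.

```python
import math


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood (nats) to bits per byte.

    total_nll_nats: sum of -ln p(token) over the evaluated text
    total_bytes:    number of raw bytes that text occupies
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes.
    return total_nll_nats / (math.log(2) * total_bytes)


# A model that spends exactly one bit per byte on an 8-byte string:
example = bits_per_byte(8 * math.log(2), 8)
```

Normalizing by bytes rather than tokens keeps the number comparable across tokenizers with different vocabulary sizes.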