PR #2052
open
Non-record: TTT-disabled SP1024 GPT control (1xH100, val_bpb 1.4442)
by tenet-diver
val_bpb
1.4442
Architecture
Transformer
Optimizer
—
Artifact Size
10,200,510 bytes
Training Techniques
Test-Time Training
score-first TTT
parameters: {"enabled":false}
Initialization
OrthoInit
QK gain initialization set to 5.25
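The two initialization entries above can be illustrated with a small sketch. It is not taken from the PR: it assumes "OrthoInit" means weight rows that start out orthonormal, and that the "QK gain" is a scalar multiplied into the query after RMS normalization of the query and key, which is how the name is often used in GPT speedrun code. The helpers `orthogonal_rows`, `rms_normalize`, and `qk_logit` are hypothetical names.

```python
import math
import random

QK_GAIN_INIT = 5.25  # value reported in this PR; how it is applied below is an assumption


def orthogonal_rows(n, dim, seed=0):
    """Return n orthonormal rows of length dim (n <= dim) via Gram-Schmidt."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip degenerate draws
            rows.append([a / norm for a in v])
    return rows


def rms_normalize(x):
    """Scale x to unit root-mean-square."""
    rms = math.sqrt(sum(a * a for a in x) / len(x))
    return [a / rms for a in x]


def qk_logit(q, k, gain=QK_GAIN_INIT):
    """Attention logit with the gain applied to the normalized query (assumed placement)."""
    qn = [gain * a for a in rms_normalize(q)]
    kn = rms_normalize(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))


W = orthogonal_rows(4, 8)
base = qk_logit([1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 1.0], gain=1.0)
scaled = qk_logit([1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 1.0])
```

Under these assumptions the gain simply rescales every logit, so a larger initial value gives sharper attention at the start of training.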
Quantization
int8
bits: 8
scope: model artifact
Compression
zlib
level: null
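As a rough illustration of the Quantization and Compression fields above, the following sketch quantizes a flat list of weights to int8 and compresses the result with zlib. It is an assumption-laden toy, not the PR's packaging code: the symmetric per-tensor scale, the header layout, and the names `quantize_int8`, `pack_artifact`, and `unpack_artifact` are all hypothetical, and `level: null` is read here as "use the zlib default level".

```python
import struct
import zlib

HEADER = "<dI"  # hypothetical layout: float64 scale, uint32 element count


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, q


def pack_artifact(weights, level=None):
    """Quantize, serialize, and zlib-compress; level=None uses zlib's default."""
    scale, q = quantize_int8(weights)
    payload = struct.pack(HEADER, scale, len(q)) + struct.pack(f"<{len(q)}b", *q)
    if level is None:
        return zlib.compress(payload)
    return zlib.compress(payload, level)


def unpack_artifact(blob):
    """Invert pack_artifact, returning dequantized floats."""
    payload = zlib.decompress(blob)
    scale, n = struct.unpack_from(HEADER, payload, 0)
    q = struct.unpack_from(f"<{n}b", payload, struct.calcsize(HEADER))
    return [scale * v for v in q]


weights = [0.5, -1.0, 0.25, 0.0, 0.75]
blob = pack_artifact(weights)
restored = unpack_artifact(blob)
artifact_bytes = len(blob)  # the quantity one would compare against a size cap
```

With rounding to the nearest integer, each restored weight differs from the original by at most half the scale.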
Novel Contributions
- TTT-disabled runtime control for a score-first autoregressive GPT
- Reproducible non-record comparison point under the artifact cap
- Int8 + zlib packaging of the final model artifact
- QK gain initialization set to 5.25
- Single-H100 packaged run with one seed
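The first contribution, a TTT-disabled control for a score-first evaluation, can be sketched with a toy model. The idea illustrated is that each chunk is scored with the model as it stands and only afterwards used for adaptation, so turning adaptation off leaves the scoring path unchanged. Everything here is hypothetical: a Laplace-smoothed byte-frequency table stands in for the GPT, and `score_first_bpb` and its parameters are illustrative names, not the PR's code.

```python
import math


def score_first_bpb(data: bytes, chunk_size: int = 64, ttt_enabled: bool = False) -> float:
    """Toy score-first evaluation over a byte stream, returning bits per byte."""
    counts = [1] * 256  # Laplace-smoothed byte counts (the toy "model")
    total = 256
    total_bits = 0.0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # 1) Score the whole chunk before any update touches the model.
        for b in chunk:
            total_bits += -math.log2(counts[b] / total)
        # 2) Adapt on the chunk only when test-time training is enabled.
        if ttt_enabled:
            for b in chunk:
                counts[b] += 1
                total += 1
    return total_bits / len(data)


sample = b"abababababababab" * 32
bpb_control = score_first_bpb(sample, ttt_enabled=False)  # model never changes
bpb_adapted = score_first_bpb(sample, ttt_enabled=True)   # model adapts after scoring
```

In this toy the control run stays at the untrained model's 8 bits per byte, while the adapted run does better on repetitive input. The gap between the two numbers is the kind of comparison a TTT-disabled control makes possible.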