PR #1730

open

Non-record: QK4 Legal TTT Reproduction (1.08449 BPB)

by N10ELabsView on GitHub
val_bpb
1.0845
Architecture
Transformer
Optimizer
Artifact Size
15,985,765 bytes

Training Techniques

Quantization
GPTQ
bits: 4
scope: model weights
Weight Averaging
EMA
parameters: {"decay":0.997}
Evaluation
sliding window eval
parameters: {"enabled":true}
Test-Time Training
score-first TTT
parameters: {"learning_rate":0.005,"epochs":3,"chunk_tokens":32768,"batch_seqs":32,"freeze_blocks":0,"eval_stride":64}
Sequence Length
sequence_length
train_length: 786432
eval_length: 524288
Regularization
weight decay
parameters: null
Other
other
QK gain initialization and matrix clipping used in the SP8192 training stack
parameters: {"qk_gain_init":4,"matrix_clip_sigmas":12.86}
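
The weight-averaging and clipping entries above can be illustrated with a small, standard-library-only sketch. It is not the PR's code: the function names are hypothetical, the clip is assumed to be mean ± k·std per matrix, and the 4-bit step is plain symmetric round-to-nearest rather than GPTQ, which additionally compensates rounding error using second-order statistics. Only the values `decay=0.997`, `bits: 4`, and `matrix_clip_sigmas=12.86` come from the listing.

```python
import statistics

def ema_update(ema, weights, decay=0.997):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]

def sigma_clip(values, n_sigmas=12.86):
    """Clamp values to mean +/- n_sigmas * std (assumed clipping rule)."""
    mean = statistics.fmean(values)
    bound = n_sigmas * statistics.pstdev(values)
    lo, hi = mean - bound, mean + bound
    return [min(hi, max(lo, v)) for v in values]

def quantize_int4(values):
    """Symmetric round-to-nearest 4-bit quantization (not GPTQ itself).

    Returns integer codes in [-8, 7] and the scale needed to dequantize.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]
```

Clipping before quantization narrows the range the 16 integer levels must cover, so the scale is not dictated by a few extreme weights. At 12.86 sigmas the clip only touches very rare outliers.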

Novel Contributions

  • Non-record reproduction of an 8xH100 SP8192 QK4 legal score-first TTT run
  • Legal TTT evaluation adapted from the April 6 QK5 record with score-first ordering
  • End-to-end reproduction of training, GPTQ/SDClip export, sliding-window validation, and legal TTT under the artifact cap
  • Reproducible scripts, metadata, and logs for the SP8192 record-family milestone
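
The score-first ordering named in the bullets above can be sketched as follows. This is a toy illustration under stated assumptions, not the PR's evaluation script: `score_first_ttt`, `score_fn`, and `adapt_fn` are hypothetical names, and the "model" is a single scalar fitted by gradient steps on squared error. The point it shows is that each chunk is scored with the weights as they stood before any update on that chunk, and only then used for adaptation for a fixed number of epochs.

```python
def score_first_ttt(tokens, chunk_tokens, score_fn, adapt_fn, state, epochs=3):
    """Score each chunk with the current state, then adapt on that chunk.

    No token contributes to the reported loss after the model has
    already trained on it.
    """
    total_loss = 0.0
    total_count = 0
    for start in range(0, len(tokens), chunk_tokens):
        chunk = tokens[start:start + chunk_tokens]
        # 1) Score first, using weights that have not seen this chunk.
        total_loss += score_fn(state, chunk)
        total_count += len(chunk)
        # 2) Then adapt on the chunk that was just scored.
        for _ in range(epochs):
            state = adapt_fn(state, chunk)
    return total_loss / total_count, state

# Toy stand-ins (hypothetical): the state is a scalar estimate `mu`.
def squared_error(mu, chunk):
    return sum((x - mu) ** 2 for x in chunk)

def gradient_step(mu, chunk, lr=0.5):
    # Step toward the chunk mean; lr here is illustrative only.
    return mu + lr * sum(x - mu for x in chunk) / len(chunk)

avg_loss, final_mu = score_first_ttt(
    [1.0, 1.0, 1.0, 1.0], chunk_tokens=2,
    score_fn=squared_error, adapt_fn=gradient_step, state=0.0,
)
```

In the toy run the first chunk is scored at the initial `mu = 0.0`, so its loss is not reduced by adaptation. Only the second chunk benefits from updates made on the first.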