PR #1482
open
Record: SP8192 + Pre-Quant TTT (QK 5.25, 8ep, freeze-1) — val_bpb 1.0787 (3-seed mean)
by aamodbhatt
val_bpb
1.0787
Architecture
Transformer
Optimizer
—
Artifact Size
16,000,000 bytes
Training Techniques
Quantization
GPTQ
bits: null
scope: model weights
Architecture
depth recurrence
Uses the SP8192 recurrence pipeline as part of the model stack.
parameters: null
Test-Time Training
full TTT
parameters: {"epochs":8,"learning_rate":0.00045,"freeze_blocks":1}
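The listed TTT parameters can be read as a small configuration. The sketch below is illustrative only: `TTTConfig`, `trainable_block_indices`, and `total_ttt_steps` are hypothetical names, and it assumes that `freeze_blocks` counts blocks from the input side, which this card does not confirm.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TTTConfig:
    # Defaults copied from the parameters listed on this card.
    epochs: int = 8
    learning_rate: float = 0.00045
    freeze_blocks: int = 1


def trainable_block_indices(n_blocks: int, cfg: TTTConfig) -> List[int]:
    """Indices of transformer blocks that would be updated during TTT.

    Assumption: frozen blocks are the first `freeze_blocks` blocks,
    so freeze_blocks=1 freezes block 0 only.
    """
    if not 0 <= cfg.freeze_blocks <= n_blocks:
        raise ValueError("freeze_blocks must be between 0 and n_blocks")
    return list(range(cfg.freeze_blocks, n_blocks))


def total_ttt_steps(n_batches: int, cfg: TTTConfig) -> int:
    """Optimizer steps if each epoch makes one pass over n_batches batches."""
    if n_batches < 0:
        raise ValueError("n_batches must be non-negative")
    return cfg.epochs * n_batches
```

In a real training loop the returned indices would decide which parameters are handed to the optimizer; the block count and batch count are inputs here because the card does not state them.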
Evaluation
sliding window eval
parameters: {"stride":64}
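Sliding window evaluation with a stride scores each token once while giving later tokens a long left context. A minimal sketch of the window bookkeeping follows; `sliding_windows` and `context_len` are hypothetical names, and scoring the whole first window is an assumption about the evaluator, not something this card states.

```python
def sliding_windows(n_tokens, context_len, stride=64):
    """Return (start, end, score_from) triples over positions [0, n_tokens).

    The model would see tokens[start:end]; only positions score_from..end-1
    count toward the loss, so every position is scored exactly once.
    """
    if stride <= 0 or context_len < stride:
        raise ValueError("need 0 < stride <= context_len")
    windows = []
    scored_upto = 0  # first position that has not been scored yet
    while scored_upto < n_tokens:
        if scored_upto == 0:
            # First window: score everything it contains.
            end = min(context_len, n_tokens)
        else:
            # Later windows: advance by `stride` new positions.
            end = min(scored_upto + stride, n_tokens)
        start = max(0, end - context_len)
        windows.append((start, end, scored_upto))
        scored_upto = end
    return windows
```

With this layout every position after the first window is predicted with at least `context_len - stride` tokens of preceding context, at the cost of roughly `context_len / stride` times more forward-pass compute than non-overlapping windows.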
Regularization
weight decay
parameters: null
Novel Contributions
- SP8192 pre-quant TTT lane with tuned QK gain initialization (QK 5.25)
- Test-time training with 8 epochs, learning rate 0.00045, and freezing 1 block
- 3-seed confirmation of improved sliding-window validation bpb (mean val_bpb 1.0787)
- Use of sliding window evaluation with stride 64
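
For reference, bits per byte (bpb) is the summed negative log-likelihood converted to bits and divided by the number of raw text bytes, and the headline figure is a mean over seeds. The sketch below shows that arithmetic under the assumption that the evaluator reports NLL in nats; `bits_per_byte` and `seed_mean` are hypothetical helper names, and no per-seed values from this PR are reproduced here.

```python
import math
from statistics import mean


def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed token-level NLL (in nats) to bits per byte of raw text."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    # Dividing by ln(2) converts nats to bits.
    return total_nll_nats / (math.log(2) * n_bytes)


def seed_mean(per_seed_bpb):
    """Arithmetic mean of bpb over independent seeds (three in this record)."""
    return mean(per_seed_bpb)
```

Normalizing by bytes rather than tokens keeps the metric comparable across tokenizers with different vocabulary sizes.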