PR #1460
Record: SP8192 + TTT + Eval-Time Hash Embedding — val_bpb 1.08269 (3-seed mean)
by resouer
val_bpb
1.0827
Architecture
Transformer
Optimizer
SGD
Artifact Size
~15.99 MB
Training Techniques
Architecture
BigramHash
A zero-initialized embedding table, created at eval time and keyed by a hash of each (previous token, current token) bigram, is added to the token embeddings before RMSNorm; it starts as a no-op and is trained only during TTT.
parameters: {"vocab_size":16384,"embedding_dim":512,"hash_mod":16384,"hash_multiplier":2039}
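A minimal numpy sketch of how such an eval-time table could work. The hash form `(prev * 2039 + tok) % 16384` and the BOS id for the first position are assumptions; the record only lists the multiplier and modulus:

```python
import numpy as np

VOCAB_SIZE = 16384   # from the record's parameters
EMBED_DIM = 512
HASH_MOD = 16384
HASH_MULT = 2039

rng = np.random.default_rng(0)
tok_emb = rng.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype(np.float32)
# Eval-time table: zero-initialized, so it is a no-op until TTT updates it.
bigram_emb = np.zeros((HASH_MOD, EMBED_DIM), dtype=np.float32)

def bigram_hash(prev_tok: int, tok: int) -> int:
    # Hypothetical hash form; only the multiplier and modulus are stated.
    return (prev_tok * HASH_MULT + tok) % HASH_MOD

def embed(tokens):
    """Token embedding plus bigram-hash residual, added before RMSNorm."""
    out = np.empty((len(tokens), EMBED_DIM), dtype=np.float32)
    prev = 0  # hypothetical BOS id for the first position
    for i, t in enumerate(tokens):
        out[i] = tok_emb[t] + bigram_emb[bigram_hash(prev, t)]
        prev = t
    return out

x = embed([1, 2, 3])
```

Because the table starts at zero, `x` equals the plain token embeddings until TTT writes into `bigram_emb`.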
depth recurrence
Layers 4-5 form a weight-shared recurrent block whose loop body is applied twice per forward pass.
parameters: {"layers":[4,5],"repeats":2}
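A toy sketch of the control flow, with scalar "layers" standing in for transformer blocks (the layer functions here are hypothetical; only the layer indices and repeat count come from the record):

```python
from typing import Callable, List

def make_layer(i: int) -> Callable[[float], float]:
    # Stand-in for a transformer block: a simple function of the state.
    return lambda x: x + 0.01 * i

layers: List[Callable[[float], float]] = [make_layer(i) for i in range(12)]
RECUR_LAYERS, REPEATS = [4, 5], 2   # from the record's parameters

def forward(x: float) -> float:
    i = 0
    while i < len(layers):
        if i == RECUR_LAYERS[0]:
            block = [layers[j] for j in RECUR_LAYERS]
            for _ in range(REPEATS):        # weight-tied depth recurrence
                for layer in block:
                    x = layer(x)
            i = RECUR_LAYERS[-1] + 1
        else:
            x = layers[i](x)
            i += 1
    return x
```

With 12 weight sets this yields 14 layer applications per pass, since layers 4 and 5 run twice with the same weights.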
KV head count
Grouped-query attention in the SP8192 stack: 8 query heads share 4 key/value heads, halving the KV cache relative to full multi-head attention.
parameters: {"heads":8,"kv_heads":4}
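A minimal numpy sketch of the head grouping, assuming each KV head serves `heads // kv_heads = 2` consecutive query heads (the head dimension and grouping layout are assumptions):

```python
import numpy as np

HEADS, KV_HEADS, HEAD_DIM, T = 8, 4, 16, 5   # heads/kv_heads from the record
GROUP = HEADS // KV_HEADS                    # 2 query heads per KV head

rng = np.random.default_rng(0)
q = rng.normal(size=(HEADS, T, HEAD_DIM))
k = rng.normal(size=(KV_HEADS, T, HEAD_DIM))
v = rng.normal(size=(KV_HEADS, T, HEAD_DIM))

def gqa(q, k, v):
    """Grouped-query attention: query head h reads KV head h // GROUP."""
    out = np.empty_like(q)
    for h in range(HEADS):
        kv = h // GROUP
        scores = q[h] @ k[kv].T / np.sqrt(HEAD_DIM)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)   # row-wise softmax
        out[h] = w @ v[kv]
    return out

out = gqa(q, k, v)
```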
XSA
Applies XSA across all layers in the SP8192 architecture.
parameters: {"layers":11}
U-Net skip connections
Layers 7-10 receive gated, U-Net-style skip connections from earlier activations, alongside the standard residual path.
parameters: {"layers":[7,8,9,10]}
Test-Time Training
score-first TTT
parameters: {"learning_rate":0.005,"momentum":0.9,"epochs_per_chunk":3,"chunk_size":32000,"freeze":0}
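A sketch of the score-first loop: each chunk is scored with the current weights before the model adapts to it, so no chunk is ever evaluated after being seen. The hyperparameters come from the record; the unigram-over-bytes "model" is a deliberately tiny stand-in for the real network:

```python
import numpy as np

LR, MOMENTUM, EPOCHS, CHUNK = 0.005, 0.9, 3, 32000   # from the record

rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=3 * CHUNK)   # toy byte stream

# Toy model: a single table of per-byte logits stands in for the network.
logits = np.zeros(256)
velocity = np.zeros_like(logits)

def bpb(chunk):
    p = np.exp(logits - logits.max()); p /= p.sum()
    return float(-np.log2(p[chunk]).mean())

total_bits, n = 0.0, 0
for start in range(0, len(data), CHUNK):
    chunk = data[start:start + CHUNK]
    total_bits += bpb(chunk) * len(chunk)   # score FIRST, with current weights
    n += len(chunk)
    for _ in range(EPOCHS):                 # THEN adapt on the scored chunk
        p = np.exp(logits - logits.max()); p /= p.sum()
        # Cross-entropy gradient: model distribution minus empirical counts.
        grad = p - np.bincount(chunk, minlength=256) / len(chunk)
        velocity = MOMENTUM * velocity - LR * grad   # SGD with momentum
        logits = logits + velocity
val_bpb = total_bits / n
```

With `freeze: 0`, all parameters (including the eval-time hash embedding) would be updated in the inner loop.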
Optimizer
SGD
weight_decay: null
momentum: 0.9
other_params: {"lr":0.005}
LR Schedule
cosine decay
parameters: null
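Since the schedule's parameters are unspecified, here is a standard cosine-decay sketch using the TTT peak LR of 0.005; the floor of 0 and absence of warmup are assumptions:

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 0.005,
              min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr to min_lr over total_steps.

    No warmup and min_lr=0 are assumptions; the record lists no parameters.
    """
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

The LR starts at 0.005, passes through 0.0025 at the halfway point, and decays to 0.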
Quantization
GPTQ
bits: 6
scope: all
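GPTQ itself compensates rounding error column by column using second-order (Hessian) statistics; as a simpler illustration of the 6-bit target format only, here is plain per-channel round-to-nearest quantization (a stand-in, not GPTQ):

```python
import numpy as np

BITS = 6                      # from the record
QMAX = 2 ** (BITS - 1) - 1    # symmetric range: codes in [-31, 31]

def quantize_rtn(w: np.ndarray):
    """Per-output-channel symmetric round-to-nearest quantization.

    A simpler stand-in for GPTQ, which additionally corrects the
    rounding error of each column using curvature information.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / QMAX
    scale = np.where(scale == 0, 1.0, scale)   # guard all-zero channels
    q = np.clip(np.round(w / scale), -QMAX - 1, QMAX).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_rtn(w)
w_hat = q * scale   # dequantized weights; error bounded by scale / 2
```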
Compression
lzma
level: null
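The final artifact stage is straightforward with Python's standard `lzma` module. The int8 codes below are hypothetical (real 6-bit codes would be bit-packed first), and the compression level is unstated in the record, so the default is used:

```python
import lzma

import numpy as np

# Hypothetical artifact payload: quantized weight codes as int8.
rng = np.random.default_rng(0)
codes = rng.integers(-32, 32, size=(512, 512), dtype=np.int8)

raw = codes.tobytes()
packed = lzma.compress(raw)          # level unstated in the record; default used
restored = np.frombuffer(lzma.decompress(packed), dtype=np.int8)

assert np.array_equal(restored.reshape(codes.shape), codes)  # lossless round trip
```

LZMA is lossless, so the ~15.99 MB artifact decompresses to the exact quantized weights.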
Novel Contributions
- Eval-time hash embedding trained from zeros during score-first TTT
- Bigram-hash residual memory added before RMSNorm
- Record 3-seed mean val_bpb of 1.08269
- SP8192 stack combining parallel residuals, depth recurrence, skip gates, and compressed artifact packaging