PR #1518
open
Record: Wider Loop + Per-Pass Embeddings + Tap-In V6 + Legal TTT (1.078825 3-seed mean)
by abaybektursun
val_bpb
1.0788
Architecture
Transformer
Optimizer
—
Artifact Size
15,977,457 bytes
Training Techniques
Architecture
depth recurrence
Wider loop recurrence: 3 passes through 3 loop blocks (9 loop-block executions) instead of 4 passes through 2 (8 executions).
parameters: {"LOOP_START":3,"LOOP_END":5,"NUM_LOOPS":2,"passes":3,"loop_blocks":3}
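The block-execution order implied by these parameters can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes LOOP_START and LOOP_END are inclusive block indices, and the total block count (`num_layers=8`) and both function names are hypothetical. How NUM_LOOPS=2 maps onto 3 passes is not stated in the summary, so the sketch takes the pass count directly.

```python
def layer_schedule(num_layers, loop_start, loop_end, passes):
    """Return block indices in execution order for one forward pass.

    Assumption: loop_start and loop_end are inclusive block indices.
    """
    if not 0 <= loop_start <= loop_end < num_layers:
        raise ValueError("loop range must lie inside the block stack")
    prefix = list(range(loop_start))                 # blocks before the loop, run once
    loop = list(range(loop_start, loop_end + 1))     # looped blocks
    suffix = list(range(loop_end + 1, num_layers))   # blocks after the loop, run once
    return prefix + loop * passes + suffix


def loop_executions(loop_start, loop_end, passes):
    """Count how many times a looped block runs in one forward pass."""
    return (loop_end - loop_start + 1) * passes


# num_layers=8 is a made-up stack depth used only for illustration.
schedule = layer_schedule(num_layers=8, loop_start=3, loop_end=5, passes=3)
wide = loop_executions(3, 5, passes=3)    # 3 blocks x 3 passes
narrow = loop_executions(3, 4, passes=4)  # 2 blocks x 4 passes
```

Under these assumptions the wider loop runs one more looped block per forward pass than the narrower 4-by-2 configuration, which matches the "more loop block executions" claim.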
loop embeddings
Per-pass learned loop embeddings, zero-initialized and fired at the start of each pass.
parameters: {"num_embeddings":3,"dimension":512,"init":"zero"}
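One plausible reading of "fired at the start of each pass" is that the embedding for pass i is added to the hidden state before the loop blocks run for that pass. The sketch below assumes exactly that; the additive injection, the function names, and the toy blocks are hypothetical, and plain Python lists stand in for tensors.

```python
NUM_PASSES = 3  # num_embeddings from the summary
DIM = 512       # dimension from the summary


def make_loop_embeddings(num_embeddings=NUM_PASSES, dim=DIM):
    """One vector per pass, zero-initialized as stated in the summary."""
    return [[0.0] * dim for _ in range(num_embeddings)]


def run_loop(hidden, loop_blocks, loop_embeddings):
    """Run one pass per embedding, adding that pass's embedding first.

    Assumption: the embedding is injected by element-wise addition.
    """
    for pass_embedding in loop_embeddings:
        hidden = [h + e for h, e in zip(hidden, pass_embedding)]
        for block in loop_blocks:
            hidden = block(hidden)
    return hidden


def run_loop_plain(hidden, loop_blocks, passes):
    """Same loop without embeddings, for comparison."""
    for _ in range(passes):
        for block in loop_blocks:
            hidden = block(hidden)
    return hidden
```

With zero initialization the embeddings contribute nothing at the start of training, so under this reading the model initially behaves like the plain recurrent loop and only learns to distinguish passes as the embeddings move away from zero.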
Regularization
Hessian clipping
parameters: {"lambda":0}
Evaluation
Tap-In V6 cross-window
parameters: {"bigram_idf_rule":true,"cross_window":true}
Test-Time Training
score-first TTT
parameters: {"learning_rate":0.005,"freeze_blocks":0,"epochs":3,"chunk_tokens":32768}
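The score-first ordering can be illustrated with a toy loop: each chunk is scored with weights that have not yet been trained on it, and only then used for adaptation. This is a sketch under stated assumptions, not the PR's implementation. `ToyModel`, `score_fn`, and `train_fn` are hypothetical stand-ins (a single scalar parameter with squared-error loss); only the defaults for `chunk_tokens`, `epochs`, and `learning_rate` come from the summary.

```python
class ToyModel:
    """Hypothetical one-parameter model: predicts the same value for every token."""

    def __init__(self, mu=0.0):
        self.mu = mu


def score_fn(model, chunk):
    """Summed squared error of the chunk under the current weights."""
    return sum((x - model.mu) ** 2 for x in chunk)


def train_fn(model, chunk, learning_rate):
    """One gradient step on the mean squared error of the chunk."""
    mean = sum(chunk) / len(chunk)
    model.mu -= learning_rate * 2.0 * (model.mu - mean)


def chunked(tokens, chunk_tokens):
    """Yield consecutive chunks of at most chunk_tokens tokens."""
    for start in range(0, len(tokens), chunk_tokens):
        yield tokens[start:start + chunk_tokens]


def score_first_ttt(tokens, model, chunk_tokens=32768, epochs=3,
                    learning_rate=0.005):
    """Return the mean per-token loss, scoring each chunk before training on it."""
    total_loss = 0.0
    total_count = 0
    for chunk in chunked(tokens, chunk_tokens):
        # Score first: the weights have only seen earlier chunks.
        total_loss += score_fn(model, chunk)
        total_count += len(chunk)
        # Then adapt on the chunk that was just scored.
        for _ in range(epochs):
            train_fn(model, chunk, learning_rate)
    return total_loss / total_count
```

Because every token is scored before any update that uses it, the reported loss never benefits from training on the tokens being evaluated, which is presumably what "legal" refers to here. A freeze_blocks value of 0 would mean no blocks are excluded from the update; the toy model has nothing to freeze.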
Quantization
int6
bits: 6
scope: model
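The summary does not say which int6 scheme is used. As a rough illustration of what 6-bit weight quantization involves, the sketch below uses symmetric per-row quantization with codes in [-31, 31]; the scheme, the scale choice, and the function names are all assumptions rather than the PR's method.

```python
QMAX = 31  # symmetric signed range for 6 bits (assumed; -32 is left unused)


def quantize_int6(row):
    """Map a row of floats to integer codes in [-QMAX, QMAX] plus one scale."""
    max_abs = max(abs(w) for w in row)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    codes = [max(-QMAX, min(QMAX, round(w / scale))) for w in row]
    return codes, scale


def dequantize_int6(codes, scale):
    """Reconstruct approximate float weights from codes and scale."""
    return [c * scale for c in codes]


row = [-1.0, 0.0, 0.25, 1.0]
codes, scale = quantize_int6(row)
restored = dequantize_int6(codes, scale)
```

With round-to-nearest, the reconstruction error per weight is at most half a quantization step (`scale / 2`), so rows with a smaller dynamic range are reconstructed more precisely.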
Novel Contributions
- Wider depth recurrence with more loop block executions
- Per-pass learned loop embeddings
- Pinning the Hessian clip lambda to 0 after the default value failed
- Tap-In V6 cross-window evaluation with bigram-IDF matching
- Legal score-first test-time training stacked on Tap-In V6