PR #1779

open

Record: SP8192 + CaseOps + Gated Attention + Quant Gate + Loop4-5 + Phased TTT + Frozen Recurrent Alpha — val_bpb 1.06421

by leon2k2k2k
val_bpb: 1.0642
Architecture: Transformer
Optimizer:
Artifact Size: 15,976,535 B

Training Techniques

Architecture
  • Gated Attention: full-dimension attention gate with quantized passthrough (a hedged sketch follows this list). parameters: null
  • Depth recurrence: looped recurrence across layers 3-5 with two loops (also sketched after this list). parameters: {"layers":[3,4,5],"loops":2}
Quantization
  • int8, bits: 8, scope: attention gate passthrough (a hedged sketch follows)
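
The int8 entry is scoped to the attention gate passthrough but does not say how the quantization is performed. A common choice is symmetric per-tensor quantization: scale to the int8 range, round, and dequantize before use. The helpers below are a sketch under that assumption; whether the submission quantizes the gate's weights (for artifact size) or its activations is not stated here.

```python
# Hypothetical symmetric per-tensor int8 quantize/dequantize helpers for the
# attention-gate passthrough; the exact scheme used by the submission is an
# assumption.
import torch

def quantize_int8(x: torch.Tensor):
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

if __name__ == "__main__":
    gate = torch.sigmoid(torch.randn(4, 16, 256))   # example gate activations
    q, scale = quantize_int8(gate)
    print((gate - dequantize_int8(q, scale)).abs().max())  # small round-trip error
```
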
Other
  • SP8192 CaseOps: reversible case normalization operators (TITLE/ALLCAPS/CAPNEXT/ESC); a hedged sketch follows this list. parameters: null
  • Frozen recurrent alpha/beta: learned cross-layer blend scalars trained to convergence and then serialized as constants (these are the blend scalars used in the depth-recurrence sketch above). parameters: {"recur_alpha_enabled":1,"num_loops":2}
Test-Time Training
  • LoRA TTT (a hedged sketch follows). parameters: {"warm_start_A":true,"alpha":144,"weight_decay":1,"phases":3,"prefix_docs":2000}

Novel Contributions

  • Recurrent alpha/beta cross-layer blend scalars trained to convergence, then frozen and serialized as constants
  • LoRA TTT improvements with warm-start A, alpha=144, and weight decay 1.0
  • Combination of SP8192 CaseOps with gated attention and looped depth recurrence
  • Phased score-first TTT under the competition time and artifact constraints