PR #1779

open

Record: SP8192 + CaseOps + Gated Attention + Quant Gate + Loop4-5 + Phased TTT + Frozen Recurrent Alpha — val_bpb 1.06421

by leon2k2k2k
val_bpb: 1.0642
Architecture: Transformer
Optimizer:
Artifact Size: 15,976,535 B

Training Techniques

Architecture
  • Gated Attention: full-dimension attention gate with quantized passthrough (a hedged sketch follows this list). parameters: null
  • Depth recurrence: looped recurrence across layers 3-5 with two loops (also sketched after this list). parameters: {"layers":[3,4,5],"loops":2}
Quantization
  • int8, bits: 8, scope: attention gate passthrough (a hedged sketch follows)
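
The int8 entry is scoped to the attention gate passthrough but does not say how the quantization is performed. A common choice is symmetric per-tensor quantization: scale to the int8 range, round, and dequantize before use. The helpers below are a sketch under that assumption; whether the submission quantizes the gate's weights (for artifact size) or its activations is not stated here.

```python
# Hypothetical symmetric per-tensor int8 quantize/dequantize helpers for the
# attention-gate passthrough; the exact scheme used by the submission is an
# assumption.
import torch

def quantize_int8(x: torch.Tensor):
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

if __name__ == "__main__":
    gate = torch.sigmoid(torch.randn(4, 16, 256))   # example gate activations
    q, scale = quantize_int8(gate)
    print((gate - dequantize_int8(q, scale)).abs().max())  # small round-trip error
```
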
Other
  • SP8192 CaseOps: reversible case normalization operators (TITLE/ALLCAPS/CAPNEXT/ESC); a hedged sketch follows this list. parameters: null
  • Frozen recurrent alpha/beta: learned cross-layer blend scalars trained to convergence and then serialized as constants (these are the blend scalars used in the depth-recurrence sketch above). parameters: {"recur_alpha_enabled":1,"num_loops":2}
Test-Time Training
  • LoRA TTT (a hedged sketch follows). parameters: {"warm_start_A":true,"alpha":144,"weight_decay":1,"phases":3,"prefix_docs":2000}

Novel Contributions

  • Recurrent alpha/beta cross-layer blend scalars trained to convergence, then frozen and serialized as constants
  • LoRA TTT improvements with warm-start A, alpha=144, and weight decay 1.0
  • Combination of SP8192 CaseOps with gated attention and looped depth recurrence
  • Phased score-first TTT under the competition time and artifact constraints