PR #1736
openRecord: SP8192 + CaseOps + GatedAttn + QuantGate + Loop45 + PhasedTTT — val_bpb 1.06549
by dexhunter
val_bpb: 1.0655
Architecture: Transformer
Optimizer: —
Artifact Size: 15,975,120 bytes
Training Techniques
Architecture
Gated Attention
Adds a learned out-gate on the attention output, combined with quant-gate scaling to offset the overhead from the added control tokens.
parameters: {"init_std":0.005}
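The entry above names only the gate's init scale, so the following is a minimal sketch, not the submission's implementation: a learned sigmoid out-gate on the attention output, with the gate projection initialized at the listed `init_std=0.005` so the gate starts near neutral (sigmoid(0) = 0.5). The class and argument names are hypothetical.

```python
import torch
import torch.nn as nn

class GatedAttentionOutput(nn.Module):
    """Hypothetical sketch of a learned attention out-gate.

    The gate projection's weights are drawn with a small std (0.005, per the
    listed parameters) and there is no bias, so at init the gate outputs
    ~sigmoid(0) = 0.5 everywhere and training moves it from there.
    """

    def __init__(self, d_model: int, init_std: float = 0.005):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model, bias=False)
        nn.init.normal_(self.gate.weight, std=init_std)

    def forward(self, x: torch.Tensor, attn_out: torch.Tensor) -> torch.Tensor:
        # x: residual-stream input to the block; attn_out: raw attention output.
        # The gate is computed from x and multiplicatively scales attn_out.
        return attn_out * torch.sigmoid(self.gate(x))
```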
depth recurrence
Loop4-5 repeats layers 4 and 5 twice.
parameters: {"layers":[4,5],"repeats":2}
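Depth recurrence of this form can be sketched as replaying a contiguous block of layers during the forward pass; here is one minimal, framework-free interpretation of the listed `{"layers":[4,5],"repeats":2}` parameters. The function name and calling convention are hypothetical.

```python
def run_with_depth_recurrence(layers, x, loop_layers=(4, 5), repeats=2):
    """Hypothetical sketch: run `layers` in order, but replay the contiguous
    block layers[loop_layers[0] : loop_layers[-1] + 1] `repeats` times,
    reusing the same weights each pass (depth recurrence)."""
    i = 0
    while i < len(layers):
        if i == loop_layers[0]:
            block = layers[loop_layers[0]: loop_layers[-1] + 1]
            for _ in range(repeats):
                for layer in block:
                    x = layer(x)
            i = loop_layers[-1] + 1  # skip past the looped block
        else:
            x = layers[i](x)
            i += 1
    return x
```

With 6 layers and the block of layers 4 and 5 repeated twice, each forward pass applies 8 layer calls while storing only 6 layers' worth of weights.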
Test-Time Training
score-first TTT
parameters: {"phased":true,"num_phases":3,"prefix_docs":2000,"per_doc_lora_reset":true}
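The listed parameters indicate a score-first ordering with an optional LoRA reset between documents; the skeleton below sketches that control flow only, with the model interactions abstracted into callables (`score`, `adapt`, `reset_lora` are hypothetical names, and the phase logic from `num_phases`/`prefix_docs` is omitted).

```python
def score_first_ttt(score, adapt, reset_lora, docs, per_doc_lora_reset=True):
    """Hypothetical sketch of score-first test-time training.

    For each chunk of each document, the loss is recorded with the *current*
    adapted weights before any gradient step is taken on that chunk, so no
    text is scored after the model has trained on it. Optionally, the LoRA
    adapter is reset at each document boundary so per-document adaptation
    does not leak across documents.
    """
    losses = []
    for doc in docs:
        if per_doc_lora_reset:
            reset_lora()                 # start each doc from the base adapter
        for chunk in doc:
            losses.append(score(chunk))  # score first...
            adapt(chunk)                 # ...then take a TTT update on it
    return losses
```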
Other
other
CaseOps: bijective case preprocessing that normalizes capitalization into operator tokens while preserving invertibility.
parameters: {"name":"lossless_caps_caseops_v1"}
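A bijective case transform of this kind can be illustrated with a simple character-level scheme: lowercase the text and emit an operator token before each originally-uppercase character, so the original casing is exactly recoverable. This is an illustrative sketch, not the submission's `lossless_caps_caseops_v1` scheme, and the `SHIFT` marker is a made-up operator token (a real scheme must also escape occurrences of the operator token in the input to stay bijective).

```python
SHIFT = "\u21e7"  # hypothetical operator token: "next character was uppercase"

def caseops_encode(text: str) -> str:
    """Lowercase the text, prefixing each uppercase char with SHIFT."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append(SHIFT)
            out.append(ch.lower())
        else:
            out.append(ch)
    return "".join(out)

def caseops_decode(text: str) -> str:
    """Invert caseops_encode: consume SHIFT and re-uppercase the next char."""
    out, it = [], iter(text)
    for ch in it:
        if ch == SHIFT:
            out.append(next(it).upper())
        else:
            out.append(ch)
    return "".join(out)
```

The payoff is that the model sees an (almost) case-free byte stream plus explicit capitalization operators, while decoding remains lossless.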
other
Per-token byte sidecar used to score BPB on original pre-transform UTF-8 bytes.
parameters: null
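The sidecar idea is that each model token carries a count of the original pre-transform UTF-8 bytes it covers, so BPB is computed against the original byte stream rather than the transformed one. A minimal sketch of that accounting, with hypothetical names and per-token losses assumed to be in nats:

```python
import math

def bpb_from_sidecar(token_nlls, token_byte_counts):
    """Hypothetical sketch: bits-per-byte against the original byte stream.

    token_nlls        -- per-token negative log-likelihoods, in nats
    token_byte_counts -- sidecar: original pre-transform UTF-8 bytes per token
    """
    total_bits = sum(token_nlls) / math.log(2)  # convert nats to bits
    total_bytes = sum(token_byte_counts)        # original, not transformed, bytes
    return total_bits / total_bytes
```

Because control tokens added by the preprocessing cover zero original bytes, they contribute loss but no byte count, so the transform cannot inflate the denominator.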
other
Quant-gate scaling to offset the artifact-size overhead introduced by the added control tokens and the sidecar path.
parameters: null
Novel Contributions
- CaseOps bijective case preprocessing with operator tokens
- Per-token byte sidecar for scoring BPB on original UTF-8 bytes
- Learned gated attention out-gate with quant-gate scaling
- Loop4-5 depth recurrence
- Phased score-first TTT with per-document LoRA reset
- Record submission achieving 1.06549 val_bpb under size and time budgets