PR #1736
openRecord: SP8192 + CaseOps + GatedAttn + QuantGate + Loop45 + PhasedTTT — val_bpb 1.06549
by dexhunter
val_bpb: 1.0655
Architecture: Transformer
Optimizer: —
Artifact Size: 15,975,120 bytes
Training Techniques
Architecture
Gated Attention
Adds a learned out-gate on the attention output, combined with quant-gate scaling to offset the overhead from the added control tokens.
parameters: {"init_std":0.005}
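The entry above names only the gate's init scale, so the following is a minimal sketch, not the submission's implementation: a learned sigmoid out-gate on the attention output, with the gate projection initialized at the listed `init_std=0.005` so the gate starts near neutral (sigmoid(0) = 0.5). The class and argument names are hypothetical.

```python
import torch
import torch.nn as nn

class GatedAttentionOutput(nn.Module):
    """Hypothetical sketch of a learned attention out-gate.

    The gate projection's weights are drawn with a small std (0.005, per the
    listed parameters) and there is no bias, so at init the gate outputs
    ~sigmoid(0) = 0.5 everywhere and training moves it from there.
    """

    def __init__(self, d_model: int, init_std: float = 0.005):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model, bias=False)
        nn.init.normal_(self.gate.weight, std=init_std)

    def forward(self, x: torch.Tensor, attn_out: torch.Tensor) -> torch.Tensor:
        # x: residual-stream input to the block; attn_out: raw attention output.
        # The gate is computed from x and multiplicatively scales attn_out.
        return attn_out * torch.sigmoid(self.gate(x))
```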
depth recurrence
Loop4-5 repeats layers 4 and 5 twice.
parameters: {"layers":[4,5],"repeats":2}
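Depth recurrence of this form can be sketched as replaying a contiguous block of layers during the forward pass; here is one minimal, framework-free interpretation of the listed `{"layers":[4,5],"repeats":2}` parameters. The function name and calling convention are hypothetical.

```python
def run_with_depth_recurrence(layers, x, loop_layers=(4, 5), repeats=2):
    """Hypothetical sketch: run `layers` in order, but replay the contiguous
    block layers[loop_layers[0] : loop_layers[-1] + 1] `repeats` times,
    reusing the same weights each pass (depth recurrence)."""
    i = 0
    while i < len(layers):
        if i == loop_layers[0]:
            block = layers[loop_layers[0]: loop_layers[-1] + 1]
            for _ in range(repeats):
                for layer in block:
                    x = layer(x)
            i = loop_layers[-1] + 1  # skip past the looped block
        else:
            x = layers[i](x)
            i += 1
    return x
```

With 6 layers and the block of layers 4 and 5 repeated twice, each forward pass applies 8 layer calls while storing only 6 layers' worth of weights.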
Test-Time Training
score-first TTT
parameters: {"phased":true,"num_phases":3,"prefix_docs":2000,"per_doc_lora_reset":true}
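The listed parameters indicate a score-first ordering with an optional LoRA reset between documents; the skeleton below sketches that control flow only, with the model interactions abstracted into callables (`score`, `adapt`, `reset_lora` are hypothetical names, and the phase logic from `num_phases`/`prefix_docs` is omitted).

```python
def score_first_ttt(score, adapt, reset_lora, docs, per_doc_lora_reset=True):
    """Hypothetical sketch of score-first test-time training.

    For each chunk of each document, the loss is recorded with the *current*
    adapted weights before any gradient step is taken on that chunk, so no
    text is scored after the model has trained on it. Optionally, the LoRA
    adapter is reset at each document boundary so per-document adaptation
    does not leak across documents.
    """
    losses = []
    for doc in docs:
        if per_doc_lora_reset:
            reset_lora()                 # start each doc from the base adapter
        for chunk in doc:
            losses.append(score(chunk))  # score first...
            adapt(chunk)                 # ...then take a TTT update on it
    return losses
```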
Other
other
CaseOps: bijective case preprocessing that normalizes capitalization into operator tokens while preserving invertibility.
parameters: {"name":"lossless_caps_caseops_v1"}
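A bijective case transform of this kind can be illustrated with a simple character-level scheme: lowercase the text and emit an operator token before each originally-uppercase character, so the original casing is exactly recoverable. This is an illustrative sketch, not the submission's `lossless_caps_caseops_v1` scheme, and the `SHIFT` marker is a made-up operator token (a real scheme must also escape occurrences of the operator token in the input to stay bijective).

```python
SHIFT = "\u21e7"  # hypothetical operator token: "next character was uppercase"

def caseops_encode(text: str) -> str:
    """Lowercase the text, prefixing each uppercase char with SHIFT."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append(SHIFT)
            out.append(ch.lower())
        else:
            out.append(ch)
    return "".join(out)

def caseops_decode(text: str) -> str:
    """Invert caseops_encode: consume SHIFT and re-uppercase the next char."""
    out, it = [], iter(text)
    for ch in it:
        if ch == SHIFT:
            out.append(next(it).upper())
        else:
            out.append(ch)
    return "".join(out)
```

The payoff is that the model sees an (almost) case-free byte stream plus explicit capitalization operators, while decoding remains lossless.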
other
Per-token byte sidecar used to score BPB on original pre-transform UTF-8 bytes.
parameters: null
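The sidecar idea is that each model token carries a count of the original pre-transform UTF-8 bytes it covers, so BPB is computed against the original byte stream rather than the transformed one. A minimal sketch of that accounting, with hypothetical names and per-token losses assumed to be in nats:

```python
import math

def bpb_from_sidecar(token_nlls, token_byte_counts):
    """Hypothetical sketch: bits-per-byte against the original byte stream.

    token_nlls        -- per-token negative log-likelihoods, in nats
    token_byte_counts -- sidecar: original pre-transform UTF-8 bytes per token
    """
    total_bits = sum(token_nlls) / math.log(2)  # convert nats to bits
    total_bytes = sum(token_byte_counts)        # original, not transformed, bytes
    return total_bits / total_bytes
```

Because control tokens added by the preprocessing cover zero original bytes, they contribute loss but no byte count, so the transform cannot inflate the denominator.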
other
Quant-gate scaling to offset the artifact-size overhead introduced by the added control tokens and the sidecar path.
parameters: null
Novel Contributions
- CaseOps bijective case preprocessing with operator tokens
- Per-token byte sidecar for scoring BPB on original UTF-8 bytes
- Learned gated attention out-gate with quant-gate scaling
- Loop4-5 depth recurrence
- Phased score-first TTT with per-document LoRA reset
- Record submission achieving 1.06549 val_bpb under size and time budgets