PR #1756
open
Record: CaseOps Tokenizer + Recurrence Depth Curriculum + Base Arch Stack — val_bpb 1.06505
by romeerp
val_bpb
1.0651
Architecture
Transformer
Optimizer
—
Artifact Size
~15.98 MB
Training Techniques
Architecture
depth recurrence
Looped recurrent block trained with a deterministic recurrence-depth curriculum and evaluated at fixed depth 4.
parameters: {"train_depths":[1,3,4],"eval_depth":4}
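A minimal sketch of how a deterministic depth curriculum of this shape could be scheduled. The train depths (1, 3, 4) and the eval depth (4) come from the parameters above. Everything else is an assumption made for illustration: the names `depth_for_step` and `apply_looped_block`, the `loop_start` and `stage_len` arguments, the equal-length stages, and the choice to run depth 1 before the loop is activated. The PR summary does not state when the depth changes.

```python
from typing import Callable, Sequence

TRAIN_DEPTHS = (1, 3, 4)  # from the PR parameters
EVAL_DEPTH = 4            # from the PR parameters


def depth_for_step(step: int, loop_start: int, stage_len: int,
                   depths: Sequence[int] = TRAIN_DEPTHS) -> int:
    """Return the recurrence depth for a training step.

    Hypothetical schedule: depth 1 before `loop_start`, then one curriculum
    stage every `stage_len` steps, holding the final depth afterwards.
    """
    if step < loop_start:
        return 1  # assumption: loop not yet active, single pass
    stage = min((step - loop_start) // stage_len, len(depths) - 1)
    return depths[stage]


def apply_looped_block(x: float, block: Callable[[float], float],
                       depth: int) -> float:
    """Apply the same (weight-shared) block `depth` times in sequence."""
    for _ in range(depth):
        x = block(x)
    return x


# Toy usage: a scalar "block" stands in for the shared transformer block.
schedule = [depth_for_step(s, loop_start=100, stage_len=50)
            for s in (0, 100, 150, 200, 10_000)]
eval_out = apply_looped_block(1.0, lambda v: 2.0 * v, EVAL_DEPTH)
```

Because the schedule depends only on the step index, every seed sees the same depth at the same step, which is one reading of "deterministic" here.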
Gated Attention
Uses gated attention / quant-gate components from the inherited stack.
parameters: null
weight tying
Inherited base stack includes tied/shared components from prior work.
parameters: null
Test-Time Training
score-first TTT
parameters: {"phased":true}
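The ordering implied by "score-first" can be sketched as follows: each chunk of evaluation data is scored with the model as it stands, and only afterwards is the model adapted on that chunk, so no token contributes to its own score. This is an illustration under stated assumptions, not the PR's implementation. A toy byte-frequency model with count updates stands in for the transformer and its gradient steps, the names `UnigramModel` and `score_first_ttt` are hypothetical, and the "phased" option is not modelled because the summary does not define it.

```python
import math
from collections import Counter
from typing import Iterable


class UnigramModel:
    """Toy adaptive model: add-one smoothed byte frequencies (stand-in only)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def nll_bits(self, chunk: bytes) -> float:
        # Score the chunk using only statistics gathered before it.
        denom = self.total + 256
        return sum(-math.log2((self.counts[b] + 1) / denom) for b in chunk)

    def adapt(self, chunk: bytes) -> None:
        # Stand-in for a test-time training update on the chunk.
        self.counts.update(chunk)
        self.total += len(chunk)


def score_first_ttt(model: UnigramModel, chunks: Iterable[bytes]) -> float:
    """Return total bits, scoring every chunk before adapting on it."""
    total_bits = 0.0
    for chunk in chunks:
        total_bits += model.nll_bits(chunk)  # 1) score with current state
        model.adapt(chunk)                   # 2) only then adapt
    return total_bits


bits = score_first_ttt(UnigramModel(), [b"aaaa", b"aaaa"])
```

In this toy run the second chunk costs fewer bits than the first because the model has adapted on the first chunk, while the first chunk is still scored by the unadapted model.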
Evaluation
fixed-depth eval
parameters: {"depth":4}
Other
other
CaseOps bijective capitalization transform with original-byte sidecar accounting for BPB on raw UTF-8 bytes.
parameters: null
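To illustrate why the original byte count has to travel alongside the transformed text, here is a small sketch. The actual CaseOps transform is not specified in this summary, so the scheme below is a hypothetical stand-in: an ASCII uppercase letter becomes a marker character followed by its lowercase form, and a literal marker is escaped by doubling. The names `MARK`, `caseops_encode`, `caseops_decode`, and `bits_per_byte` are assumptions. The point shown is only that the transform is invertible and that BPB is normalized by the raw UTF-8 byte count rather than the transformed length.

```python
import math

MARK = "\x01"  # hypothetical case marker, not taken from the PR


def caseops_encode(text: str) -> str:
    """Lowercase ASCII capitals, prefixing each with MARK (escape MARK itself)."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(MARK + ch.lower())
        elif ch == MARK:
            out.append(MARK + MARK)
        else:
            out.append(ch)
    return "".join(out)


def caseops_decode(text: str) -> str:
    """Invert caseops_encode on any string it produced."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == MARK:
            nxt = text[i + 1]
            out.append(MARK if nxt == MARK else nxt.upper())
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def bits_per_byte(total_nll_nats: float, original_num_bytes: int) -> float:
    """Convert summed NLL (nats) to bits per ORIGINAL UTF-8 byte."""
    return total_nll_nats / (math.log(2) * original_num_bytes)


raw = "Hello Wörld"
encoded = caseops_encode(raw)
sidecar_bytes = len(raw.encode("utf-8"))  # stored next to the encoded text
assert caseops_decode(encoded) == raw
```

Here the encoded text is two bytes longer than the raw text, so dividing by the encoded length would understate BPB. Dividing by `sidecar_bytes` keeps the metric tied to the raw bytes.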
Novel Contributions
- Deterministic 1 -> 3 -> 4 recurrence-depth curriculum after loop activation
- Fixed-depth-4 evaluation and phased TTT
- CaseOps tokenizer with original-byte sidecar BPB accounting
- Improved 3-seed mean val_bpb to 1.06505