PR #1925
open
Record candidate: CaseOps + Matrix-LR 0.028 + Phased TTT 3500
by simon-marcus
val_bpb
1.0611
Architecture
Transformer
Optimizer
Muon
Artifact Size
15.90 MB
Training Techniques
Architecture
CaseOps
CaseOps SP8192 tokenizer with reserved tokens for lossless capitalization handling, plus a byte-sidecar path.
parameters: {"vocab_size":8192}
XSA
11-layer 512d XSA stack with U-Net skips, parallel decoder, depth recurrence, SparseAttnGate, BOS-fixed SmearGate, and LeakyReLU(0.5)^2 MLP.
parameters: {"layers":11,"dimensions":512}
U-Net skip connections
Uses U-Net style skip connections in the stack.
parameters: null
depth recurrence
Includes recurrent depth structure in the model.
parameters: null
SmearGate
BOS-fixed SmearGate is used in the attention/stack design.
parameters: null
LeakyReLU
Uses LeakyReLU(0.5)^2 MLP activation.
parameters: {"negative_slope":0.5}
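The LeakyReLU(0.5)^2 activation can be illustrated with a small sketch. It assumes the square is applied elementwise to the LeakyReLU output, which is one plausible reading of the notation; the function names are hypothetical and not taken from the PR.

```python
def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    """LeakyReLU followed by an elementwise square (assumed reading of LeakyReLU(0.5)^2)."""
    # Negative inputs are scaled by negative_slope instead of being zeroed.
    y = x if x >= 0.0 else negative_slope * x
    # Squaring makes the output non-negative on both sides of zero.
    return y * y


def apply_activation(values):
    """Apply the activation to every element of a list of floats."""
    return [leaky_relu_squared(v) for v in values]
```

Under this reading, a negative pre-activation still contributes a positive output, just with a quarter of the magnitude that the same positive input would produce.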
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"backend_steps":5,"variant":"Polar-Express Newton-Schulz"}
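For orientation, here is a minimal pure-Python sketch of the Newton-Schulz orthogonalization step that Muon applies to a gradient matrix, run for 5 steps to match backend_steps. It uses the fixed quintic coefficients from the commonly published Muon reference code, not the Polar-Express variant named above, whose coefficients are not given in this summary. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g with a quintic Newton-Schulz iteration."""
    # Coefficients from the standard Muon reference code (assumption: NOT Polar-Express).
    a, b, c = 3.4445, -4.7750, 2.0315
    tall = len(g) > len(g[0])
    # Work on the wide orientation so the Gram matrix X X^T is the smaller one.
    x = transpose(g) if tall else [row[:] for row in g]
    # Normalize by the Frobenius norm so all singular values are at most 1.
    norm = math.sqrt(sum(v * v for row in x for v in row)) + eps
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        gram = matmul(x, transpose(x))                      # A = X X^T
        gram2 = matmul(gram, gram)                          # A^2
        poly = [[b * p + c * q for p, q in zip(r1, r2)]     # B = b*A + c*A^2
                for r1, r2 in zip(gram, gram2)]
        bx = matmul(poly, x)
        x = [[a * u + v for u, v in zip(r1, r2)]            # X = a*X + B X
             for r1, r2 in zip(x, bx)]
    return transpose(x) if tall else x
```

The iteration pushes singular values toward a band around 1 rather than to exactly 1, which is the usual trade-off of this coefficient choice for speed over precision.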
Quantization
GPTQ
bits: 6
scope: matrices
mixed int7/int8
bits: null
scope: embeddings and row gate
LQER
bits: null
scope: asymmetric rank-4 correction
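To make the 6-bit setting concrete, the sketch below shows plain round-to-nearest symmetric quantization of one weight row. This is a simpler baseline, not GPTQ: GPTQ additionally compensates rounding error using second-order information, and the LQER low-rank correction is not shown. The per-row scaling and function names are assumptions for illustration.

```python
def quantize_row(row, bits=6):
    """Round-to-nearest symmetric quantization of one row (baseline, not GPTQ)."""
    qmax = 2 ** (bits - 1) - 1            # 31 for 6 bits
    max_abs = max(abs(w) for w in row)
    # One scale per row; fall back to 1.0 for an all-zero row.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [q * scale for q in codes]
```

With this scheme each reconstructed weight differs from the original by at most half a quantization step, which is the error a method like GPTQ or a low-rank correction then tries to reduce further.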
Compression
pergroup lrzip + brotli
level: null
Test-Time Training
score-first TTT
parameters: {"phased":true,"prefix_docs":3500,"num_phases":3,"chunk_size":48,"lora_rank":80}
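The score-first ordering can be sketched with a toy model. The adaptive unigram model below stands in for the real network and its LoRA update, and the phase structure (num_phases, prefix_docs) is not modelled because this summary does not describe it. Only chunk_size=48 comes from the parameters above; all class and function names are hypothetical.

```python
import math


class UnigramModel:
    """Toy stand-in for the adapted model: token counts with Laplace smoothing."""

    def __init__(self, vocab_size):
        self.counts = [1] * vocab_size
        self.total = vocab_size

    def nll_bits(self, token):
        """Negative log2-likelihood of one token under the current counts."""
        return -math.log2(self.counts[token] / self.total)

    def adapt(self, tokens):
        """Update the model on tokens that have already been scored."""
        for t in tokens:
            self.counts[t] += 1
            self.total += 1


def score_first_ttt(model, tokens, chunk_size=48):
    """Return mean bits per token, scoring each chunk before adapting on it."""
    total_bits = 0.0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        total_bits += sum(model.nll_bits(t) for t in chunk)  # 1) score first
        model.adapt(chunk)                                    # 2) then adapt
    return total_bits / len(tokens)
```

Because every chunk is scored with parameters that have only seen earlier chunks, the reported loss never benefits from training on the tokens being evaluated.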
LR Schedule
warmdown
parameters: {"warmdown_frac":0.85,"warmup_steps":20}
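One way to read these schedule parameters is sketched below. It assumes a linear warmup over the first 20 steps, a constant plateau, and a linear decay to zero over the final 85% of training; the exact shape used in the PR is not stated here. The peak of 0.028 is the Matrix-LR from the title, while `total_steps` and the function name are hypothetical.

```python
def lr_at(step, total_steps, peak_lr=0.028, warmup_steps=20, warmdown_frac=0.85):
    """Learning rate at a given step (assumed linear warmup and linear warmdown)."""
    # The warmdown covers the last warmdown_frac of training.
    warmdown_start = total_steps - round(total_steps * warmdown_frac)
    if step < warmup_steps:
        # Ramp up so that the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmdown_start:
        return peak_lr
    # Decay linearly from peak_lr at warmdown_start to 0 at total_steps.
    return peak_lr * (total_steps - step) / (total_steps - warmdown_start)
```

For a hypothetical 1000-step run this holds the peak only from step 19 to step 150 and spends the remaining 850 steps decaying.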
Regularization
weight decay
parameters: {"value":0.5}
Novel Contributions
- Final push on the CaseOps/LQER/SmearGate stack while keeping the #1855 architecture intact.
- Raised MATRIX_LR from 0.026 to 0.028.
- Increased PHASED_TTT_PREFIX_DOCS to 3500 to use more of the eval budget.
- Score-first phased TTT on the post-quant model, evaluating each chunk before adaptation.
- Validated a 3-seed record-candidate run with mean val_bpb 1.06109 under the 16 MB artifact limit.