PR #1729

open

Record: CaseOps Tokenizer + Tapered WD - val_bpb 1.0678 (3-seed mean)

by romeerpView on GitHub
val_bpb
1.0678
Architecture
Transformer
Optimizer
Muon
Artifact Size
~15.94 MB

Training Techniques

Other
other
Lossless case-operations tokenizer / dataset export that factorizes text into lowercase lexical tokens plus capitalization side-channel controls, while preserving exact byte-level recoverability.
parameters: {"transform":"lossless_caps_caseops_v1","controls":["TITLE","ALLCAPS","CAPNEXT","ESC"]}
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"late_taper":true,"wd_taper_start_frac":0.7,"wd_taper_final_mult":0.5}
Regularization
weight decay
parameters: {"tapered":true,"start_frac":0.7,"final_mult":0.5}
Test-Time Training
score-first TTT
parameters: {"phases":3}

Novel Contributions

  • Lossless CaseOps tokenizer / dataset export hosted publicly on Hugging Face
  • Validation byte-sidecar accounting to preserve exact raw-byte BPB scoring
  • Mild late Muon weight-decay taper
  • Improved pretrained, quantized, and post-TTT BPB while staying under the artifact cap