PR #1729

RECORDopen

Record: CaseOps Tokenizer + Tapered WD - val_bpb 1.0678 (3-seed mean)

by romeerpView on GitHub

val_bpb

1.0678

Architecture

Transformer

Optimizer

Muon

Artifact Size

~15.94 MB

Training Techniques

Other

other

Lossless case-operations tokenizer / dataset export that factorizes text into lowercase lexical tokens plus capitalization side-channel controls, while preserving exact byte-level recoverability.

parameters: {"transform":"lossless_caps_caseops_v1","controls":["TITLE","ALLCAPS","CAPNEXT","ESC"]}

Optimizer

Muon

weight_decay: null

momentum: null

other_params: {"late_taper":true,"wd_taper_start_frac":0.7,"wd_taper_final_mult":0.5}

Regularization

weight decay

parameters: {"tapered":true,"start_frac":0.7,"final_mult":0.5}

Test-Time Training

score-first TTT

parameters: {"phases":3}

Novel Contributions

Lossless CaseOps tokenizer / dataset export hosted publicly on Hugging Face
Validation byte-sidecar accounting to preserve exact raw-byte BPB scoring
Mild late Muon weight-decay taper
Improved pretrained, quantized, and post-TTT BPB while staying under the artifact cap