val_bpb
1.0483
Architecture
Transformer
Optimizer
—
Artifact Size
15,972,854 bytes
Training Techniques
Architecture
SmearGate
Widened the sparse attention gate reader from the baseline gate window to 32 in the #2018 stack.
parameters: {"gate_window":32,"smear_gate_window":12}
BigramHash
Added a tiny causal BigramHash input feature branch with small-tensor routing.
parameters: {"vocab_size":512,"dimensions":4,"bits":6}
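A minimal sketch of what a causal hashed-bigram input feature of this size could look like, assuming `vocab_size` is the number of hash buckets and `dimensions` is the embedding width. The hash function, the padding id for the first position, and the helper names are hypothetical and are not taken from the submission. The card does not say what `bits` means, so the sketch does not model it.

```python
import numpy as np

NUM_BUCKETS = 512  # "vocab_size" on the card, read here as the bucket count (assumption)
EMBED_DIM = 4      # "dimensions" on the card


def bigram_bucket_ids(tokens: np.ndarray, num_buckets: int = NUM_BUCKETS) -> np.ndarray:
    """Map each position t of a non-empty 1-D array to a bucket for (tokens[t-1], tokens[t])."""
    tokens = tokens.astype(np.int64)
    prev = np.empty_like(tokens)
    prev[0] = 0             # assumed padding id: the first position has no predecessor
    prev[1:] = tokens[:-1]  # causal: only the previous and current token are used
    # Hypothetical hash: multiply by an arbitrary odd constant, add, reduce modulo buckets.
    return (prev * 1000003 + tokens) % num_buckets


def bigram_features(tokens: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Look up one small embedding row per position from the bucket table."""
    return table[bigram_bucket_ids(tokens, table.shape[0])]


rng = np.random.default_rng(0)
table = rng.normal(scale=0.02, size=(NUM_BUCKETS, EMBED_DIM))  # stand-in for learned weights
features = bigram_features(np.array([5, 17, 17, 900, 3]), table)
print(features.shape)
```

The bucket at position t depends only on tokens t-1 and t, so changing a later token leaves earlier features unchanged.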
Path-A-v3
Applied a small Path-A-v3-style routing variant alongside the BigramHash branch.
parameters: {"small":true}
Other
other
Q-aware, token-only n-gram tilt path used in the evaluation/training pipeline.
parameters: {"token_only":true,"q_aware":true}
Test-Time Training
score-first TTT
parameters: {"phased":true}
Sequence Length
sequence_length
train_length: null
eval_length: 2048
Evaluation
sliding window eval
parameters: {"stride":12}
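A rough sketch of how sliding-window evaluation with these settings is commonly scheduled, assuming each window holds at most `eval_length` tokens, advances by `stride`, and scores only the tokens not covered by an earlier window. The function name and the choice to score the whole first window are assumptions, not the submission's code.

```python
from typing import Iterator, Tuple


def sliding_windows(
    num_tokens: int, eval_length: int = 2048, stride: int = 12
) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, end, score_from) triples.

    The model reads tokens[start:end] as context, and only the positions
    in tokens[score_from:end] contribute to the reported loss.
    """
    if num_tokens <= 0:
        return
    end = min(eval_length, num_tokens)
    yield 0, end, 0  # assumed: the first window scores every position it contains
    while end < num_tokens:
        new_end = min(end + stride, num_tokens)
        start = max(0, new_end - eval_length)  # keep at most eval_length tokens of context
        yield start, new_end, end              # score only the newly added tokens
        end = new_end


for window in sliding_windows(num_tokens=2072):
    print(window)
```

Under this schedule every token is scored exactly once, and a smaller stride gives each scored token more left context at the cost of more forward passes.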
Compression
brotli/lrzip
level: null
Novel Contributions
- Documented a failed transfer of Gate32 to the #2018 frontier stack.
- Documented a failed transfer of a tiny BigramHash branch to the #2018 stack.
- Validated that the q-aware token-only n-gram patch was not the main cause of the regression.
- Reported a corrected no-go result for Memento/copy memory.
- Recorded a promising but unfinished CrossWS tokenizer direction.
- Established a stop rule: halt a run before quantization and TTT if its pre-quantization BPB is not competitive.
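
The stop rule in the last item could be expressed as a simple gate in a run script. This is a sketch only: the function name, the `margin` parameter, and the numbers in the usage lines are hypothetical, since the card does not say what counts as "competitive".

```python
def should_continue(pre_quant_bpb: float, reference_bpb: float, margin: float = 0.0) -> bool:
    """Return True if the run is worth quantizing and evaluating with TTT.

    pre_quant_bpb: validation BPB measured before quantization.
    reference_bpb: BPB of the stack being compared against (hypothetical input).
    margin: how far above the reference a run may be and still proceed (assumption).
    """
    return pre_quant_bpb <= reference_bpb + margin


# Illustrative values only; they are not results from the submission.
print(should_continue(pre_quant_bpb=1.05, reference_bpb=1.06))
print(should_continue(pre_quant_bpb=1.07, reference_bpb=1.06))
```

Checking before the expensive stages keeps a clearly regressed run from consuming quantization and TTT compute.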