val_bpb: 1.0745
Architecture: —
Optimizer: —
Artifact Size: <15.5 MB
Training Techniques
Quantization: GPTQ (bits: null; scope: model weights)
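GPTQ itself reorders columns and compensates quantization error using second-order (Hessian) statistics from calibration data, which is beyond a short sketch. The stand-in below shows only the symmetric per-row round-to-nearest step that GPTQ refines; the 4-bit width is a hypothetical placeholder, since the record leaves `bits: null`.

```python
# NOT full GPTQ: a round-to-nearest placeholder for the quantization step.
# The bit width is assumed (the record lists bits: null).

def quantize_rows(weights, bits=4):
    """Symmetric round-to-nearest weight quantization, one scale per row."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for row in weights:
        scale = max(abs(w) for w in row) / qmax or 1.0  # avoid zero scale
        q = [round(w / scale) for w in row]             # integer codes
        out.append([v * scale for v in q])              # dequantized weights
    return out
```

GPTQ replaces the independent `round(w / scale)` above with a column-by-column update that folds each column's rounding error back into the not-yet-quantized columns.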
Weight Averaging: EMA (parameters: null)
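The record lists no EMA parameters, so a minimal sketch with a hypothetical decay value:

```python
# EMA weight averaging sketch. The decay value is assumed
# (the record lists parameters: null).

def ema_update(ema_weights, model_weights, decay=0.999):
    """Blend current model weights into the running EMA copy in place."""
    for name, w in model_weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * w
    return ema_weights

# Usage: keep a shadow copy, update it after each optimizer step,
# and evaluate/export the EMA copy instead of the raw weights.
ema = {"w": 0.0}
for step_weight in [1.0, 1.0, 1.0]:
    ema_update(ema, {"w": step_weight}, decay=0.5)
```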
Test-Time Training: TTT (parameters: {"learning_rate": 0.0001, "chunk_tokens": 131072, "use_mixer": true})
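The listed parameters suggest a chunked score-then-adapt loop over the evaluation stream. A minimal sketch; `score_fn` and `update_fn` are hypothetical hooks standing in for the model's forward pass and gradient step:

```python
# TTT evaluation sketch using the listed parameters: each 131072-token chunk
# is scored first, then used for a gradient step at lr=1e-4, so no token is
# ever scored by a model that has already trained on it.

CHUNK_TOKENS = 131072
LEARNING_RATE = 1e-4

def ttt_evaluate(tokens, score_fn, update_fn):
    """Score the stream chunk by chunk, adapting the model after each chunk."""
    total_loss, total_tokens = 0.0, 0
    for start in range(0, len(tokens), CHUNK_TOKENS):
        chunk = tokens[start:start + CHUNK_TOKENS]
        total_loss += score_fn(chunk)          # evaluate on unseen tokens
        update_fn(chunk, lr=LEARNING_RATE)     # then adapt on them
        total_tokens += len(chunk)
    return total_loss / max(total_tokens, 1)   # mean loss per token
```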
Other: 5-expert logistic context mixer using the Hedge algorithm to blend neural, unigram, bigram, trigram, and entropy experts in log-probability space during TTT evaluation (parameters: {"experts": ["neural", "unigram", "bigram", "trigram", "entropy"], "online_update": "log_w -= eta * loss"})
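A sketch of such a Hedge mixer: the expert names and the update rule `log_w -= eta * loss` come from the record, while the value of `eta` and the use of each expert's own log loss as its Hedge loss are assumptions.

```python
import math

# Hedge-based context mixer sketch: blend per-expert log-probabilities of the
# observed next token, then reweight experts by their individual log losses.
# eta and the per-expert loss definition are assumed.

EXPERTS = ["neural", "unigram", "bigram", "trigram", "entropy"]

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

class HedgeMixer:
    def __init__(self, eta=0.1):
        self.eta = eta                           # Hedge step size (assumed)
        self.log_w = {e: 0.0 for e in EXPERTS}   # uniform prior over experts

    def mix(self, expert_logps):
        """Mixture log-prob: log sum_e softmax(log_w)_e * p_e, in log space."""
        joint = [self.log_w[e] + expert_logps[e] for e in EXPERTS]
        return logsumexp(joint) - logsumexp(list(self.log_w.values()))

    def update(self, expert_logps):
        """Online Hedge update: penalize each expert by its own log loss."""
        for e in EXPERTS:
            self.log_w[e] -= self.eta * (-expert_logps[e])  # loss = -log p
```

Keeping weights in log space and normalizing inside `mix` avoids underflow when one expert dominates for long stretches.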
Other: incremental n-gram table construction from already-scored tokens only (parameters: {"ngram_order": [1, 2, 3], "trigram_buckets": 65536})
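A sketch of the incremental table: counts are updated only after a token has been scored, and trigram contexts are hashed into the listed 65536 buckets. The hashing scheme and add-one smoothing in the unigram estimate are assumptions.

```python
import math

# Incremental n-gram statistics built strictly from already-scored tokens.
# Trigram contexts are hashed into 65536 buckets per the record; the hash
# function and smoothing are assumed.

TRIGRAM_BUCKETS = 65536

class NGramTable:
    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.uni = {}        # token -> count
        self.bi = {}         # (prev, token) -> count
        self.tri = {}        # (context bucket, token) -> count
        self.tri_ctx = {}    # context bucket -> count
        self.total = 0

    def _bucket(self, a, b):
        return hash((a, b)) % TRIGRAM_BUCKETS

    def logp_unigram(self, tok):
        """Add-one-smoothed unigram log-probability (smoothing assumed)."""
        c = self.uni.get(tok, 0)
        return math.log((c + 1) / (self.total + self.vocab_size))

    def observe(self, history, tok):
        """Update counts AFTER tok has been scored, never before."""
        self.uni[tok] = self.uni.get(tok, 0) + 1
        self.total += 1
        if len(history) >= 1:
            key = (history[-1], tok)
            self.bi[key] = self.bi.get(key, 0) + 1
        if len(history) >= 2:
            b = self._bucket(history[-2], history[-1])
            self.tri[(b, tok)] = self.tri.get((b, tok), 0) + 1
            self.tri_ctx[b] = self.tri_ctx.get(b, 0) + 1
```

Because `observe` runs only after scoring, the n-gram experts never see a token before it is evaluated, keeping the evaluation causal.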
Novel Contributions
- 5-expert Hedge-based logistic context mixer
- Online blending of neural and n-gram experts in log-probability space during TTT evaluation
- Incremental n-gram statistics built only from already-scored tokens
- GPTQ calibration performed within the training budget
- Record validation score reported as the mean over three seeds