PR #776
Record Submission: 0.9258 BPB — Kitchen Sink (7-gram + XSA6 + BigramHash4K + Cosine TTT)
by agalimova
- val_bpb: 0.9258
- Architecture: Transformer
- Optimizer: —
- Artifact Size: under 16 MB
Training Techniques
Architecture
XSA
Extended XSA context window / last-N setting used in the model.
parameters: {"XSA_LAST_N":6}
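The card does not define XSA itself. Assuming (hypothetically) that `XSA_LAST_N=6` restricts each position to attend only over the last 6 positions, the setting could be sketched as a windowed causal mask:

```python
XSA_LAST_N = 6  # as in the submission's parameters

def last_n_mask(seq_len: int, last_n: int = XSA_LAST_N):
    # Hypothetical reading: position i may attend only to the last
    # `last_n` positions up to and including itself (causal, windowed).
    # The actual XSA mechanism is not documented in this submission.
    return [[0 <= i - j < last_n for j in range(seq_len)]
            for i in range(seq_len)]
```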
BigramHash
Bigram hash vocabulary enlarged for the n-gram/bigram component.
parameters: {"BIGRAM_VOCAB_SIZE":4096}
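A minimal sketch of a hashed-bigram table at the stated size; the hash function and count bookkeeping here are assumptions, since the card only states the vocabulary size:

```python
BIGRAM_VOCAB_SIZE = 4096  # as in the submission's parameters

def bigram_bucket(prev_tok: int, tok: int) -> int:
    # The actual hash is not specified; a simple multiplicative hash
    # stands in here.
    return (prev_tok * 31337 + tok) % BIGRAM_VOCAB_SIZE

def count_hashed_bigrams(tokens):
    # Accumulate bigram counts in a fixed 4096-bucket table, accepting
    # hash collisions in exchange for a bounded artifact size.
    counts = [0] * BIGRAM_VOCAB_SIZE
    for prev, cur in zip(tokens, tokens[1:]):
        counts[bigram_bucket(prev, cur)] += 1
    return counts
```

Enlarging the table reduces collisions at the cost of artifact size, which matters under the 16 MB limit.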
N-gram cache
Increased n-gram order for the cache-based language modeling component.
parameters: {"NGRAM_ORDER":7}
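A cache-based 7-gram component can be sketched as follows; how its probabilities are mixed with the transformer's logits is not specified in this card, so only the cache itself is shown:

```python
from collections import defaultdict

NGRAM_ORDER = 7  # as in the submission's parameters

def build_ngram_cache(tokens):
    # Map each 6-token context (order - 1) seen so far to a histogram of
    # the tokens that followed it; the cache adapts to the text being
    # evaluated as it is consumed.
    ctx_len = NGRAM_ORDER - 1
    cache = defaultdict(lambda: defaultdict(int))
    for i in range(ctx_len, len(tokens)):
        cache[tuple(tokens[i - ctx_len:i])][tokens[i]] += 1
    return cache

def cache_prob(cache, ctx, tok):
    # Relative frequency of `tok` after the 6-token context `ctx`.
    nexts = cache.get(tuple(ctx))
    if not nexts:
        return 0.0
    return nexts[tok] / sum(nexts.values())
```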
Test-Time Training
Cosine TTT
parameters: {"epochs":20}
Evaluation
stride-based eval
parameters: {"stride":64}
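Stride-based evaluation with stride 64 can be sketched as a sliding window in which each token is scored exactly once; the context length below is an assumed parameter, not stated in the card:

```python
EVAL_STRIDE = 64  # as in the submission's parameters

def stride_spans(n_tokens, context_len, stride=EVAL_STRIDE):
    # Sliding-window spans for bits-per-byte evaluation: the window
    # advances by `stride`, and each window is scored only on the tokens
    # not already scored, so every token is scored exactly once while
    # keeping up to `context_len` tokens of left context.
    spans, scored_to = [], 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + context_len, n_tokens)
        spans.append((begin, end, end - scored_to))  # (start, end, n_scored)
        scored_to = end
        if end == n_tokens:
            break
    return spans
```

A smaller stride gives each scored token more left context at the cost of more forward passes, which trades accuracy against the evaluation time budget.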
Quantization
int6
bits: 6
scope: all
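A sketch of signed 6-bit quantization; the card only states bits=6 with scope "all", so the symmetric per-tensor granularity and rounding below are assumptions:

```python
INT6_MIN, INT6_MAX = -32, 31  # signed 6-bit range

def quantize_int6(values):
    # Symmetric quantization: one scale per tensor, values rounded to
    # the nearest representable 6-bit integer and clamped to range.
    scale = max(abs(v) for v in values) / INT6_MAX or 1.0
    q = [max(INT6_MIN, min(INT6_MAX, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]
```

At 6 bits per weight the artifact shrinks by roughly 5x versus float32, which helps fit under the 16 MB limit.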
Other
Hyperparameters found via autoresearch-multi combinatorial search with interaction detection.
parameters: {"modes":["EXPLORE","EXPLOIT","COMBINE","NARROW"]}
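The interaction-detection step can be sketched as flagging superadditive technique pairs; the names and score bookkeeping below are illustrative, not the actual autoresearch-multi interface:

```python
def superadditive_pairs(baseline_bpb, solo_bpb, pair_bpb):
    # Flag technique pairs whose combined improvement exceeds the sum of
    # their individual improvements. val_bpb is the score, so lower is
    # better and "gain" is the drop from baseline.
    flagged = []
    for (a, b), bpb in pair_bpb.items():
        gain_a = baseline_bpb - solo_bpb[a]
        gain_b = baseline_bpb - solo_bpb[b]
        if (baseline_bpb - bpb) > gain_a + gain_b:
            flagged.append((a, b))
    return flagged
```

Pairs flagged this way are candidates for the COMBINE mode, while NARROW would then tighten the search around their joint settings.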
Novel Contributions
- Record submission achieving 0.9258 val_bpb
- Kitchen sink combination of 7-gram cache, XSA6, BigramHash4K, and Cosine TTT
- Hyperparameter improvements discovered via autoresearch-multi combinatorial search
- Superadditive combination of techniques: the combined val_bpb gain exceeds the sum of the individual gains
- Evaluation completed within the 10-minute budget