PR #776

Status: open

Record Submission: 0.9258 BPB — Kitchen Sink (7-gram + XSA6 + BigramHash4K + Cosine TTT)

by agalimova
val_bpb: 0.9258
Architecture: Transformer
Optimizer: (not specified)
Artifact Size: under 16 MB
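
For context, val_bpb is validation-set bits per byte: the model's total negative log-likelihood, measured in bits, divided by the number of raw bytes scored. Using the standard definition:

$$\mathrm{bpb} \;=\; \frac{1}{N_{\text{bytes}}} \sum_{t=1}^{T} -\log_2 p\!\left(x_t \mid x_{<t}\right)$$

where the sum runs over the $T$ scored tokens and $N_{\text{bytes}}$ is the byte length of the validation text.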

Training Techniques

Architecture

XSA: extended XSA context window / last-N setting used in the model.
parameters: {"XSA_LAST_N": 6}
BigramHash: bigram hash vocabulary enlarged for the n-gram/bigram component.
parameters: {"BIGRAM_VOCAB_SIZE": 4096}
N-gram cache: increased n-gram order for the cache-based language-modeling component.
parameters: {"NGRAM_ORDER": 7}
Test-Time Training

Cosine TTT
parameters: {"epochs": 20}
Evaluation

Stride-based eval
parameters: {"stride": 64}
Quantization

int6 (bits: 6, scope: all)
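
Scope "all" suggests every weight tensor is quantized to 6 bits, which is how the artifact stays under 16 MB. A sketch of symmetric round-to-nearest quantization (per-tensor scaling is an assumption; the submission may use per-channel scales or a different scheme):

```python
import torch

def quantize_int6(w: torch.Tensor, bits: int = 6):
    """Symmetric round-to-nearest quantization: map w to integers in
    [-32, 31] (6 bits) with a single per-tensor scale. The int6 codes
    are stored in an int8 container; `scale` dequantizes them."""
    qmax = 2 ** (bits - 1) - 1                     # 31 for 6 bits
    scale = w.abs().max().clamp(min=1e-12) / qmax  # avoid div-by-zero
    q = torch.round(w / scale).clamp(-qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale
```
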
Other

Hyperparameter search using autoresearch-multi combinatorial search with interaction detection.
parameters: {"modes": ["EXPLORE", "EXPLOIT", "COMBINE", "NARROW"]}

Novel Contributions

  • Record submission achieving 0.9258 val_bpb
  • Kitchen sink combination of 7-gram cache, XSA6, BigramHash4K, and Cosine TTT
  • Hyperparameter improvements discovered via autoresearch-multi combinatorial search
  • Superadditive combination of techniques: the combined bpb improvement exceeds the sum of the techniques' individual improvements
  • Evaluation completed within the 10-minute budget