PR #1145
Record: 1.1109 BPB Loader FullGPTQ XSA11 + online ngram augment
by AnirudhRahul
val_bpb
1.1109
Architecture
Transformer
Optimizer
—
Artifact Size
15,953,221 bytes
Training Techniques
Quantization
GPTQ
bits: null
scope: base model
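The record lists GPTQ weight quantization on the base model (bit width unspecified). As background, a minimal round-to-nearest (RTN) sketch of the quantize/dequantize round trip is shown below; GPTQ improves on RTN by choosing roundings with a second-order, Hessian-aware objective, which this sketch does not attempt. All names and the 4-bit default are illustrative, not from the PR:

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4):
    """Per-channel round-to-nearest weight quantization.

    GPTQ refines this baseline with a Hessian-aware rounding
    objective; RTN is shown only to illustrate the round trip.
    """
    qmax = 2 ** (bits - 1) - 1
    # One scale per output channel (row), sized to the channel's max weight.
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).standard_normal((4, 8)).astype(np.float32)
q, s = quantize_rtn(w)
err = np.abs(dequantize(q, s) - w).max()
print(err)  # bounded by half a quantization step per channel
```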
Architecture
XSA
Uses the Loader_FullGPTQ_XSA11 BigramHash2816 architecture from PR #1060 as the base model.
parameters: {"last_n":11,"bigram_vocab_size":2816,"bigram_dim":112}
BigramHash
Bigram-hash architecture used as the base for the submission.
parameters: {"last_n":11,"bigram_vocab_size":2816,"bigram_dim":112}
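A minimal sketch of how a bigram-hash feature table with these parameters (last_n=11, bigram_vocab_size=2816, bigram_dim=112) might work: each of the last 11 token bigrams is hashed into a 2816-entry embedding table of width 112. The hash function and all names here are illustrative assumptions, not taken from the PR:

```python
import numpy as np

LAST_N = 11          # last_n: number of trailing bigrams considered
BIGRAM_VOCAB = 2816  # bigram_vocab_size: hashed table size
BIGRAM_DIM = 112     # bigram_dim: embedding width per slot

rng = np.random.default_rng(0)
# Hypothetical hashed-bigram embedding table (randomly initialized here).
table = rng.standard_normal((BIGRAM_VOCAB, BIGRAM_DIM)).astype(np.float32)

def bigram_hash(a: int, b: int) -> int:
    # Illustrative mixing hash; the PR's actual hash is not specified here.
    return (a * 1000003 + b) % BIGRAM_VOCAB

def bigram_features(tokens: list[int]) -> np.ndarray:
    """Embed the last LAST_N bigrams of a token sequence.

    Positions with no bigram available are zero-padded, so the
    output shape is always (LAST_N, BIGRAM_DIM).
    """
    out = np.zeros((LAST_N, BIGRAM_DIM), dtype=np.float32)
    pairs = list(zip(tokens[:-1], tokens[1:]))[-LAST_N:]
    for i, (a, b) in enumerate(pairs):
        out[i] = table[bigram_hash(a, b)]
    return out

feats = bigram_features([5, 17, 42, 7, 99])
print(feats.shape)  # (11, 112)
```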
LR Schedule
warmdown
parameters: {"warmdown_steps":4000}
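A hedged sketch of a warmdown schedule with warmdown_steps=4000: constant learning rate until the final 4000 steps, then linear decay to zero. The constant-then-linear shape and the base LR are assumptions; the record only states the warmdown step count:

```python
WARMDOWN_STEPS = 4000  # warmdown_steps from the record

def lr_at(step: int, total_steps: int, base_lr: float = 3e-4) -> float:
    """Constant LR, then linear warmdown to 0 over the last WARMDOWN_STEPS."""
    warmdown_start = total_steps - WARMDOWN_STEPS
    if step < warmdown_start:
        return base_lr
    # Fraction of the warmdown window remaining, clamped at 0.
    frac = (total_steps - step) / WARMDOWN_STEPS
    return base_lr * max(frac, 0.0)

print(lr_at(0, 10000))     # 0.0003 (before warmdown)
print(lr_at(8000, 10000))  # 0.00015 (halfway into warmdown)
```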
Evaluation
online n-gram agreement eval
parameters: {"experts":["token n-gram top-token hints","within-word continuation hints","word-start first-token hints"],"single_pass":true,"normalized_boost":true}
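One way to read "single-pass" and "prefix-only" for the token n-gram expert: counts are built online while scanning the sequence once, so the hint at each position uses only tokens already seen. A minimal sketch under that assumption (the other experts and the actual n are not specified by the record):

```python
from collections import defaultdict

def ngram_hints(tokens: list[int], n: int = 3) -> list:
    """Single-pass, prefix-only n-gram top-token hints.

    Counts are accumulated online, so the hint at position i depends
    only on the prefix tokens[:i] (causal). Returns None where the
    context has never been seen before.
    """
    counts = defaultdict(lambda: defaultdict(int))
    hints = []
    for i, tok in enumerate(tokens):
        ctx = tuple(tokens[max(0, i - (n - 1)):i])
        nexts = counts.get(ctx)  # .get avoids creating empty entries
        hints.append(max(nexts, key=nexts.get) if nexts else None)
        counts[ctx][tok] += 1  # update AFTER predicting: stays causal
    return hints

seq = [1, 2, 3, 1, 2, 3, 1, 2]
print(ngram_hints(seq))  # [None, None, None, None, None, 3, 1, 2]
```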
Other
other
Causal eval-time ensemble that boosts a prefix-derived hinted token according to agreement among multiple online experts.
parameters: {"single_pass":true,"prefix_only":true}
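The boost-and-renormalize step described above could look like the following sketch: the hinted token's probability receives a boost scaled by expert agreement, and the distribution is renormalized so it still sums to one. The function signature, boost scale, and agreement encoding are illustrative assumptions:

```python
import numpy as np

def boost_hint(probs: np.ndarray, hint: int, agreement: float,
               boost_scale: float = 0.5) -> np.ndarray:
    """Apply a normalized one-token boost to the hinted token, then renormalize.

    `agreement` in [0, 1] is the fraction of online experts agreeing
    on `hint`; the base distribution is untouched when it is 0.
    """
    boosted = probs.copy()
    # Additive probability-space boost, not a raw logit edit.
    boosted[hint] += boost_scale * agreement
    return boosted / boosted.sum()

base = np.array([0.5, 0.3, 0.2])
out = boost_hint(base, hint=2, agreement=1.0)
print(out)  # hinted token's mass rises; distribution still sums to 1
```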
Novel Contributions
- Retunes the training schedule with WARMDOWN_ITERS=4000.
- Adds a single-pass online token/within-word/word-start agreement evaluator.
- Uses a causal prefix-only ensemble with agreement boosting on top of the base model distribution.
- Applies a normalized one-token boost and renormalization rather than target-conditioned cache scoring.
- Packages the online evaluator inside the record folder for eval-time use.