PR #2010
Non-record: Online Best-Agree Infini-gram Phrase Memory (1.08167651)
by Abhishek8108
val_bpb: 1.0817
Architecture: Transformer
Optimizer: —
Artifact Size: 15,998,983 bytes
Training Techniques
- Quantization: GPTQ (bits: 6, scope: model weights)
- Compression: lzma (level: null)
- Evaluation: sliding window eval (parameters: {"stride":64})
- Architecture: depth recurrence; uses the cached #1394 SP8192 base with a depth-recurrence schedule (parameters: null)
- Other: online variable-length causal phrase-memory overlay with purity-weighted blending of neural and phrase distributions; see the blending sketch below (parameters: {"token_order":16,"word_order":4})

Sequence Length
- sequence_length: train_length 2048, eval_length 2048

Other
- Online best-agree phrase-memory overlay applied at evaluation time on top of the base model (parameters: {"chunk_tokens":131072,"batch_seqs":32})
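
As a rough illustration of the purity-weighted blending named above: the sketch below assumes "purity" means how concentrated the phrase memory's continuation counts are, combined with a count-saturating confidence term. The PR does not publish its exact gate, so `blend`, `match_counts`, and the gate formula are all illustrative stand-ins.

```python
import numpy as np

def blend(p_neural: np.ndarray, match_counts: np.ndarray) -> np.ndarray:
    """Purity-weighted blend of neural and phrase distributions (sketch)."""
    total = match_counts.sum()
    if total == 0:
        return p_neural                      # no phrase evidence: neural only
    p_phrase = match_counts / total          # empirical phrase distribution
    purity = match_counts.max() / total      # how peaked the phrase evidence is
    gate = purity * total / (total + 1.0)    # assumed count-saturating gate in [0, 1)
    return gate * p_phrase + (1.0 - gate) * p_neural
```

The gate collapses to the pure neural distribution whenever the phrase memory has no match, mirroring the overlay-on-top-of-base design.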
Novel Contributions
- Online variable-length causal infini-gram phrase memory layered as an eval-time overlay on top of the cached #1394 SP8192 base
- Purity-weighted gate blending neural and phrase distributions (see the blending sketch above)
- Single-file lzma2+base85 packaging of a previously three-file source stack to fit under the 16 MB cap (packaging sketch below)
- C-backed online n-gram state maintaining token-order and word-order phrase statistics in a single causal pass (memory sketch below)
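
For the packaging step, here is a minimal sketch of how a three-file source stack could be bundled into one self-extracting file, using Python's `lzma` module (whose .xz container uses LZMA2 filters) and `base64.b85encode`. The file names and the loader stub are hypothetical; the PR's actual packaging may differ.

```python
import base64
import lzma
from pathlib import Path

# Hypothetical names for the three-file source stack mentioned in the PR.
SOURCES = ["model.py", "phrase_memory.py", "eval_overlay.py"]

def pack(out_path: str = "submission.py") -> None:
    # Concatenate the sources; this assumes they can share one namespace
    # without name collisions when exec'd together.
    blob = b"\n".join(Path(p).read_bytes() for p in SOURCES)
    # Python's lzma module emits .xz containers, which use LZMA2 filters.
    compressed = lzma.compress(blob, preset=9 | lzma.PRESET_EXTREME)
    encoded = base64.b85encode(compressed).decode("ascii")
    # Self-extracting stub: decompress and exec the bundled source.
    stub = ("import base64, lzma\n"
            f"exec(lzma.decompress(base64.b85decode({encoded!r})))\n")
    Path(out_path).write_text(stub)
```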
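
And a Python sketch of the online phrase-memory state (the PR's version is C-backed and also tracks word-order statistics, which are omitted here). Interpreting "best-agree" as longest-recorded-suffix lookup is an assumption; everything except token_order=16 from the card is illustrative.

```python
from collections import Counter, defaultdict

class PhraseMemory:
    """Online variable-length causal n-gram memory (illustrative sketch)."""

    def __init__(self, max_order: int = 16):      # token_order from the card
        self.max_order = max_order
        self.next_counts = defaultdict(Counter)   # suffix tuple -> next-token counts
        self.history: list[int] = []

    def predict(self) -> Counter:
        # Assumed "best-agree" lookup: the longest recorded suffix of the
        # current history decides the phrase distribution.
        for n in range(min(self.max_order, len(self.history)), 0, -1):
            key = tuple(self.history[-n:])
            if key in self.next_counts:
                return self.next_counts[key]
        return Counter()                          # no match: caller falls back

    def update(self, token: int) -> None:
        # Causal single-pass update: record the token as the continuation of
        # every suffix ending just before it, then extend the history.
        for n in range(1, min(self.max_order, len(self.history)) + 1):
            self.next_counts[tuple(self.history[-n:])][token] += 1
        self.history.append(token)
```

At each position the overlay would call `predict` before `update` with the ground-truth token, keeping the pass causal.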