PR #2010
Non-record: Online Best-Agree Infini-gram Phrase Memory (1.08167651)
by Abhishek8108
val_bpb: 1.0817
Architecture: Transformer
Optimizer: —
Artifact Size: 15,998,983 bytes
Training Techniques
- Quantization: GPTQ (bits: 6, scope: model weights)
- Compression: lzma (level: null)
- Evaluation: sliding window eval (parameters: {"stride":64})
- Architecture: depth recurrence; uses the cached #1394 SP8192 base with a depth-recurrence schedule (parameters: null)
- Other: online variable-length causal phrase-memory overlay with purity-weighted blending of neural and phrase distributions; see the blending sketch below (parameters: {"token_order":16,"word_order":4})

Sequence Length
- sequence_length: train_length 2048, eval_length 2048

Other
- Online best-agree phrase-memory overlay applied at evaluation time on top of the base model (parameters: {"chunk_tokens":131072,"batch_seqs":32})
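
As a rough illustration of the purity-weighted blending named above: the sketch below assumes "purity" means how concentrated the phrase memory's continuation counts are, combined with a count-saturating confidence term. The PR does not publish its exact gate, so `blend`, `match_counts`, and the gate formula are all illustrative stand-ins.

```python
import numpy as np

def blend(p_neural: np.ndarray, match_counts: np.ndarray) -> np.ndarray:
    """Purity-weighted blend of neural and phrase distributions (sketch)."""
    total = match_counts.sum()
    if total == 0:
        return p_neural                      # no phrase evidence: neural only
    p_phrase = match_counts / total          # empirical phrase distribution
    purity = match_counts.max() / total      # how peaked the phrase evidence is
    gate = purity * total / (total + 1.0)    # assumed count-saturating gate in [0, 1)
    return gate * p_phrase + (1.0 - gate) * p_neural
```

The gate collapses to the pure neural distribution whenever the phrase memory has no match, mirroring the overlay-on-top-of-base design.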
Novel Contributions
- Online variable-length causal infini-gram phrase memory layered as an eval-time overlay on top of the cached #1394 SP8192 base
- Purity-weighted gate blending neural and phrase distributions (see the blending sketch above)
- Single-file lzma2+base85 packaging of a previously three-file source stack to fit under the 16 MB cap (packaging sketch below)
- C-backed online n-gram state maintaining token-order and word-order phrase statistics in a single causal pass (memory sketch below)
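
For the packaging step, here is a minimal sketch of how a three-file source stack could be bundled into one self-extracting file, using Python's `lzma` module (whose .xz container uses LZMA2 filters) and `base64.b85encode`. The file names and the loader stub are hypothetical; the PR's actual packaging may differ.

```python
import base64
import lzma
from pathlib import Path

# Hypothetical names for the three-file source stack mentioned in the PR.
SOURCES = ["model.py", "phrase_memory.py", "eval_overlay.py"]

def pack(out_path: str = "submission.py") -> None:
    # Concatenate the sources; this assumes they can share one namespace
    # without name collisions when exec'd together.
    blob = b"\n".join(Path(p).read_bytes() for p in SOURCES)
    # Python's lzma module emits .xz containers, which use LZMA2 filters.
    compressed = lzma.compress(blob, preset=9 | lzma.PRESET_EXTREME)
    encoded = base64.b85encode(compressed).decode("ascii")
    # Self-extracting stub: decompress and exec the bundled source.
    stub = ("import base64, lzma\n"
            f"exec(lzma.decompress(base64.b85decode({encoded!r})))\n")
    Path(out_path).write_text(stub)
```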
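
And a Python sketch of the online phrase-memory state (the PR's version is C-backed and also tracks word-order statistics, which are omitted here). Interpreting "best-agree" as longest-recorded-suffix lookup is an assumption; everything except token_order=16 from the card is illustrative.

```python
from collections import Counter, defaultdict

class PhraseMemory:
    """Online variable-length causal n-gram memory (illustrative sketch)."""

    def __init__(self, max_order: int = 16):      # token_order from the card
        self.max_order = max_order
        self.next_counts = defaultdict(Counter)   # suffix tuple -> next-token counts
        self.history: list[int] = []

    def predict(self) -> Counter:
        # Assumed "best-agree" lookup: the longest recorded suffix of the
        # current history decides the phrase distribution.
        for n in range(min(self.max_order, len(self.history)), 0, -1):
            key = tuple(self.history[-n:])
            if key in self.next_counts:
                return self.next_counts[key]
        return Counter()                          # no match: caller falls back

    def update(self, token: int) -> None:
        # Causal single-pass update: record the token as the continuation of
        # every suffix ending just before it, then extend the history.
        for n in range(1, min(self.max_order, len(self.history)) + 1):
            self.next_counts[tuple(self.history[-n:])][token] += 1
        self.history.append(token)
```

At each position the overlay would call `predict` before `update` with the ground-truth token, keeping the pass causal.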