PR #1145
Record: 1.1109 BPB Loader FullGPTQ XSA11 + online ngram augment
by AnirudhRahul
val_bpb
1.1109
Architecture
Transformer
Optimizer
—
Artifact Size
15,953,221 bytes
Training Techniques
Quantization
GPTQ
bits: null
scope: base model
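The record lists GPTQ weight quantization on the base model (bit width unspecified). As background, a minimal round-to-nearest (RTN) sketch of the quantize/dequantize round trip is shown below; GPTQ improves on RTN by choosing roundings with a second-order, Hessian-aware objective, which this sketch does not attempt. All names and the 4-bit default are illustrative, not from the PR:

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4):
    """Per-channel round-to-nearest weight quantization.

    GPTQ refines this baseline with a Hessian-aware rounding
    objective; RTN is shown only to illustrate the round trip.
    """
    qmax = 2 ** (bits - 1) - 1
    # One scale per output channel (row), sized to the channel's max weight.
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).standard_normal((4, 8)).astype(np.float32)
q, s = quantize_rtn(w)
err = np.abs(dequantize(q, s) - w).max()
print(err)  # bounded by half a quantization step per channel
```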
Architecture
XSA
Uses the Loader_FullGPTQ_XSA11 BigramHash2816 architecture from PR #1060 as the base model.
parameters: {"last_n":11,"bigram_vocab_size":2816,"bigram_dim":112}
BigramHash
Bigram-hash architecture used as the base for the submission.
parameters: {"last_n":11,"bigram_vocab_size":2816,"bigram_dim":112}
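A minimal sketch of how a bigram-hash feature table with these parameters (last_n=11, bigram_vocab_size=2816, bigram_dim=112) might work: each of the last 11 token bigrams is hashed into a 2816-entry embedding table of width 112. The hash function and all names here are illustrative assumptions, not taken from the PR:

```python
import numpy as np

LAST_N = 11          # last_n: number of trailing bigrams considered
BIGRAM_VOCAB = 2816  # bigram_vocab_size: hashed table size
BIGRAM_DIM = 112     # bigram_dim: embedding width per slot

rng = np.random.default_rng(0)
# Hypothetical hashed-bigram embedding table (randomly initialized here).
table = rng.standard_normal((BIGRAM_VOCAB, BIGRAM_DIM)).astype(np.float32)

def bigram_hash(a: int, b: int) -> int:
    # Illustrative mixing hash; the PR's actual hash is not specified here.
    return (a * 1000003 + b) % BIGRAM_VOCAB

def bigram_features(tokens: list[int]) -> np.ndarray:
    """Embed the last LAST_N bigrams of a token sequence.

    Positions with no bigram available are zero-padded, so the
    output shape is always (LAST_N, BIGRAM_DIM).
    """
    out = np.zeros((LAST_N, BIGRAM_DIM), dtype=np.float32)
    pairs = list(zip(tokens[:-1], tokens[1:]))[-LAST_N:]
    for i, (a, b) in enumerate(pairs):
        out[i] = table[bigram_hash(a, b)]
    return out

feats = bigram_features([5, 17, 42, 7, 99])
print(feats.shape)  # (11, 112)
```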
LR Schedule
warmdown
parameters: {"warmdown_steps":4000}
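A hedged sketch of a warmdown schedule with warmdown_steps=4000: constant learning rate until the final 4000 steps, then linear decay to zero. The constant-then-linear shape and the base LR are assumptions; the record only states the warmdown step count:

```python
WARMDOWN_STEPS = 4000  # warmdown_steps from the record

def lr_at(step: int, total_steps: int, base_lr: float = 3e-4) -> float:
    """Constant LR, then linear warmdown to 0 over the last WARMDOWN_STEPS."""
    warmdown_start = total_steps - WARMDOWN_STEPS
    if step < warmdown_start:
        return base_lr
    # Fraction of the warmdown window remaining, clamped at 0.
    frac = (total_steps - step) / WARMDOWN_STEPS
    return base_lr * max(frac, 0.0)

print(lr_at(0, 10000))     # 0.0003 (before warmdown)
print(lr_at(8000, 10000))  # 0.00015 (halfway into warmdown)
```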
Evaluation
online n-gram agreement eval
parameters: {"experts":["token n-gram top-token hints","within-word continuation hints","word-start first-token hints"],"single_pass":true,"normalized_boost":true}
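One way to read "single-pass" and "prefix-only" for the token n-gram expert: counts are built online while scanning the sequence once, so the hint at each position uses only tokens already seen. A minimal sketch under that assumption (the other experts and the actual n are not specified by the record):

```python
from collections import defaultdict

def ngram_hints(tokens: list[int], n: int = 3) -> list:
    """Single-pass, prefix-only n-gram top-token hints.

    Counts are accumulated online, so the hint at position i depends
    only on the prefix tokens[:i] (causal). Returns None where the
    context has never been seen before.
    """
    counts = defaultdict(lambda: defaultdict(int))
    hints = []
    for i, tok in enumerate(tokens):
        ctx = tuple(tokens[max(0, i - (n - 1)):i])
        nexts = counts.get(ctx)  # .get avoids creating empty entries
        hints.append(max(nexts, key=nexts.get) if nexts else None)
        counts[ctx][tok] += 1  # update AFTER predicting: stays causal
    return hints

seq = [1, 2, 3, 1, 2, 3, 1, 2]
print(ngram_hints(seq))  # [None, None, None, None, None, 3, 1, 2]
```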
Other
other
Causal eval-time ensemble that boosts a prefix-derived hinted token according to agreement among multiple online experts.
parameters: {"single_pass":true,"prefix_only":true}
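The boost-and-renormalize step described above could look like the following sketch: the hinted token's probability receives a boost scaled by expert agreement, and the distribution is renormalized so it still sums to one. The function signature, boost scale, and agreement encoding are illustrative assumptions:

```python
import numpy as np

def boost_hint(probs: np.ndarray, hint: int, agreement: float,
               boost_scale: float = 0.5) -> np.ndarray:
    """Apply a normalized one-token boost to the hinted token, then renormalize.

    `agreement` in [0, 1] is the fraction of online experts agreeing
    on `hint`; the base distribution is untouched when it is 0.
    """
    boosted = probs.copy()
    # Additive probability-space boost, not a raw logit edit.
    boosted[hint] += boost_scale * agreement
    return boosted / boosted.sum()

base = np.array([0.5, 0.3, 0.2])
out = boost_hint(base, hint=2, agreement=1.0)
print(out)  # hinted token's mass rises; distribution still sums to 1
```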
Novel Contributions
- Retunes the training schedule with WARMDOWN_ITERS=4000.
- Adds a single-pass online token/within-word/word-start agreement evaluator.
- Uses a causal prefix-only ensemble with agreement boosting on top of the base model distribution.
- Applies a normalized one-token boost and renormalization rather than target-conditioned cache scoring.
- Packages the online evaluator inside the record folder for eval-time use.