PR #687

closed

Record: 5-expert Hedge Mixer + TTT (3-seed mean val_bpb=1.0745)

by RoyiRaView on GitHub

val_bpb

1.0745

Architecture

—

Optimizer

—

Artifact Size

<15.5 MB

Training Techniques

Quantization

GPTQ

bits: null

scope: model weights

Weight Averaging

EMA

parameters: null

Test-Time Training

TTT

parameters: {"learning_rate":0.0001,"chunk_tokens":131072,"use_mixer":true}
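The chunked evaluation implied by `chunk_tokens` can be sketched as follows. This is a minimal illustration, not the PR's code: `iter_chunks`, `evaluate_with_ttt`, `score_fn`, and `adapt_fn` are hypothetical names, and the score-before-adapt ordering is an assumption (chosen so that no chunk is trained on before it has been scored). In a real run, `adapt_fn` would presumably take gradient steps at the listed learning rate of 0.0001.

```python
def iter_chunks(tokens, chunk_tokens=131072):
    """Yield consecutive slices of at most chunk_tokens tokens."""
    for start in range(0, len(tokens), chunk_tokens):
        yield tokens[start:start + chunk_tokens]


def evaluate_with_ttt(tokens, score_fn, adapt_fn, chunk_tokens=131072):
    """Score each chunk first, then adapt on it (assumed ordering).

    score_fn(chunk) -> summed loss for the chunk (hypothetical hook)
    adapt_fn(chunk) -> None, updates the model in place (hypothetical hook)
    """
    total_loss = 0.0
    for chunk in iter_chunks(tokens, chunk_tokens):
        total_loss += score_fn(chunk)  # evaluate with the current weights
        adapt_fn(chunk)                # only then train on the scored chunk
    return total_loss
```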

Other

other

A 5-expert logistic context mixer that uses the Hedge algorithm to blend neural, unigram, bigram, trigram, and entropy experts in log-probability space during TTT evaluation

parameters: {"experts":["neural","unigram","bigram","trigram","entropy"],"online_update":"log_w -= eta * loss"}
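The update rule `log_w -= eta * loss` can be illustrated with a small sketch. It assumes one plausible reading of the description: each expert's loss is its negative log-probability of the observed token, and the blended prediction is the weight-averaged probability, computed in log space for stability. The `HedgeMixer` class, its method names, and the `eta` value are hypothetical, and how the "entropy" expert forms its prediction is not specified in the source, so experts are treated here as opaque log-probabilities.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


class HedgeMixer:
    """Hedge (multiplicative-weights) mixer over expert log-probabilities."""

    def __init__(self, n_experts, eta=0.1):
        self.eta = eta                  # assumed value, not from the source
        self.log_w = [0.0] * n_experts  # uniform prior over experts

    def weights(self):
        """Normalised expert weights (a softmax of log_w)."""
        norm = logsumexp(self.log_w)
        return [math.exp(lw - norm) for lw in self.log_w]

    def mix_logprob(self, expert_logps):
        """log of sum_i w_i * p_i for the target token."""
        norm = logsumexp(self.log_w)
        return logsumexp([lw - norm + lp
                          for lw, lp in zip(self.log_w, expert_logps)])

    def update(self, expert_logps):
        """Apply log_w -= eta * loss with loss = -log p_i(target)."""
        for i, lp in enumerate(expert_logps):
            self.log_w[i] -= self.eta * (-lp)
        # Renormalise so the log-weights stay bounded over long streams.
        norm = logsumexp(self.log_w)
        self.log_w = [lw - norm for lw in self.log_w]
```

For each token, the mixer would call `mix_logprob` to score the token and only then call `update`, so the weights never depend on a token before it has been scored.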

other

Incremental n-gram table construction from already-scored tokens only

parameters: {"ngram_order":[1,2,3],"trigram_buckets":65536}
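A minimal sketch of incremental table construction, in which every token is scored against the current counts before it is added to them. The class and function names, the add-one smoothing, and the trigram hash are assumptions made for illustration; only the n-gram orders (1, 2, 3) and the 65536 trigram buckets come from the parameters above.

```python
class IncrementalNgramTables:
    """Unigram, bigram, and hashed-trigram counts grown one token at a time."""

    def __init__(self, vocab_size, trigram_buckets=65536):
        self.vocab_size = vocab_size
        self.trigram_buckets = trigram_buckets
        self.counts = {n: {} for n in (1, 2, 3)}  # (context, token) -> count
        self.totals = {n: {} for n in (1, 2, 3)}  # context -> count

    def _context(self, order, history):
        """Context key for this order, or None if the history is too short."""
        if order == 1:
            return ()
        if len(history) < order - 1:
            return None
        if order == 2:
            return history[-1]
        # Hypothetical hash of the two previous tokens into a fixed bucket.
        return (history[-2] * 1000003 + history[-1]) % self.trigram_buckets

    def prob(self, order, history, token):
        """Add-one smoothed probability (the smoothing is an assumption)."""
        ctx = self._context(order, history)
        if ctx is None:
            return 1.0 / self.vocab_size
        num = self.counts[order].get((ctx, token), 0) + 1
        den = self.totals[order].get(ctx, 0) + self.vocab_size
        return num / den

    def update(self, history, token):
        """Add an already-scored token to every table with a valid context."""
        for order in (1, 2, 3):
            ctx = self._context(order, history)
            if ctx is not None:
                key = (ctx, token)
                self.counts[order][key] = self.counts[order].get(key, 0) + 1
                self.totals[order][ctx] = self.totals[order].get(ctx, 0) + 1


def score_then_update(tokens, vocab_size):
    """Return per-token (unigram, bigram, trigram) probabilities."""
    tables = IncrementalNgramTables(vocab_size)
    history, scored = [], []
    for tok in tokens:
        # Score first, using counts from earlier tokens only.
        scored.append(tuple(tables.prob(n, history, tok) for n in (1, 2, 3)))
        tables.update(history, tok)
        history.append(tok)
    return scored
```

Because `update` runs only after `prob`, a token's own occurrence never contributes to the probability assigned to it.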

Novel Contributions

5-expert Hedge-based logistic context mixer
Online blending of neural and n-gram experts in log-probability space during TTT evaluation
Incremental n-gram statistics built only from already-scored tokens
GPTQ calibration performed within the training budget
Record validation score reported as a three-seed mean (val_bpb=1.0745)