PR #2041

open

Record: SP8192 V21 + Inside-timer N-gram TTT (no Gated XSA) — val_bpb 1.05692 (3-seed mean)

by jorge-asenjo

val_bpb

1.0569
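
For readers unfamiliar with the metric, the sketch below shows how a bits-per-byte figure is conventionally derived from per-token losses, and how a multi-seed mean is taken. It assumes losses are reported in nats and that the byte count is that of the raw validation text; the function names `bits_per_byte` and `seed_mean` are hypothetical and are not taken from this PR.

```python
import math
from typing import Sequence


def bits_per_byte(token_nll_nats: Sequence[float], total_bytes: int) -> float:
    """Convert summed per-token negative log-likelihood (nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / total_bytes


def seed_mean(values: Sequence[float]) -> float:
    """Arithmetic mean over per-seed results (assumed equal weighting)."""
    return sum(values) / len(values)
```

Normalizing by bytes rather than tokens keeps scores comparable across tokenizers with different vocabulary sizes.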

Architecture

Transformer

Optimizer

—

Artifact Size

15,977,032 bytes

Training Techniques

Test-Time Training

LoRA TTT

parameters: {"rank":80,"learning_rate":0.75,"num_phases":1,"prefix_docs":1000,"score_before_update":true}
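
The `score_before_update` flag suggests that each piece of evaluation text is scored with the current adapter weights first, and only afterwards used for an update, so no token is scored by weights that have already seen it. The following is a minimal sketch of that ordering only. `ToyAdapter` is a hypothetical scalar stand-in, not a LoRA adapter, and its learning rate and squared-error loss are illustrative assumptions.

```python
from typing import List, Sequence


class ToyAdapter:
    """Hypothetical scalar stand-in for an adapter updated at test time."""

    def __init__(self, lr: float) -> None:
        self.estimate = 0.0
        self.lr = lr

    def score(self, target: float) -> float:
        # Squared error under the current (pre-update) state.
        return (self.estimate - target) ** 2

    def update(self, target: float) -> None:
        # One gradient-style step toward the target.
        self.estimate += self.lr * (target - self.estimate)


def evaluate_score_before_update(adapter: ToyAdapter,
                                 targets: Sequence[float]) -> List[float]:
    losses = []
    for target in targets:
        losses.append(adapter.score(target))  # score first
        adapter.update(target)                # adapt only afterwards
    return losses
```

For example, `evaluate_score_before_update(ToyAdapter(lr=0.5), [1.0, 1.0])` scores the first target before any adaptation has happened, and the second target benefits only from the update on the first.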

Evaluation

inside-timer n-gram precompute

parameters: {"precompute_outside":0}

Architecture

LeakyReLU

LeakyReLU squared activation with slope 0.3

parameters: {"slope":0.3}
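
A scalar sketch of the activation, assuming "LeakyReLU squared" means applying LeakyReLU with the stated slope and then squaring the result. The PR summary does not spell out the exact form, so the function below is one plausible reading rather than the submitted code.

```python
def leaky_relu_squared(x: float, slope: float = 0.3) -> float:
    """LeakyReLU followed by squaring (assumed order of operations)."""
    y = x if x >= 0.0 else slope * x
    return y * y
```

Under this reading the output is always non-negative, and negative inputs are scaled by `slope ** 2` relative to positive inputs of the same magnitude.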

weight tying

The CaseOps SP8192 tokenizer and base stack carry a tied-embedding setup over from the V21 lineage

parameters: null

Sequence Length

sequence_length

train_length: null

eval_length: 2560

LR Schedule

warmdown

parameters: {"warmdown_frac":0.85}
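
One common interpretation of `warmdown_frac` is that the learning rate stays at its peak for the first `1 - warmdown_frac` of training and then decays over the remaining fraction. The sketch below assumes a linear decay to zero; the decay shape, the final value, and the name `lr_multiplier` are assumptions, not details confirmed by this PR.

```python
def lr_multiplier(step: int, total_steps: int, warmdown_frac: float = 0.85) -> float:
    """Multiplier on the peak learning rate at a given step (assumed linear warmdown)."""
    if warmdown_frac <= 0.0:
        return 1.0  # no warmdown phase at all
    warmdown_start = (1.0 - warmdown_frac) * total_steps
    if step < warmdown_start:
        return 1.0  # constant phase before the warmdown begins
    remaining = total_steps - step
    return max(0.0, remaining / (warmdown_frac * total_steps))
```

With `warmdown_frac=0.85`, only the first 15% of steps would run at the peak rate under this reading.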

Quantization

GPTQ-lite

bits: null

scope: artifact/model

Other

other

N-gram tilt with closed-form normalized distribution over the SP8192 vocabulary

parameters: {"enabled":1}
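
The summary does not define the tilt, so the sketch below shows one way a tilt can have a closed-form normalizer: multiply the model probability of a single n-gram-suggested token by `exp(beta)` and renormalize. Because only one entry changes, the normalizer is `1 + p[hint] * (exp(beta) - 1)` and no sum over the vocabulary is needed. The single-token form and the names `tilt`, `hint`, and `beta` are assumptions for illustration.

```python
import math
from typing import List, Sequence


def tilt(probs: Sequence[float], hint: int, beta: float) -> List[float]:
    """Boost one token's probability by exp(beta) and renormalize in closed form."""
    boost = math.exp(beta)
    z = 1.0 + probs[hint] * (boost - 1.0)  # closed-form normalizer
    return [p * (boost if i == hint else 1.0) / z for i, p in enumerate(probs)]
```

For instance, tilting `[0.5, 0.25, 0.25]` toward index 1 with `beta = ln 2` doubles that token's unnormalized weight, and the result still sums to 1.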

other

AsymLogit rescale

parameters: null

other

AWQ-lite mixed-precision stack

parameters: null