val_bpb: 1.0591
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 15,984,508 bytes
Training Techniques
- Test-Time Training: score-first TTT
  parameters: {"local_lr_mult":0.85,"prefix_docs":2750,"num_phases":3,"mask":"no_qv","q_lora":0,"v_lora":0}
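As a rough illustration of the phased, masked TTT loop these parameters describe, here is a toy sketch in pure Python. The scalar model, the squared-error "doc loss", and the freezing of `q_`/`v_`-prefixed parameters (one reading of mask="no_qv", consistent with q_lora and v_lora being 0) are all assumptions; the score-first selection of the prefix_docs neighborhood is omitted.

```python
def phased_ttt(weights, docs, base_lr=0.1, local_lr_mult=0.85,
               num_phases=3, frozen_prefixes=("q_", "v_")):
    # Toy test-time training: adapt a copy of the weights on a small set
    # of selected documents, with the local LR scaled by local_lr_mult.
    # Each "phase" is one pass over the docs; masked parameters stay frozen.
    w = dict(weights)                      # adapt a copy, keep the original
    lr = base_lr * local_lr_mult           # locally scaled learning rate
    for _ in range(num_phases):
        for target in docs:               # one squared-error "doc loss" each
            for name in w:
                if name.startswith(frozen_prefixes):
                    continue               # mask="no_qv": skip Q/V params
                grad = 2.0 * (w[name] - target)
                w[name] -= lr * grad
    return w

base = {"q_proj": 1.0, "v_proj": 1.0, "mlp": 1.0}
adapted = phased_ttt(base, docs=[0.0, 0.0])
# q/v stay frozen while mlp moves toward the doc target
```

In a real run the "docs" would be the 2,750 selected prefix documents and the update would be a language-modeling gradient step, but the control flow (phases, masking, scaled local LR) is the same shape.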
Sequence Length
- train_length: null
- eval_length: 2560
Quantization
- GPTQ-lite
  bits: 8
  scope: model
Regularization
- weight decay
  parameters: {"weight_decay":0.5}
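With AdamW, weight_decay=0.5 is applied in decoupled form: the decay term acts directly on the weight, separately from the gradient-based update. A minimal single-step sketch (moment estimates omitted; the helper name is ours):

```python
def adamw_step(w, grad, lr=1e-3, weight_decay=0.5):
    # Decoupled (AdamW-style) weight decay: shrink the weight directly,
    # rather than folding the decay term into the gradient.
    w = w - lr * weight_decay * w      # decay step
    w = w - lr * grad                  # gradient step (moments omitted)
    return w
```

With grad=0 the weight still shrinks each step, which is the defining difference from L2 regularization folded into the loss.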
LR Schedule
- warmdown
  parameters: {"warmdown_frac":0.85}
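A common warmdown shape, sketched here as an assumption about what warmdown_frac controls, holds the learning rate constant and then decays it linearly to zero over the final fraction of training:

```python
def warmdown_lr(step, total_steps, base_lr=1.0, warmdown_frac=0.85):
    # Constant LR for the first (1 - warmdown_frac) of training, then
    # linear decay to zero over the final warmdown_frac of steps.
    start = int(total_steps * (1.0 - warmdown_frac))
    if step < start:
        return base_lr
    return base_lr * (total_steps - step) / (total_steps - start)
```

With warmdown_frac=0.85, only the first 15% of steps run at the full rate; the remaining 85% ramp down.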
Architecture
- SmearGate: gate mechanism used in the lineage model stack.
  parameters: {"window":12}
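The card gives only the name and window=12. One plausible, purely hypothetical reading is a gate that "smears" each position by blending it with the mean of up to `window` preceding positions:

```python
def smear_gate(x, gate=0.1, window=12):
    # Hypothetical "smear": blend each position with the mean of up to
    # `window` preceding positions, weighted by a scalar gate. In a real
    # model the gate would be learned per channel; here it is a constant.
    out = []
    for i, v in enumerate(x):
        prev = x[max(0, i - window):i]
        if prev:
            v = (1.0 - gate) * v + gate * (sum(prev) / len(prev))
        out.append(v)
    return out
```

The first position has no predecessors and passes through unchanged.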
- Gated Attention: sparse attention gating used in the lineage model stack.
  parameters: {"scale":0.5}
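How the scale=0.5 parameter enters is not specified here. The sketch below assumes it rescales attention scores before the softmax, with a scalar output gate standing in (crudely) for a learned sparse gate:

```python
import math

def gated_attention_weights(scores, scale=0.5, gate=1.0):
    # Hypothetical sketch: temper the raw scores by `scale`, softmax,
    # then multiply by a gate. A gate < 1 attenuates the whole head's
    # contribution; per-position learned gates would zero out entries.
    scaled = [s * scale for s in scores]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [gate * e / z for e in exps]
```

With gate=1.0 the weights still sum to one; smaller scale flattens the distribution, smaller gate shrinks it.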
- weight tying: tied input/output embeddings in the lineage stack.
  parameters: null
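Weight tying shares the token-embedding table with the output projection, so logits are dot products of the hidden state against the same vectors used to embed tokens, and the artifact stores one matrix instead of two. A minimal sketch:

```python
class TiedLM:
    # Weight tying: the input embedding table doubles as the output
    # projection, so embed() and logits() read the same matrix.
    def __init__(self, emb):
        self.emb = emb                    # vocab x dim table

    def embed(self, token_id):
        return self.emb[token_id]

    def logits(self, hidden):
        # One logit per vocab row: dot(hidden, row)
        return [sum(h * e for h, e in zip(hidden, row)) for row in self.emb]

lm = TiedLM([[1.0, 0.0], [0.0, 1.0]])
```

Under a hard artifact-size cap, tying is a straightforward way to cut parameter count.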
- RoPE: rotary positional encoding for attention in the lineage stack.
  parameters: null
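RoPE rotates each consecutive pair of query/key dimensions by an angle that grows with position and shrinks with pair index; being a rotation, it preserves vector norms, and relative position falls out of the dot product. A minimal reference sketch:

```python
import math

def rope(vec, pos, base=10000.0):
    # Rotary position embedding: rotate each (even, odd) dimension pair
    # by theta = pos / base**(i/d). Position 0 is the identity.
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out
```

Because the transform is applied to queries and keys (not values), attention scores depend only on relative offsets between positions.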
Other
- AWQ-lite post-training quantization used to fit the artifact budget.
  parameters: {"enabled":true,"bits":8,"group_size":64}
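The exact "AWQ-lite" procedure is not described here; the sketch below shows only generic group-wise symmetric round-to-nearest quantization with the listed bits=8 and group_size=64 (the activation-aware scale search that gives AWQ its name is omitted):

```python
def quantize_groups(weights, bits=8, group_size=64):
    # Symmetric round-to-nearest quantization per group of `group_size`
    # weights: each group stores integer codes plus one float scale.
    qmax = 2 ** (bits - 1) - 1             # 127 for 8 bits
    groups = []
    for i in range(0, len(weights), group_size):
        g = weights[i:i + group_size]
        scale = max(abs(w) for w in g) / qmax or 1.0  # avoid zero scale
        codes = [round(w / scale) for w in g]
        groups.append((scale, codes))
    return groups

def dequantize_groups(groups):
    return [c * s for s, codes in groups for c in codes]

groups = quantize_groups([0.5, -1.0, 0.25, 0.0])
restored = dequantize_groups(groups)
```

Per-group scales bound the round-trip error by half a quantization step within each group, at the storage cost of one extra float per 64 weights.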
- Asymmetric logit rescaling used in the lineage stack.
  parameters: {"enabled":true}
- QK gain initialization used in the lineage stack.
  parameters: {"qk_gain_init":5.25}
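One hypothetical reading of qk_gain_init, sketched below, is a learnable scalar gain on the query-key dot product, initialized to 5.25 on top of the usual 1/sqrt(d) scaling; the class and its placement in the attention score are assumptions:

```python
import math

class QKGain:
    # Hypothetical learnable scalar gain on the query-key dot product,
    # initialized from qk_gain_init and trained with the model.
    def __init__(self, qk_gain_init=5.25):
        self.gain = qk_gain_init

    def score(self, q, k):
        # Standard scaled dot-product score, multiplied by the gain.
        dot = sum(a * b for a, b in zip(q, k))
        return self.gain * dot / math.sqrt(len(q))
```

A larger initial gain sharpens the softmax over attention scores from the first step, rather than waiting for the weights to grow.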
Novel Contributions
- Final-day rule-compliant ("legal") TTT neighborhood selection with TTT_LOCAL_LR_MULT=0.85
- Prefix2750 score-first phased TTT evaluation
- Three-seed verification under the strict 600 s training/evaluation and 16 MB artifact limits
- Continuation of the PR #1953 / PR #1945 lineage with legal phased TTT