val_bpb: 1.0602
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 15,898,824 bytes
Training Techniques
- Test-Time Training (full TTT); parameters: {"seed": 42} (sketched below)
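A minimal sketch of what a full test-time-training pass can look like in PyTorch, assuming a causal LM whose forward call returns (batch, seq, vocab) logits. The function name, learning rate, and step count are illustrative placeholders; only the seed (42) comes from the record.

```python
import copy
import torch
import torch.nn.functional as F

def test_time_train(model, tokens, lr=1e-4, steps=4, seed=42):
    """Adapt a copy of the model on the evaluation context with the ordinary
    next-token loss, then return the adapted copy for prediction."""
    torch.manual_seed(seed)                 # seed 42 comes from the recorded parameters
    adapted = copy.deepcopy(model)          # "full TTT": every weight is updated
    adapted.train()
    opt = torch.optim.AdamW(adapted.parameters(), lr=lr)
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    for _ in range(steps):
        logits = adapted(inputs)            # assumed to return (batch, seq, vocab) logits
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    adapted.eval()
    return adapted
```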
Quantization
- GPTQ: bits: null, scope: model
- QAT: bits: null, scope: model (illustrative fake-quantization sketch below)
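Both quantization entries leave the bit-width unspecified, so nothing about the actual precision can be inferred from the record. Purely as an illustration of the QAT side, here is a generic fake-quantization module with a straight-through estimator; the class name and the 8-bit default are assumptions, and the GPTQ step (a post-training method) is not shown.

```python
import torch
import torch.nn as nn

class FakeQuant(nn.Module):
    """Quantize-dequantize a weight tensor in the forward pass, passing
    gradients straight through so the model can be trained quantization-aware."""
    def __init__(self, bits=8):             # bits is null in the record; 8 is an assumption
        super().__init__()
        self.qmax = 2 ** (bits - 1) - 1

    def forward(self, w):
        scale = w.detach().abs().max().clamp(min=1e-8) / self.qmax
        w_q = torch.clamp(torch.round(w / scale), -self.qmax - 1, self.qmax) * scale
        return w + (w_q - w).detach()       # forward uses w_q, backward sees the identity
```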
Architecture
- SmearGate: sparse attention gating with the smear gate enabled; parameters: null
- Weight tying: the BOS-fixed phased TTT stack uses tied input/output embeddings; weight tying is implied by the stack naming and the tokenizer/model setup; parameters: null
- Gated Attention: sparse attention gate used in the LQER SparseAttnGate stack; parameters: {"scale": 0.5} (see the sketch after this list)
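An illustrative reading of the Gated Attention entry, assuming a sigmoid gate on the attention output scaled by the recorded factor of 0.5. This is not the stack's actual SmearGate/SparseAttnGate implementation; the module and projection names are placeholders.

```python
import torch
import torch.nn as nn

class AttnOutputGate(nn.Module):
    """Sigmoid gate on the attention output, scaled by a constant factor."""
    def __init__(self, dim, scale=0.5):     # scale 0.5 comes from the recorded parameters
        super().__init__()
        self.gate = nn.Linear(dim, dim)
        self.scale = scale

    def forward(self, x, attn_out):
        # x: residual-stream input (batch, seq, dim); attn_out: attention output of the same shape
        g = torch.sigmoid(self.gate(x))
        return attn_out * g * self.scale
```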
Optimizer
- AdamW: weight_decay: 0.5, momentum: null, other_params: {"beta2": 0.99, "ttt_beta2": 0.99, "muon_backend_steps": 5} (mapped to code below)
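The optimizer entry maps directly onto torch.optim.AdamW. The learning rate is not in the record, so the value below is a placeholder; beta1 is assumed to stay at the PyTorch default of 0.9, and ttt_beta2 / muon_backend_steps are left to the TTT and Muon-backend code paths they refer to.

```python
import torch

def build_optimizer(model, lr=3e-4):        # lr is not in the record; 3e-4 is a placeholder
    return torch.optim.AdamW(
        model.parameters(),
        lr=lr,
        betas=(0.9, 0.99),                  # beta2: 0.99 from other_params; beta1 assumed default
        weight_decay=0.5,                   # weight_decay: 0.5 from the record
    )
```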
LR Schedule
- warmdown: parameters: {"warmup_steps": 20, "warmdown_frac": 0.85} (see the helper below)
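A small helper showing one way to realize the recorded warmup/warmdown schedule as a learning-rate multiplier. Reading warmdown_frac as "the final 85% of steps decay linearly to zero" is an assumption about the parameter's meaning, as is the constant plateau in between.

```python
def lr_multiplier(step, total_steps, warmup_steps=20, warmdown_frac=0.85):
    """Trapezoidal warmup/plateau/warmdown multiplier for the base learning rate."""
    warmdown_steps = int(total_steps * warmdown_frac)
    warmdown_start = total_steps - warmdown_steps
    if step < warmup_steps:
        return (step + 1) / warmup_steps                      # linear warmup
    if step < warmdown_start:
        return 1.0                                            # constant plateau
    return max(0.0, (total_steps - step) / warmdown_steps)    # linear warmdown to zero
```

This multiplier can be attached to the AdamW instance above via torch.optim.lr_scheduler.LambdaLR.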
Regularization
- weight decay: parameters: {"value": 0.5}
Sequence Length
- train_length: null, eval_length: null
Novel Contributions
- Single-seed validation package for the LQER SparseAttnGate BOS-fixed phased TTT stack
- A reproducibility checkpoint / record-support artifact rather than a SOTA claim
- A validated seed-42 run reaching val_bpb 1.06018067 within the artifact-size and wall-clock limits
- A public reproduction workflow based on the CaseOps Hugging Face export (see the snippet below)
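A hedged sketch of the export-based reproduction step: the repository id below is a placeholder rather than the actual CaseOps export location, and only huggingface_hub's standard snapshot_download call is assumed.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id; substitute the actual CaseOps export repository when reproducing.
local_dir = snapshot_download(repo_id="caseops/lqer-sparseattngate-ttt-seed42-export")
print("Export downloaded to", local_dir)
```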