PR #1984

open

Non-record: single-seed LengthAwareTTT validation package

by unowenmaxwen
val_bpb
1.0602
Architecture
Transformer
Optimizer
AdamW
Artifact Size
15,898,824 bytes

Training Techniques

Test-Time Training
full TTT
parameters: {"seed":42}
Quantization
GPTQ
bits: null
scope: model
QAT
bits: null
scope: model
Architecture
SmearGate
Sparse attention gating with smear gate enabled.
parameters: null
weight tying
Tied embeddings (weight tying) in the BOS-fixed phased TTT stack; implied by the stack naming and the tokenizer/model setup rather than stated explicitly.
parameters: null
Gated Attention
Sparse attention gate used in the LQER SparseAttnGate stack.
parameters: {"scale":0.5}
Optimizer
AdamW
weight_decay: 0.5
momentum: null
other_params: {"beta2":0.99,"ttt_beta2":0.99,"muon_backend_steps":5}
LR Schedule
warmdown
parameters: {"warmup_steps":20,"warmdown_frac":0.85}
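A minimal sketch of how a schedule with these two parameters could be shaped. The card only gives `warmup_steps` and `warmdown_frac`; the linear ramps, the decay to zero, and the function name `lr_scale` are assumptions made for illustration, not the PR's actual implementation.

```python
def lr_scale(step: int, total_steps: int,
             warmup_steps: int = 20, warmdown_frac: float = 0.85) -> float:
    """Return a multiplier for the base learning rate at a 0-indexed step.

    Assumed shape (not confirmed by the PR): linear warmup over
    `warmup_steps`, then a linear warmdown to zero over the final
    `warmdown_frac` fraction of training.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")

    # Number of steps spent decaying, and the step where decay begins.
    warmdown_steps = round(total_steps * warmdown_frac)
    warmdown_start = total_steps - warmdown_steps

    # Linear warmup: reaches 1.0 on the last warmup step.
    warmup = min(1.0, (step + 1) / warmup_steps) if warmup_steps > 0 else 1.0

    # Linear warmdown: 1.0 at warmdown_start, 0.0 at total_steps.
    if warmdown_steps > 0 and step >= warmdown_start:
        decay = max(0.0, (total_steps - step) / warmdown_steps)
    else:
        decay = 1.0

    # Where the two phases overlap, take the smaller multiplier.
    return min(warmup, decay)


if __name__ == "__main__":
    total = 1000  # hypothetical step count, chosen only for the demo
    for s in (0, 19, 150, 575, 1000):
        print(s, lr_scale(s, total))
```

With `warmdown_frac` at 0.85, most of the run is spent decaying, so under these assumptions the peak learning rate is held only briefly between the end of warmup and the start of warmdown.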
Regularization
weight decay
parameters: {"value":0.5}
Sequence Length
sequence_length
train_length: null
eval_length: null

Novel Contributions

  • Single-seed validation package for the LQER SparseAttnGate BOS-fixed phased TTT stack
  • Reproducibility checkpoint / record-support artifact rather than a SOTA claim
  • Validated seed-42 run with val_bpb 1.06018067 under the artifact and wallclock limits
  • Reproduction workflow based on the public CaseOps Hugging Face export
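For readers unfamiliar with the val_bpb metric quoted above, the following sketch shows the usual conversion from summed cross-entropy to bits per byte. The function names and the example numbers are hypothetical; the PR's own evaluation code is not shown in this summary and may differ in details such as how bytes are counted.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte.

    total_nll_nats: sum of per-token cross-entropy over the validation set.
    total_bytes:    number of raw bytes those tokens decode to.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes.
    return total_nll_nats / (math.log(2) * total_bytes)


def bpb_from_mean_loss(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Same conversion, starting from a mean per-token loss."""
    return bits_per_byte(mean_loss_nats * n_tokens, n_bytes)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not taken from the PR.
    print(bpb_from_mean_loss(mean_loss_nats=2.0, n_tokens=1_000, n_bytes=4_000))
```

Because the metric is normalized by bytes rather than tokens, it stays comparable across runs that use different tokenizers.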