val_bpb: 1.0602
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 15,898,824 bytes
Training Techniques
- Test-Time Training (full TTT); parameters: {"seed": 42} (sketched below)
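A minimal sketch of what a full test-time-training pass can look like in PyTorch, assuming a causal LM whose forward call returns (batch, seq, vocab) logits. The function name, learning rate, and step count are illustrative placeholders; only the seed (42) comes from the record.

```python
import copy
import torch
import torch.nn.functional as F

def test_time_train(model, tokens, lr=1e-4, steps=4, seed=42):
    """Adapt a copy of the model on the evaluation context with the ordinary
    next-token loss, then return the adapted copy for prediction."""
    torch.manual_seed(seed)                 # seed 42 comes from the recorded parameters
    adapted = copy.deepcopy(model)          # "full TTT": every weight is updated
    adapted.train()
    opt = torch.optim.AdamW(adapted.parameters(), lr=lr)
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    for _ in range(steps):
        logits = adapted(inputs)            # assumed to return (batch, seq, vocab) logits
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    adapted.eval()
    return adapted
```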
Quantization
- GPTQ: bits: null, scope: model
- QAT: bits: null, scope: model (illustrative fake-quantization sketch below)
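Both quantization entries leave the bit-width unspecified, so nothing about the actual precision can be inferred from the record. Purely as an illustration of the QAT side, here is a generic fake-quantization module with a straight-through estimator; the class name and the 8-bit default are assumptions, and the GPTQ step (a post-training method) is not shown.

```python
import torch
import torch.nn as nn

class FakeQuant(nn.Module):
    """Quantize-dequantize a weight tensor in the forward pass, passing
    gradients straight through so the model can be trained quantization-aware."""
    def __init__(self, bits=8):             # bits is null in the record; 8 is an assumption
        super().__init__()
        self.qmax = 2 ** (bits - 1) - 1

    def forward(self, w):
        scale = w.detach().abs().max().clamp(min=1e-8) / self.qmax
        w_q = torch.clamp(torch.round(w / scale), -self.qmax - 1, self.qmax) * scale
        return w + (w_q - w).detach()       # forward uses w_q, backward sees the identity
```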
Architecture
- SmearGate: sparse attention gating with the smear gate enabled; parameters: null
- Weight tying: the BOS-fixed phased TTT stack uses tied input/output embeddings; weight tying is implied by the stack naming and the tokenizer/model setup; parameters: null
- Gated Attention: sparse attention gate used in the LQER SparseAttnGate stack; parameters: {"scale": 0.5} (see the sketch after this list)
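An illustrative reading of the Gated Attention entry, assuming a sigmoid gate on the attention output scaled by the recorded factor of 0.5. This is not the stack's actual SmearGate/SparseAttnGate implementation; the module and projection names are placeholders.

```python
import torch
import torch.nn as nn

class AttnOutputGate(nn.Module):
    """Sigmoid gate on the attention output, scaled by a constant factor."""
    def __init__(self, dim, scale=0.5):     # scale 0.5 comes from the recorded parameters
        super().__init__()
        self.gate = nn.Linear(dim, dim)
        self.scale = scale

    def forward(self, x, attn_out):
        # x: residual-stream input (batch, seq, dim); attn_out: attention output of the same shape
        g = torch.sigmoid(self.gate(x))
        return attn_out * g * self.scale
```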
Optimizer
- AdamW: weight_decay: 0.5, momentum: null, other_params: {"beta2": 0.99, "ttt_beta2": 0.99, "muon_backend_steps": 5} (mapped to code below)
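The optimizer entry maps directly onto torch.optim.AdamW. The learning rate is not in the record, so the value below is a placeholder; beta1 is assumed to stay at the PyTorch default of 0.9, and ttt_beta2 / muon_backend_steps are left to the TTT and Muon-backend code paths they refer to.

```python
import torch

def build_optimizer(model, lr=3e-4):        # lr is not in the record; 3e-4 is a placeholder
    return torch.optim.AdamW(
        model.parameters(),
        lr=lr,
        betas=(0.9, 0.99),                  # beta2: 0.99 from other_params; beta1 assumed default
        weight_decay=0.5,                   # weight_decay: 0.5 from the record
    )
```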
LR Schedule
- warmdown: parameters: {"warmup_steps": 20, "warmdown_frac": 0.85} (see the helper below)
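A small helper showing one way to realize the recorded warmup/warmdown schedule as a learning-rate multiplier. Reading warmdown_frac as "the final 85% of steps decay linearly to zero" is an assumption about the parameter's meaning, as is the constant plateau in between.

```python
def lr_multiplier(step, total_steps, warmup_steps=20, warmdown_frac=0.85):
    """Trapezoidal warmup/plateau/warmdown multiplier for the base learning rate."""
    warmdown_steps = int(total_steps * warmdown_frac)
    warmdown_start = total_steps - warmdown_steps
    if step < warmup_steps:
        return (step + 1) / warmup_steps                      # linear warmup
    if step < warmdown_start:
        return 1.0                                            # constant plateau
    return max(0.0, (total_steps - step) / warmdown_steps)    # linear warmdown to zero
```

This multiplier can be attached to the AdamW instance above via torch.optim.lr_scheduler.LambdaLR.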
Regularization
- weight decay: parameters: {"value": 0.5}
Sequence Length
- train_length: null, eval_length: null
Novel Contributions
- Single-seed validation package for the LQER SparseAttnGate BOS-fixed phased TTT stack
- A reproducibility checkpoint / record-support artifact rather than a SOTA claim
- A validated seed-42 run reaching val_bpb 1.06018067 within the artifact-size and wall-clock limits
- A public reproduction workflow based on the CaseOps Hugging Face export (see the snippet below)
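A hedged sketch of the export-based reproduction step: the repository id below is a placeholder rather than the actual CaseOps export location, and only huggingface_hub's standard snapshot_download call is assumed.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id; substitute the actual CaseOps export repository when reproducing.
local_dir = snapshot_download(repo_id="caseops/lqer-sparseattngate-ttt-seed42-export")
print("Export downloaded to", local_dir)
```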