PR #991
openRecord: 33.6M Int5 GPTQ + Score-First TTT (val_bpb=1.1145, 3-seed)
by ibarrajo
- val_bpb: 1.1145
- Architecture: Transformer
- Optimizer: AdamW
- Artifact Size: 15.9 MB
Training Techniques
- Quantization: GPTQ (bits: 5, scope: all)
- Compression: zstd (level: 22)
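The 5-bit GPTQ entry implies weights are mapped to integers in [-16, 15], the clipping range listed under Novel Contributions. A minimal sketch of the quantize/dequantize round trip, assuming symmetric per-channel scales; full GPTQ additionally applies Hessian-weighted error compensation column by column, which is omitted here:

```python
import numpy as np

def int5_quantize(w: np.ndarray):
    """Quantize a weight matrix to Int5 with per-row (per-channel) scales.

    Int5 codes are clipped to [-16, 15], the range given in the PR.
    This sketch shows only the scale/round/clip step; real GPTQ also
    compensates rounding error using second-order (Hessian) information.
    """
    qmin, qmax = -16, 15
    # One scale per output channel (row), so max |w| lands at qmax.
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(w / scale), qmin, qmax).astype(np.int8)
    return q, scale

def int5_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, s = int5_quantize(w)
w_hat = int5_dequantize(q, s)
```

With per-row scales, the reconstruction error per weight is at most half a quantization step (scale / 2) for values inside the clipping range.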
- Test-Time Training: score-first TTT (learning_rate: 0.0001, epochs: 3, blocks_unfrozen: 2)
- Optimizer: AdamW (learning_rate: 0.0001; weight_decay: null, momentum: null)
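Score-first, backward-looking TTT means each evaluation chunk is scored with the current weights before the weights are updated on that chunk, so no chunk is ever scored by a model that has already trained on it. A toy sketch of just that ordering, using an add-one-smoothed unigram byte model as a stand-in (the PR's actual setup adapts the last 2 transformer blocks with AdamW at lr 1e-4 for 3 epochs):

```python
import math

def score_first_ttt(stream: bytes, chunk_size: int = 64) -> float:
    """Toy score-first TTT: score each chunk BEFORE updating on it.

    Model: Laplace-smoothed unigram over bytes. Stands in for the PR's
    real setup (adapting 2 unfrozen transformer blocks with AdamW);
    only the score-then-update ordering is the point here.
    """
    counts = [1] * 256          # add-one smoothed byte counts
    total = 256
    bits = 0.0
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        # 1) Score the chunk with the *current* model (backward-looking).
        for b in chunk:
            bits += -math.log2(counts[b] / total)
        # 2) Only then adapt the model on that same chunk.
        for b in chunk:
            counts[b] += 1
            total += 1
    return bits / max(len(stream), 1)   # bits per byte

bpb = score_first_ttt(b"abababab" * 32)
```

A static model with no updates would score a uniform 8.0 bits per byte on this stream; the score-first adaptation lowers the mean without ever letting a chunk see itself during training.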
- LR Schedule: cosine decay (parameters: null)
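No parameters are recorded for the cosine decay entry. A standard cosine schedule, assuming decay from the optimizer's base LR to zero over the run with no warmup (both assumptions, since the record lists parameters: null), looks like:

```python
import math

def cosine_lr(step: int, total_steps: int,
              base_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr to min_lr over total_steps.

    base_lr=1e-4 matches the optimizer's learning_rate field; min_lr=0
    and the absence of warmup are assumptions (parameters: null).
    """
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```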
- Evaluation: sliding window eval (stride: 64)
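Sliding-window evaluation with stride 64 advances a fixed context window 64 tokens at a time and scores only the newly exposed tokens, so every scored token (after the first window) sees close to a full context. A sketch of the window/target indexing; the window size of 512 is an assumed placeholder, only the stride comes from the PR:

```python
def sliding_windows(seq_len: int, window: int = 512, stride: int = 64):
    """Yield (start, end, score_from) spans for sliding-window eval.

    Tokens in [score_from, end) are scored; tokens in [start, score_from)
    are context only. stride=64 matches the PR; window=512 is an
    assumed placeholder.
    """
    spans = []
    scored = 0
    while scored < seq_len:
        # First window scores all its tokens; later steps score one stride.
        end = min(scored + (window if scored == 0 else stride), seq_len)
        start = max(0, end - window)
        spans.append((start, end, scored))
        scored = end
    return spans
```

Each token is scored exactly once, and no window exceeds the model's context length.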
Architecture
- U-Net style skip connections (layers: 11)
- SmearGate
- BigramHash (size: 8192)
- XSA, applied across all layers (layers: 11)
- Partial RoPE: RoPE applied to a subset of dimensions (dimensions: 16)
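Partial RoPE rotates only a slice of each head's dimensions and passes the rest through unrotated. A sketch assuming the rotated slice is the leading 16 dimensions and the usual base of 10000, both common conventions not stated in the record:

```python
import numpy as np

def partial_rope(x: np.ndarray, rot_dims: int = 16,
                 base: float = 10000.0) -> np.ndarray:
    """Apply RoPE to the first `rot_dims` dims of x; pass the rest through.

    x has shape (seq_len, head_dim). rot_dims=16 matches the PR's
    "dimensions: 16"; the base and the choice of *leading* dims are
    assumptions.
    """
    seq_len, head_dim = x.shape
    half = rot_dims // 2
    pos = np.arange(seq_len)[:, None]                 # (seq, 1)
    freqs = base ** (-np.arange(half) / half)         # (half,)
    angles = pos * freqs                              # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:rot_dims]         # paired halves to rotate
    rotated = np.concatenate([x1 * cos - x2 * sin,
                              x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, x[:, rot_dims:]], axis=-1)
```

Rotation leaves position 0 unchanged, preserves the norm of the rotated slice, and never touches the remaining head_dim - 16 dimensions.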
- LN Scale: includes a layer norm scale
Regularization
- layerwise LN scale
Novel Contributions
- Larger 33.6M-parameter model with d=576 and a 3.5x MLP expansion
- Int5 GPTQ quantization with clipping range [-16, 15]
- Legal score-first, backward-looking TTT
- Post-TTT temperature calibration at T=0.98
- 3-seed validation showing improved mean val_bpb
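Post-TTT temperature calibration at T=0.98 divides the final logits by 0.98 before the softmax, mildly sharpening the output distribution, which lowers the loss when the model's top predictions are already usually correct. A sketch of the per-token bits under temperature scaling:

```python
import math

def bits_per_token(logits, target: int, temperature: float = 1.0) -> float:
    """Negative log2-probability of `target` under a temperature-scaled softmax.

    T slightly below 1 (the PR uses 0.98) sharpens the distribution,
    reducing bits on tokens where the model already ranks the target high.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)                                   # log-sum-exp, stabilized
    log_z = m + math.log(sum(math.exp(z - m) for z in scaled))
    return -(scaled[target] - log_z) / math.log(2)
```

When the target is the model's argmax, T=0.98 yields strictly fewer bits than T=1.0; the overall gain depends on how often that is the case.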