PR #1415
open
Record: SP4096 + 3-Layer Recurrence + GPTQ Embeddings + SDClip + ETLB - val_bpb 1.0913 (3-seed mean)
by bigbag
val_bpb
1.0913
Architecture
Transformer
Optimizer
Muon
Artifact Size
~14.75 MB
Training Techniques
Quantization
GPTQ
bits: 8
scope: embeddings
GPTQ
bits: 6
scope: all
Compression
lzma
level: null
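The entry above records lzma with no level. A minimal sketch of what that could look like, assuming Python's standard `lzma` module at its default preset; the `pack`/`unpack` helpers and the placeholder payload are hypothetical and not taken from the PR:

```python
import lzma

def pack(raw: bytes) -> bytes:
    # No level is recorded in the summary, so this sketch uses lzma's default preset.
    return lzma.compress(raw)

def unpack(blob: bytes) -> bytes:
    return lzma.decompress(blob)

# Placeholder payload standing in for serialized quantized weights (assumption).
raw = bytes(range(64)) * 1024
blob = pack(raw)
assert unpack(blob) == raw
print(len(raw), len(blob))
```

How much a real artifact shrinks depends on how compressible the quantized weights are; the highly repetitive placeholder here says nothing about that.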
Architecture
depth recurrence
3-layer depth recurrence applied to layers 3, 4, and 5
parameters: {"layers":[3,4,5]}
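One way to picture the recurrence over layers 3, 4, and 5 is as an execution schedule in which that block is applied more than once with shared weights. The sketch below assumes 0-indexed layers; the total layer count and the number of passes are not stated in the summary, so `num_layers=8` and `passes=2` are illustrative values only:

```python
def layer_schedule(num_layers, recurrent_layers, passes):
    """Return the order in which layer indices are executed."""
    first, last = recurrent_layers[0], recurrent_layers[-1]
    schedule = list(range(first))                 # layers before the block run once
    for _ in range(passes):                       # the shared block is re-applied
        schedule.extend(recurrent_layers)
    schedule.extend(range(last + 1, num_layers))  # layers after the block run once
    return schedule

# num_layers and passes are assumptions chosen for illustration.
print(layer_schedule(num_layers=8, recurrent_layers=[3, 4, 5], passes=2))
# [0, 1, 2, 3, 4, 5, 3, 4, 5, 6, 7]
```

The effective depth grows with each extra pass while the stored parameter count does not, which is why the technique is attractive under an artifact size limit.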
Evaluation
sliding window eval
parameters: {"stride":64}
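Sliding-window evaluation with a stride of 64 can be sketched as follows: each window advances by `stride` tokens, and only tokens not already scored by an earlier window count toward the loss, so later tokens are scored with more left context. The context length is not given in the summary; `context_len=128` below is an assumption, and the function name is hypothetical:

```python
def sliding_windows(num_tokens, context_len, stride):
    """Return (start, end, score_from) triples; each token is scored exactly once."""
    windows = []
    scored_upto = 0   # first token index that has not been scored yet
    start = 0
    while scored_upto < num_tokens:
        end = min(start + context_len, num_tokens)
        # Tokens in [start, score_from) serve only as context for this window.
        windows.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return windows

for window in sliding_windows(num_tokens=300, context_len=128, stride=64):
    print(window)
```

A smaller stride gives each scored token more context on average, at the cost of more forward passes. The sketch assumes `stride <= context_len`.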
Other
other
Eval-time logit bias optimized on context tokens during sliding-window evaluation
parameters: {"method":"ETLB","steps":5,"learning_rate":0.05,"clip":3,"warm_start":true}
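The ETLB parameters above suggest a small optimization loop per evaluation window. The following is a hedged sketch, not the PR's implementation: it assumes a single bias vector over the vocabulary, plain gradient descent on the mean cross-entropy of the context tokens, `clip` read as a clamp on bias values, and warm start read as carrying the bias over from the previous window. All function names are hypothetical:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logits, targets, bias):
    """Mean negative log-likelihood of targets under logits + bias."""
    total = 0.0
    for row, t in zip(logits, targets):
        probs = softmax([l + b for l, b in zip(row, bias)])
        total -= math.log(probs[t])
    return total / len(targets)

def fit_logit_bias(logits, targets, bias=None, steps=5, lr=0.05, clip=3.0):
    """Fit a per-vocabulary logit bias on context tokens (assumed: plain SGD)."""
    vocab = len(logits[0])
    # Warm start: reuse the bias passed in (e.g. from the previous window).
    bias = list(bias) if bias is not None else [0.0] * vocab
    for _ in range(steps):
        grad = [0.0] * vocab
        for row, t in zip(logits, targets):
            probs = softmax([l + b for l, b in zip(row, bias)])
            for v in range(vocab):
                # d(mean CE)/d(bias_v) = mean over positions of (p_v - 1[v == target])
                grad[v] += (probs[v] - (1.0 if v == t else 0.0)) / len(targets)
        # Assumed meaning of "clip": clamp each bias entry to [-clip, clip].
        bias = [max(-clip, min(clip, b - lr * g)) for b, g in zip(bias, grad)]
    return bias

# Toy context: 4 positions, vocabulary of 3, uniform model logits.
logits = [[0.0, 0.0, 0.0] for _ in range(4)]
targets = [2, 2, 2, 1]
bias = fit_logit_bias(logits, targets)
print(mean_nll(logits, targets, [0.0] * 3), mean_nll(logits, targets, bias))
```

Because the bias is fit only on tokens that are already context for the window, the tokens being scored are not used to tune it in this sketch.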
Regularization
weight decay
parameters: {"weight_decay":0.095}
LR Schedule
higher LR compensation
parameters: {"matrix_lr":0.022}
Novel Contributions
- SP4096 vocabulary
- GPTQ quantization on embeddings
- SDClip quantization clipping
- 3-layer depth recurrence
- Eval-time logit bias (ETLB)
- QK-Gain 5.0
- LZMA code wrapper for artifact savings