PR #1716

open

Record: SP8192 + BigramHash d=32 + Path A v3 passthrough quantization — val_bpb 1.07882 (3-seed mean)

by himanshudongre
val_bpb: 1.0788
Architecture: Transformer
Optimizer: Muon
Artifact Size: 15.99 MB

Training Techniques

Architecture
BigramHash
BigramHashEmbedding with reduced embedding dimension for regularization and size savings.
parameters: {"buckets":16384,"dimensions":32}
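A minimal sketch of how a hashed bigram embedding lookup might work with these parameters. The hash function, the dummy previous token at position 0, and the initialization scale are assumptions for illustration, not taken from the PR. With 16384 buckets and 32 dimensions the table holds 16384 × 32 = 524,288 values.

```python
import random

NUM_BUCKETS = 16384  # "buckets" from the parameters above
EMBED_DIM = 32       # "dimensions" from the parameters above


def bigram_bucket(prev_token, token, num_buckets=NUM_BUCKETS):
    """Hypothetical hash: mix the two token ids and fold into a bucket index."""
    return (prev_token * 1_000_003 + token) % num_buckets


def make_table(num_buckets=NUM_BUCKETS, dim=EMBED_DIM, seed=0):
    """Build a randomly initialized embedding table (one row per bucket)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_buckets)]


def bigram_embeddings(tokens, table):
    """Return one table row per position, keyed by (previous token, token).

    Position 0 has no predecessor, so it is paired with a dummy id 0
    (an assumption of this sketch).
    """
    out = []
    prev = 0
    for tok in tokens:
        out.append(table[bigram_bucket(prev, tok, len(table))])
        prev = tok
    return out
```

Because distinct bigrams can collide in the same bucket, the table stays small regardless of vocabulary size; repeated bigrams always map to the same row.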
depth recurrence
Encoder/decoder layer recurrence loops over selected layers during training.
parameters: {"encoder":[0,1,2,3,4,5,3,4],"decoder":[5,3,4,5,6,7,8,9,10]}
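One way to read the two index lists is as the order in which shared layers are applied. The sketch below assumes exactly that reading; `run_schedule` and the toy tracing layers are hypothetical helpers.

```python
from collections import Counter

ENCODER_ORDER = [0, 1, 2, 3, 4, 5, 3, 4]         # "encoder" list from the parameters above
DECODER_ORDER = [5, 3, 4, 5, 6, 7, 8, 9, 10]     # "decoder" list from the parameters above


def run_schedule(x, layers, order):
    """Apply layers[idx] for each idx in order, reusing weights on repeated indices."""
    for idx in order:
        x = layers[idx](x)
    return x


# Toy "layers" that just record their own index, so the visit order can be inspected.
toy_layers = [lambda trace, i=i: trace + [i] for i in range(11)]

schedule = ENCODER_ORDER + DECODER_ORDER
trace = run_schedule([], toy_layers, schedule)
unique_layers = sorted(set(schedule))
visits = Counter(schedule)
```

Under this reading, 11 distinct layers (indices 0 to 10) are applied 17 times in total, with layers 3, 4, and 5 each visited three times.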
U-Net skip connections
Skip connections with gated residual-style pathways.
parameters: null
Partial RoPE
Rotary position embeddings applied to a subset of dimensions.
parameters: {"dimensions":"16/64"}
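A sketch of partial rotary embeddings for a single 64-dimensional head vector, rotating only 16 dimensions. Which 16 dimensions are rotated, the pairing convention (i with i + 8), and the frequency base of 10000 are assumptions here; the PR summary only gives the ratio 16/64.

```python
import math

HEAD_DIM = 64    # denominator of "16/64"
ROTARY_DIM = 16  # numerator of "16/64"


def partial_rope(vec, position, rotary_dim=ROTARY_DIM, base=10000.0):
    """Rotate the first rotary_dim entries of vec; pass the rest through unchanged."""
    assert rotary_dim % 2 == 0 and rotary_dim <= len(vec)
    out = list(vec)
    half = rotary_dim // 2
    for i in range(half):
        # Each pair (i, i + half) gets its own rotation frequency.
        theta = position * base ** (-2.0 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[i], vec[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out
```

The remaining 48 dimensions carry no positional signal, and each rotation preserves the vector norm.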
SmearGate
SmearGate module used in the architecture.
parameters: {"width":12}
LeakyReLU
LeakyReLU squared activation.
parameters: {"slope":0.5}
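A scalar sketch of the activation, assuming "squared" means squaring the LeakyReLU output (so negative inputs yield small positive values). The PR summary does not say whether the sign is preserved, so treat this as one plausible reading.

```python
def leaky_relu_squared(x, slope=0.5):
    """LeakyReLU with the given negative slope, followed by squaring."""
    y = x if x >= 0.0 else slope * x
    return y * y
```

For example, with slope 0.5 an input of -2.0 becomes -1.0 after the leaky step and 1.0 after squaring, while an input of 2.0 becomes 4.0.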
Quantization
mixed int6/int8
bits: 6
scope: matrices, embeddings, control tensors, small 2-D matrices
int8
bits: 8
scope: control tensors and small 2-D matrices
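To make the int6 versus int8 trade-off concrete, here is a sketch of symmetric quantization with a single scale per tensor. The scaling scheme is an assumption; the PR summary does not describe how Path A v3 chooses scales or groups values.

```python
def quantize_symmetric(values, bits):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [qi * scale for qi in q]


def max_error(values, bits):
    """Largest absolute round-trip error at the given bit width."""
    q, scale = quantize_symmetric(values, bits)
    return max(abs(v - d) for v, d in zip(values, dequantize(q, scale)))
```

Under this scheme the round-trip error is bounded by half the scale, so moving a tensor from 8 bits to 6 bits saves storage at the cost of roughly four times the worst-case error.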
Compression
lzma
level: null
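A small sketch of the compression step using Python's standard `lzma` module at its default preset, since no level is recorded. The helper names and the byte interpretation of the 16 MB limit (16,000,000 bytes) are assumptions.

```python
import lzma

SIZE_LIMIT_BYTES = 16_000_000  # assumed reading of the 16 MB submission limit


def pack(payload):
    """Compress raw artifact bytes with LZMA at the default preset."""
    return lzma.compress(payload)


def unpack(blob):
    """Restore the original bytes from a compressed blob."""
    return lzma.decompress(blob)


def fits_limit(blob, limit=SIZE_LIMIT_BYTES):
    """Check whether the compressed artifact stays within the size limit."""
    return len(blob) <= limit
```

A self-extracting wrapper would additionally ship a small stub that calls the decompression step at load time; that part is not shown here.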
Optimizer
Muon
weight_decay: 0.085
momentum: null
other_params: {"variant":"MuonEq-R","newton_schulz_steps":5}
AdamW
weight_decay: 0.095
momentum: null
other_params: {"used_for":"embeddings/scalars"}
Weight Averaging
EMA
parameters: {"decay":0.9965}
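The EMA update with this decay can be sketched as follows; representing the parameters as a flat list of floats is a simplification for illustration.

```python
def ema_update(shadow, params, decay=0.9965):
    """Blend current params into the running average: s <- decay * s + (1 - decay) * p."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]
```

With a decay of 0.9965, each step contributes 0.35% of the new weights, which corresponds to an averaging horizon of roughly 1 / (1 - 0.9965) ≈ 286 steps.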
Evaluation
sliding window eval
parameters: null
Test-Time Training
score-first TTT
parameters: {"epochs":3,"learning_rate":0.005,"momentum":0.9}
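The "score-first" ordering can be illustrated with a toy model: each chunk is scored with weights that have not yet been trained on it, and only afterwards is the model adapted on that chunk. The scalar model and squared-error loss below are stand-ins chosen for brevity; only the epochs, learning rate, and momentum values come from the parameters above.

```python
def mse(theta, chunk):
    """Mean squared error of a constant prediction theta on a chunk."""
    return sum((x - theta) ** 2 for x in chunk) / len(chunk)


def score_first_ttt(chunks, theta=0.0, epochs=3, lr=0.005, momentum=0.9):
    """Score each chunk before adapting on it with SGD plus momentum."""
    scores = []
    velocity = 0.0
    for chunk in chunks:
        # 1) Record the score using weights that have never seen this chunk.
        scores.append(mse(theta, chunk))
        # 2) Only then take gradient steps on the already-scored chunk.
        for _ in range(epochs):
            grad = sum(2.0 * (theta - x) for x in chunk) / len(chunk)
            velocity = momentum * velocity + grad
            theta -= lr * velocity
    return scores, theta
```

Because the score for a chunk is fixed before any update that uses it, later chunks benefit from adaptation while no reported score depends on data the model has already trained on.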
Regularization
logit softcap
parameters: {"value":30}
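A common form of logit softcapping is a scaled tanh; the sketch below assumes that form, which the PR summary does not confirm.

```python
import math


def softcap(logit, cap=30.0):
    """Smoothly bound a logit to [-cap, cap] via cap * tanh(logit / cap)."""
    return cap * math.tanh(logit / cap)
```

Small logits pass through almost unchanged, while very large ones saturate at the cap of 30.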
LR Schedule
warmdown
parameters: {"warmdown_fraction":0.72}
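One plausible reading of a warmdown fraction of 0.72 is that the learning rate stays at its peak for the first 28% of training and then decays over the final 72%. The linear decay to zero and the absence of warmup in this sketch are assumptions.

```python
def lr_multiplier(step, total_steps, warmdown_fraction=0.72):
    """Return the factor applied to the peak learning rate at a given step."""
    warmdown_start = (1.0 - warmdown_fraction) * total_steps
    if step < warmdown_start:
        return 1.0  # constant phase before the warmdown begins
    remaining = total_steps - step
    # Linear decay from 1.0 at warmdown_start to 0.0 at total_steps.
    return max(0.0, remaining / (total_steps - warmdown_start))
```

For a 1000-step run under these assumptions, the multiplier is 1.0 at step 0, about 0.5 at step 640, and 0.0 at step 1000.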
Sequence Length
sequence_length
train_length: 32768
eval_length: 32768

Novel Contributions

  • BigramHashEmbedding with reduced dimension d=32
  • Path A v3 aggressive passthrough quantization for control tensors and small 2-D matrices
  • LZMA self-extracting wrapper to fit the submission under the 16 MB limit
  • Legal score-first test-time training with sliding-window evaluation
  • Record 3-seed mean val_bpb of 1.07882