PR #730
Fix: move Ternary UNet submission folder from track_10min_16mb to track_non_record_16mb
by janwww
val_bpb
1.1570
Architecture
Ternary U-Net Transformer
Optimizer
Muon
Artifact Size
15.99 MB
Training Techniques
Quantization
QAT
bits: 8
scope: fp_params / model artifact
ternary
bits: 2
scope: weights
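A minimal sketch of ternary (2-bit) weight quantization with a straight-through estimator. The threshold heuristic (`delta_frac`) and scale rule are assumptions in the style of ternary weight networks; the submission does not state its exact quantizer.

```python
import numpy as np

def ternary_quantize(w, delta_frac=0.7):
    # Threshold as a fraction of mean |w| (TWN-style heuristic; an assumption,
    # the submission does not state its threshold rule)
    delta = delta_frac * np.abs(w).mean()
    mask = np.abs(w) > delta
    q = np.sign(w) * mask                              # values in {-1, 0, +1}
    scale = np.abs(w[mask]).mean() if mask.any() else 1.0
    return q, scale

def ste_backward(grad_out):
    # Straight-through estimator: the quantizer's gradient is treated as
    # identity, so gradients flow to the latent fp weights unchanged
    return grad_out
```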
Architecture
U-Net
U-Net encoder/decoder with learned skip weights and residual mixing on top of a Transformer backbone
parameters: {"layers":10,"dim":768,"heads":8,"kv_heads":4}
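One plausible reading of "learned skip weights and residual mixing": a learned scalar per encoder/decoder pair scales the encoder activation before it is added to the decoder stream. The exact mixing rule is an assumption.

```python
import numpy as np

class LearnedSkip:
    """Mixes a stored encoder activation into the matching decoder layer."""
    def __init__(self, n_pairs):
        # one learned scalar per encoder/decoder pair, initialized to ones
        self.w = np.ones(n_pairs)

    def __call__(self, decoder_in, encoder_out, i):
        # residual mixing: decoder stream plus weighted encoder skip
        return decoder_in + self.w[i] * encoder_out
```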
tied embeddings
Factored tied embedding with a 254-dimensional bottleneck
parameters: {"embed_dim":254,"vocab_size":8192}
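The factored tied embedding replaces one 8192×768 matrix with two factors through a 254-dimensional bottleneck, reused (transposed) as the output head. A sketch under that reading; initialization scale is an assumption.

```python
import numpy as np

vocab_size, bottleneck, d_model = 8192, 254, 768
rng = np.random.default_rng(0)
E = rng.normal(scale=0.02, size=(vocab_size, bottleneck))  # token -> bottleneck factor
P = rng.normal(scale=0.02, size=(bottleneck, d_model))     # bottleneck -> model dim

def embed(token_ids):
    return E[token_ids] @ P

def logits(hidden):
    # tying: the output head reuses the same two factors, transposed
    return (hidden @ P.T) @ E.T
```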
RoPE
YaRN positional encoding variant
parameters: {"max_len":2048,"rope_base":5000}
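A sketch of plain RoPE with the listed base of 5000; the YaRN-specific frequency rescaling for context extension is omitted here, so treat this as the baseline the variant builds on.

```python
import numpy as np

def rope(x, base=5000.0):
    # x: (seq, dim); rotate channel pairs by position-dependent angles
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = np.outer(np.arange(seq), freqs)    # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```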
MLP3x
4x relu² MLP expansion with fused gate and up projection
parameters: {"mlp_mult":4}
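A sketch of a 4x relu² MLP where one matmul produces both the gate and up halves ("fused gate and up projection"). The split layout and the gating form `relu²(gate) * up` are assumptions; the submission only names the technique.

```python
import numpy as np

d_model, mlp_mult = 768, 4
rng = np.random.default_rng(0)
# single fused matmul produces both gate and up halves
W_fused = rng.normal(scale=0.02, size=(d_model, 2 * mlp_mult * d_model))
W_down = rng.normal(scale=0.02, size=(mlp_mult * d_model, d_model))

def relu2(x):
    return np.maximum(x, 0.0) ** 2

def mlp(x):
    gate, up = np.split(x @ W_fused, 2, axis=-1)
    return (relu2(gate) * up) @ W_down
```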
Optimizer
Muon
weight_decay: 0
momentum: 0.95
other_params: {"backend_steps":3,"momentum_warmup_start":0.85,"momentum_warmup_steps":500}
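The momentum warmup parameters imply a schedule from 0.85 to 0.95 over the first 500 steps. A linear shape is an assumption; the listing only gives the endpoints and step count.

```python
def muon_momentum(step, start=0.85, end=0.95, warmup_steps=500):
    # linear warmup of momentum over the first `warmup_steps` steps,
    # then held at `end` (linear shape assumed)
    t = min(step / warmup_steps, 1.0)
    return start + t * (end - start)
```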
Evaluation
sliding window eval
parameters: {"stride":16}
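Sliding-window eval with stride 16 can be read as: each window re-reads up to a full context of tokens but scores loss only on the tokens not yet scored, so every token is scored exactly once with near-maximal context. A sketch of the span bookkeeping under that reading:

```python
def sliding_eval_spans(n_tokens, window=1024, stride=16):
    # returns (ctx_start, ctx_end, score_start, score_end) tuples;
    # loss is computed only on [score_start, score_end)
    spans, scored_to, start = [], 0, 0
    while scored_to < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end, scored_to, end))
        scored_to = end
        start += stride
    return spans
```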
Sequence Length
sequence_length
train_length: 1024
eval_length: null
LR Schedule
warmdown
parameters: {"fraction":0.2}
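A warmdown schedule with fraction 0.2 typically means: constant LR, then linear decay to zero over the final 20% of steps. A sketch under that common convention:

```python
def warmdown_lr(step, total_steps, base_lr=1.0, fraction=0.2):
    # constant LR, then linear decay to zero over the final `fraction` of training
    decay_start = int(total_steps * (1 - fraction))
    if step < decay_start:
        return base_lr
    return base_lr * (total_steps - step) / (total_steps - decay_start)
```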
Regularization
weight decay
parameters: {"adam_wd":0.05}
Initialization
ones-init
Learned skip weights initialized to ones
Compression
lzma
level: 9
Other
other
Base-3 packing for model compression
parameters: {"packing":"base-3 + LZMA"}
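The "base-3 + LZMA" packing can be sketched as follows: map ternary values {-1, 0, +1} to {0, 1, 2} and pack five trits per byte (3⁵ = 243 ≤ 256), then compress with LZMA at level 9. The five-trits-per-byte layout is an assumption; the card only names the scheme.

```python
import lzma

def pack_ternary(trits):
    # map {-1, 0, 1} -> {0, 1, 2}, pack 5 trits per byte, then LZMA level 9
    vals = [t + 1 for t in trits]
    out = bytearray()
    for i in range(0, len(vals), 5):
        chunk = vals[i:i + 5]
        chunk += [0] * (5 - len(chunk))       # zero-pad the final group
        byte = 0
        for v in reversed(chunk):
            byte = byte * 3 + v
        out.append(byte)
    return lzma.compress(bytes(out), preset=9)

def unpack_ternary(blob, n):
    raw = lzma.decompress(blob)
    trits = []
    for byte in raw:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:n]
```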
Novel Contributions
- Ternary U-Net Transformer architecture
- NeoMuon optimization for ternary STE gradient attenuation
- 4x relu² MLP expansion
- Factored tied embedding with 254-dimensional bottleneck
- YaRN positional encoding with an 8192-token BPE vocabulary
- FP8 QAT to reduce artifact size
- Base-3 + LZMA compression
- Sliding-window evaluation with stride 16