PR #155
open
Record: sliding eval, FP16 tied embeddings, 10 layers, Muon WD 0.02, overtone init, and phase-transition residual mixing. (val_bpb 1.1876)
by peytontolbert
val_bpb
1.1876
Architecture
Transformer
Optimizer
Muon
Artifact Size
15,842,628 bytes
Training Techniques
Evaluation
sliding window eval
parameters: {"stride":64}
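As a rough illustration of what a stride-64 sliding-window evaluation can look like, the sketch below plans overlapping windows so that every token is scored exactly once. The function name `sliding_windows` and the context length `seq_len` are assumptions made for this example; the PR summary only states the stride.

```python
def sliding_windows(n_tokens, seq_len, stride=64):
    """Plan evaluation windows as (begin, end, score_from) tuples.

    Each window feeds tokens[begin:end] to the model, but only positions
    score_from..end-1 contribute to the loss, so no token is counted twice.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("stride must be in 1..seq_len, or tokens would be skipped")
    windows = []
    scored_upto = 0  # first position that has not been scored yet
    begin = 0
    while scored_upto < n_tokens:
        end = min(begin + seq_len, n_tokens)
        windows.append((begin, end, scored_upto))
        scored_upto = end
        begin += stride  # slide the window forward by one stride
    return windows


if __name__ == "__main__":
    # seq_len=128 is a placeholder value, not taken from the PR.
    for begin, end, score_from in sliding_windows(300, seq_len=128, stride=64):
        print(f"context {begin}:{end}, scored {score_from}:{end}")
```

Under this scheme every scored token after the first window sees at least `seq_len - stride` tokens of preceding context, which is the usual motivation for a small stride: more context per scored token at the cost of more forward passes.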
Quantization
fp16
bits: 16
scope: tied embeddings
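To make the FP16 entry concrete, here is a minimal sketch of storing a row of embedding weights as IEEE 754 half-precision values and reading them back. It assumes the tied embedding matrix (shared between input embedding and output projection, so stored once) is simply cast to 16-bit floats on export; the helper names are hypothetical and the PR's actual export code is not shown in this summary.

```python
import struct


def to_fp16_bytes(values):
    """Pack floats as little-endian IEEE 754 half-precision (2 bytes each)."""
    return struct.pack(f"<{len(values)}e", *values)


def from_fp16_bytes(blob):
    """Unpack a buffer written by to_fp16_bytes back into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


if __name__ == "__main__":
    row = [0.5, -1.0, 0.25, 0.1]  # illustrative values, not real weights
    blob = to_fp16_bytes(row)
    print(len(blob), "bytes:", from_fp16_bytes(blob))
```

Values that are exactly representable in half precision (such as 0.5) survive unchanged, while others (such as 0.1) pick up a small rounding error.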
Architecture
Transformer layers
Uses a 10-layer transformer model.
parameters: {"layers":10}
Optimizer
Muon
weight_decay: 0.02
momentum: null
other_params: null
Initialization
overtone spectral embedding initialization
Spectral embedding initialization with power 0.5.
phase-transition residual-mix initialization
Residual mixing initialization based on phase-transition behavior.
Compression
zlib
level: null
Regularization
weight decay
parameters: {"weight_decay":0.02}
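The summary does not say how the 0.02 weight decay is applied inside Muon. One common choice is decoupled decay, where weights are shrunk by a factor of `1 - lr * weight_decay` before the optimizer update is subtracted. The sketch below assumes that form; `decayed_step`, the learning rate, and the `update` argument (standing in for Muon's orthogonalized momentum update) are all hypothetical.

```python
def decayed_step(weights, update, lr, weight_decay=0.02):
    """Apply one step with decoupled weight decay to a 2-D weight matrix.

    new_w = w * (1 - lr * weight_decay) - lr * update
    `update` is treated as given; computing Muon's update is out of scope here.
    """
    shrink = 1.0 - lr * weight_decay
    return [
        [w * shrink - lr * u for w, u in zip(w_row, u_row)]
        for w_row, u_row in zip(weights, update)
    ]


if __name__ == "__main__":
    w = [[1.0, -2.0]]
    zero_update = [[0.0, 0.0]]
    # With a zero update, only the decay term acts on the weights.
    print(decayed_step(w, zero_update, lr=0.5))
```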
Novel Contributions
- Sliding-window final evaluation with stride 64
- FP16 tied embedding export
- 10 transformer layers
- Muon weight decay 0.02
- Overtone spectral embedding initialization with power 0.5
- Phase-transition residual-mix initialization
- Post-quant int8 zlib roundtrip exact validation
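The last item can be read as scoring the model on weights that have actually been through the export path: quantized to int8, compressed with zlib, decompressed, and dequantized. The sketch below illustrates that roundtrip under simple assumptions (symmetric per-tensor scaling, a float32 scale stored ahead of the codes, zlib's default compression level). None of these details are confirmed by the summary, and all function names are hypothetical.

```python
import struct
import zlib


def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (scale, int8 codes)."""
    max_abs = max((abs(v) for v in values), default=0.0) or 1.0
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, codes


def pack(scale, codes):
    """Serialize scale (float32) plus int8 codes, then zlib-compress."""
    raw = struct.pack("<f", scale) + struct.pack(f"<{len(codes)}b", *codes)
    return zlib.compress(raw)


def unpack(blob):
    """Invert pack(): decompress and split into (scale, codes)."""
    raw = zlib.decompress(blob)
    (scale,) = struct.unpack_from("<f", raw, 0)
    codes = struct.unpack_from(f"<{len(raw) - 4}b", raw, 4)
    return scale, list(codes)


def roundtrip(values):
    """Return the weights a loader would see after export and reload."""
    scale, codes = quantize_int8(values)
    scale_back, codes_back = unpack(pack(scale, codes))
    assert codes_back == codes  # zlib is lossless; only quantization loses precision
    return [c * scale_back for c in codes_back]


if __name__ == "__main__":
    print(roundtrip([0.5, -1.0, 0.25, 0.0]))
```

Evaluating on the output of `roundtrip` rather than on the original float weights means the reported metric reflects the artifact that is actually shipped.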