PR #1866

open

Non-record: Trinity Ternary CPU v3 — Apple M1 Pro 72h, val_bpb 1.5042

by deborahnelson8788726
val_bpb: 1.5042
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 5.53 MB

Training Techniques

Quantization
STE QAT
bits: 2
scope: mostly weights
GPTQ
bits: 6
scope: model weights
Architecture
weight tying
Tied input embeddings and output embeddings.
parameters: {"vocab_size":1024}
ReLU²
Uses squared ReLU activation in the MLP.
parameters: null
RoPE
Applies rotary positional embeddings across the full head dimension.
parameters: null
RMSNorm
Uses RMS normalization in the transformer blocks.
parameters: null
MLP3x
Transformer MLP with widened hidden layer.
parameters: {"multiplier":2.5}
LR Schedule
cosine decay
parameters: {"start_lr":0.0003,"end_lr":0.00003}
linear warmup
parameters: {"start_step":0,"end_step":500}
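For illustration, the two schedule entries above could be combined as follows. This is a minimal sketch, not the submission's code: the start and end learning rates and the 500-step warmup come from the listed parameters, while TOTAL_STEPS, the function name lr_at, and the assumption that warmup climbs to start_lr and cosine decay begins immediately afterwards are all hypothetical.

```python
import math

START_LR = 3e-4        # from the listed cosine-decay parameters
END_LR = 3e-5          # from the listed cosine-decay parameters
WARMUP_STEPS = 500     # from the listed linear-warmup parameters
TOTAL_STEPS = 10_000   # hypothetical: the summary does not state a step count


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step (assumed composition)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to START_LR across the warmup window.
        return START_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup budget consumed so far, clamped to [0, 1].
    span = max(1, total_steps - WARMUP_STEPS)
    progress = min(1.0, (step - WARMUP_STEPS) / span)
    # Cosine factor goes from 1 (start of decay) to 0 (end of training).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return END_LR + (START_LR - END_LR) * cosine
```

Because the rate depends only on the step counter and not on wall-clock time, a schedule of this shape picks up where it left off after a paused or interrupted run.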
Weight Averaging
EMA + SWA
parameters: null
Regularization
magnitude pruning
parameters: {"type":"selective ±1 pruning"}
Compression
lzma
level: 9
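The compression entry can be reproduced with Python's standard lzma module. The sketch below assumes that "level: 9" maps to preset=9 and that the default .xz container is used; the helper names are hypothetical and the submission's actual serialization format is not described here.

```python
import lzma


def compress_artifact(raw: bytes) -> bytes:
    """Compress serialized weights with LZMA at the highest standard preset."""
    return lzma.compress(raw, preset=9)


def decompress_artifact(blob: bytes) -> bytes:
    """Inverse of compress_artifact."""
    return lzma.decompress(blob)
```

Measuring len(compress_artifact(raw)) on the packed weights is one way to check an artifact against a size budget before submitting.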

Novel Contributions

  • First Apple Silicon CPU-only Parameter Golf submission
  • Trinity base-3 packing for ternary weights
  • Step-based ternary ramp that survives Mac sleep
  • Cosine LR decay synchronized with ternary blend
  • Fully reproducible CPU-only pipeline on a 16GB laptop
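The summary does not describe the Trinity packing layout, so the following is only a generic sketch of base-3 packing for ternary weights. It relies on the fact that 3^5 = 243 fits in one byte, so five weights from {-1, 0, +1} can share a byte (1.6 bits per weight, close to the log2(3) ≈ 1.585 lower bound). The function names, the most-significant-trit-first digit order, and the padding rule are assumptions.

```python
TRITS_PER_BYTE = 5  # 3**5 = 243, so five base-3 digits fit in one byte


def pack_ternary(weights):
    """Pack a sequence of integers in {-1, 0, +1} into bytes, 5 per byte."""
    out = bytearray()
    for i in range(0, len(weights), TRITS_PER_BYTE):
        group = weights[i:i + TRITS_PER_BYTE]
        value = 0
        for w in group:
            if w not in (-1, 0, 1):
                raise ValueError(f"not a ternary weight: {w!r}")
            # Shift {-1, 0, +1} to base-3 digits {0, 1, 2}, most significant first.
            value = value * 3 + (w + 1)
        # Pad a short final group with zero digits; unpacking trims them by count.
        value *= 3 ** (TRITS_PER_BYTE - len(group))
        out.append(value)
    return bytes(out)


def unpack_ternary(packed, count):
    """Recover the first `count` ternary weights from packed bytes."""
    weights = []
    for byte in packed:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, digit = divmod(byte, 3)  # peels off the least significant digit
            digits.append(digit - 1)
        weights.extend(reversed(digits))
    return weights[:count]
```

The weight count has to be stored alongside the packed bytes, since padding digits in the last byte are indistinguishable from real -1 weights.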