PR #949

closed

Submit 2026-03-27_PhaseCoherenceGatedGradients

val_bpb
1.3178
Architecture
Transformer
Optimizer
Muon
Artifact Size

Training Techniques

Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"optimizer_split":"Muon + Adam"}
Quantization
int8
bits: 8
scope: final artifact
Compression
zlib
level: null
Other
other
Phase-induced coherence-gated gradient descent (PIC-GD), using paired real/imaginary latent channels and a detached coherence-based gradient gate
parameters: {"beta":2,"min_gate":0.05,"eps":0.000001,"token_stride":32}
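The summary names the gate's parameters but not its formula, so the following is only a rough sketch of how such a gate could work, not the PR's implementation. It assumes that adjacent channels (2k, 2k+1) form one pseudo-complex value, that coherence is the magnitude of the mean unit phasor over every token_stride-th token, and that the gate is max(min_gate, coherence ** beta). The function names coherence_gate and gated_loss are hypothetical.

```python
import math

def coherence_gate(hidden, beta=2.0, min_gate=0.05, eps=1e-6, token_stride=32):
    """Return a scalar gate in [min_gate, 1] from per-token hidden vectors.

    hidden: list of per-token vectors (lists of floats). Adjacent channels
    (2k, 2k+1) are read as the real and imaginary parts of one latent.
    The coherence formula below is an assumption, not taken from the PR.
    """
    sum_re, sum_im, count = 0.0, 0.0, 0
    for vec in hidden[::token_stride]:          # subsample tokens
        for k in range(0, len(vec) - 1, 2):     # walk channel pairs
            re, im = vec[k], vec[k + 1]
            norm = math.hypot(re, im) + eps     # eps avoids division by zero
            sum_re += re / norm                 # accumulate unit phasors
            sum_im += im / norm
            count += 1
    if count == 0:
        return 1.0                              # nothing to measure: no gating
    coherence = math.hypot(sum_re, sum_im) / count
    return max(min_gate, coherence ** beta)

def gated_loss(loss, gate):
    """Scale the training loss by the gate.

    In an autograd framework the gate would be wrapped in a stop-gradient
    (detach) so that it rescales gradients without being optimized itself.
    """
    return gate * loss
```

Under these assumptions, phase-aligned latents leave the loss almost untouched, while latents with cancelling phases shrink it to min_gate times its value.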
Evaluation
tokenizer-agnostic val_bpb evaluation
parameters: null
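For reference, bits per byte is conventionally the total negative log-likelihood converted to bits and divided by the number of UTF-8 bytes in the evaluated text, which makes the score independent of how the tokenizer splits that text. A minimal sketch of that standard definition follows; the PR's own evaluation code is not shown here, and the name bits_per_byte is hypothetical.

```python
import math

def bits_per_byte(token_nll_nats, text):
    """Bits per byte from per-token negative log-likelihoods (in nats).

    Normalizing by UTF-8 byte count rather than token count lets models
    with different tokenizers be compared on the same text.
    """
    total_bytes = len(text.encode("utf-8"))
    if total_bytes == 0:
        raise ValueError("text must contain at least one byte")
    total_bits = sum(token_nll_nats) / math.log(2)   # nats -> bits
    return total_bits / total_bytes
```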

Novel Contributions

  • Phase-induced coherence-gated gradient descent (PIC-GD)
  • Paired adjacent hidden channels as pseudo-complex latents
  • Detached coherence-based gradient gate applied to training loss
  • Tokenizer-agnostic val_bpb evaluation
  • Int8 + zlib roundtrip export path
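
The last item, the int8 + zlib roundtrip, can be illustrated with a small standard-library sketch. It assumes symmetric per-tensor quantization with a single float32 scale and a simple header of scale plus element count; the PR's actual quantization scheme and artifact layout are not described in this summary, and export_int8 and import_int8 are hypothetical names.

```python
import struct
import zlib

def export_int8(weights, level=-1):
    """Quantize floats to int8 with one shared scale, then zlib-compress.

    level=-1 selects zlib's default compression level.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    # Header: float32 scale and uint32 count, followed by the int8 payload.
    payload = struct.pack("<fI", scale, len(q)) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(payload, level)

def import_int8(blob):
    """Decompress and dequantize a blob produced by export_int8."""
    payload = zlib.decompress(blob)
    scale, n = struct.unpack_from("<fI", payload, 0)
    q = struct.unpack_from(f"<{n}b", payload, 8)
    return [v * scale for v in q]
```

A roundtrip such as import_int8(export_int8(weights)) recovers each weight to within roughly half a quantization step, which is why evaluating on the restored weights rather than the original ones gives a score that reflects the final artifact.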