PR #950

open

2026-03-27_PhaseCoherenceGatedGradients submission

val_bpb
1.3178
Architecture
Transformer
Optimizer
Muon
Artifact Size

Training Techniques

Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"adam_split":true}
Quantization
int8
bits: 8
scope: final model
Compression
zlib
level: null
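A minimal sketch of what an int8 plus zlib roundtrip could look like. The PR's actual export code is not shown here, so the symmetric per-tensor scale, the header layout, and the names quantize_int8, export_blob, and load_blob are all assumptions made for illustration. Because the card lists level: null, the sketch uses zlib's default compression level.

```python
import struct
import zlib
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (an assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def export_blob(weights):
    """Pack scale + length header and int8 payload, then zlib-compress."""
    q, scale = quantize_int8(weights)
    # Hypothetical header: float32 scale, uint32 element count (8 bytes).
    raw = struct.pack("<fI", scale, len(q)) + q.tobytes()
    return zlib.compress(raw)  # default level, since none is specified


def load_blob(blob):
    """Invert export_blob and dequantize back to floats."""
    raw = zlib.decompress(blob)
    scale, n = struct.unpack_from("<fI", raw, 0)
    q = array("b")
    q.frombytes(raw[8:8 + n])
    return [v * scale for v in q]
```

The roundtrip is lossy only in the quantization step: each restored weight differs from the original by at most about half a quantization step, while the zlib stage is lossless.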
Other
PIC-GD
Phase-induced coherence-gated gradient descent (PIC-GD): computes a normalized coherence score from dot products between paired latent and reference vectors, then gates backpropagation with a detached scalar, alpha.
parameters: {"enabled":true,"beta":2,"min_gate":0.05,"eps":0.000001,"token_stride":32}
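The gate described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the PR's implementation: the card does not say how beta, min_gate, eps, or token_stride enter the computation, so the formula below (coherence raised to the power beta, floored at min_gate, evaluated on every token_stride-th token) is a guess, and the names to_complex, coherence_gate, and gate_gradients are hypothetical.

```python
def to_complex(vec):
    """Pair adjacent channels (2k, 2k+1) as real/imag parts of one value."""
    return [complex(vec[i], vec[i + 1]) for i in range(0, len(vec) - 1, 2)]


def coherence_gate(latents, references, beta=2.0, min_gate=0.05,
                   eps=1e-6, token_stride=32):
    """Return a scalar gate alpha in [min_gate, 1] for one batch.

    latents / references: lists of equal-length float vectors, one per
    token. Only every token_stride-th token is sampled (assumed usage).
    """
    num = 0j   # sum of z * conj(w): the paired dot products
    den = 0.0  # sum of |z| * |w|: the normalizer
    for t in range(0, len(latents), token_stride):
        for z, w in zip(to_complex(latents[t]), to_complex(references[t])):
            num += z * w.conjugate()
            den += abs(z) * abs(w)
    # By the triangle inequality |num| <= den, so coherence lies in [0, 1).
    coherence = abs(num) / (den + eps)
    return max(min_gate, coherence ** beta)


def gate_gradients(grads, alpha):
    """Scale gradients by alpha, treated as a constant (i.e. detached)."""
    return [alpha * g for g in grads]
```

When the latent and reference phases agree on every pair, the products add up constructively and alpha approaches 1. When they cancel, alpha falls to min_gate, so the update is damped rather than zeroed. In an autograd framework the same effect would come from computing alpha without tracking gradients and multiplying the loss or the gradients by it.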
Architecture
Gated Attention
Batch-level phase coherence gating applied to gradients using paired pseudo-complex latents and target-token embeddings.
parameters: null

Novel Contributions

  • Phase-induced coherence-gated gradient descent (PIC-GD)
  • Pseudo-complex latent pairing via adjacent channels
  • Detached coherence-based gradient gate
  • Tokenizer-agnostic val_bpb evaluation
  • Int8 plus zlib roundtrip export path
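
For reference, a bits-per-byte metric is usually made tokenizer-agnostic by normalizing the summed cross-entropy by the number of raw bytes scored rather than by the number of tokens. The sketch below assumes that per-token losses are in nats and that bytes are counted over the UTF-8 encoding of the validation text; the PR's own evaluation code may differ, and the function names are hypothetical.

```python
import math


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed token-level cross-entropy (nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


def val_bpb(token_losses, text):
    """token_losses: per-token NLL in nats; text: the string that was scored."""
    return bits_per_byte(sum(token_losses), len(text.encode("utf-8")))
```

Because the denominator counts bytes, two models with different vocabularies can be compared on the same validation text without correcting for tokens-per-byte differences.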