PR #950

open

2026-03-27_PhaseCoherenceGatedGradients submission

by jzgdev

val_bpb

1.3178

Architecture

Transformer

Optimizer

Muon

Artifact Size

—

Training Techniques

Optimizer

Muon

weight_decay: null

momentum: null

other_params: {"adam_split":true}
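The submission does not say what `adam_split` does. One common arrangement in Muon-based training is to send 2-D hidden weight matrices to Muon and everything else (embeddings, output head, norms, biases) to Adam. The sketch below illustrates that reading only; the function name `split_params` and the name-matching rules are hypothetical, not taken from the PR.

```python
def split_params(named_shapes):
    """Partition parameter names into a Muon group and an Adam group.

    ASSUMPTION: 2-D weight matrices that are not embeddings or the output
    head go to Muon; all other parameters go to Adam. The substrings
    "embed" and "lm_head" are illustrative naming conventions.
    """
    muon, adam = [], []
    for name, shape in named_shapes.items():
        is_matrix = len(shape) == 2
        is_embedding_or_head = "embed" in name or "lm_head" in name
        if is_matrix and not is_embedding_or_head:
            muon.append(name)
        else:
            adam.append(name)
    return muon, adam


shapes = {
    "blocks.0.attn.qkv.weight": (1536, 512),
    "blocks.0.norm.weight": (512,),
    "embed.weight": (1024, 512),
}
muon_names, adam_names = split_params(shapes)
```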

Quantization

int8

bits: 8

scope: final model

Compression

zlib

level: null
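A minimal sketch of an int8 plus zlib roundtrip, assuming symmetric per-tensor quantization and a simple header layout. The submission does not specify either detail, so `quantize_int8`, `pack`, `unpack`, and the byte layout are illustrative. Because the level is listed as null, the sketch uses zlib's default level.

```python
import struct
import zlib


def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q


def pack(scale, q, level=-1):
    """Serialize as: float64 scale, uint32 count, int8 payload; then zlib-compress."""
    raw = struct.pack("<dI", scale, len(q)) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(raw, level)  # level=-1 selects zlib's default


def unpack(blob):
    """Decompress and dequantize back to floats."""
    raw = zlib.decompress(blob)
    scale, n = struct.unpack_from("<dI", raw, 0)
    q = struct.unpack_from(f"<{n}b", raw, 12)  # header is 8 + 4 bytes
    return [scale * x for x in q]


weights = [0.5, -1.27, 0.0, 0.3]
scale, q = quantize_int8(weights)
restored = unpack(pack(scale, q))
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the check a roundtrip test would typically make.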

Other

other

Phase-induced coherence-gated gradient descent (PIC-GD): computes a normalized coherence score from dot products between paired latents and reference vectors, then scales the backpropagated gradient by a detached scalar gate, alpha.

parameters: {"enabled":true,"beta":2,"min_gate":0.05,"eps":0.000001,"token_stride":32}
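The mapping from coherence score to gate is not given in the submission. The sketch below assumes one plausible form, `alpha = max(min_gate, coherence ** beta)`, using the listed `beta` and `min_gate` values. The function names are hypothetical, and `eps` and `token_stride` are not used here because they would belong to the score computation rather than the gate.

```python
def gate_from_coherence(coherence, beta=2.0, min_gate=0.05):
    """Map a coherence score in [0, 1] to a gate alpha in [min_gate, 1].

    ASSUMPTION: power-law sharpening with a floor; the PR's exact
    formula is not stated.
    """
    c = min(1.0, max(0.0, coherence))  # clamp against numerical drift
    return max(min_gate, c ** beta)


def gate_gradients(grads, alpha):
    """Scale every gradient by the scalar gate.

    alpha is a plain float here, so no gradient flows through it. In an
    autograd framework the equivalent step is detaching alpha before use.
    """
    return [alpha * g for g in grads]


alpha = gate_from_coherence(0.5)           # 0.5 ** 2 = 0.25
gated = gate_gradients([2.0, -4.0], alpha)
```

With these defaults, a fully incoherent batch still passes 5% of its gradient, so training never stalls completely.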

Architecture

Gated Attention

Batch-level phase-coherence gate applied to gradients, computed from paired pseudo-complex latents and target-token embeddings.

parameters: null
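One way to read "paired pseudo-complex latents" is that adjacent channels (2k, 2k+1) are treated as the real and imaginary parts of a complex number, and the latent is compared with the target-token embedding through a complex inner product. The sketch below follows that reading. The normalization, the use of every `token_stride`-th token, and the name `batch_coherence` are assumptions rather than the PR's code.

```python
import math


def batch_coherence(latents, refs, eps=1e-6, token_stride=32):
    """Return a coherence score in [0, 1] for a batch of token vectors.

    latents, refs: equal-length lists of per-token vectors with an even
    number of channels. ASSUMPTION: score = |sum z * conj(r)| / sum |z||r|.
    """
    re_sum = im_sum = norm_sum = 0.0
    for t in range(0, len(latents), token_stride):  # subsample tokens
        z, r = latents[t], refs[t]
        for k in range(0, len(z) - 1, 2):
            a, b = z[k], z[k + 1]  # latent:    a + ib
            c, d = r[k], r[k + 1]  # reference: c + id
            # (a + ib) * conj(c + id) = (ac + bd) + i(bc - ad)
            re_sum += a * c + b * d
            im_sum += b * c - a * d
            norm_sum += math.hypot(a, b) * math.hypot(c, d)
    return math.hypot(re_sum, im_sum) / (norm_sum + eps)


aligned = batch_coherence([[1.0, 0.0, 0.0, 1.0]] * 64, [[1.0, 0.0, 0.0, 1.0]] * 64)
cancelling = batch_coherence([[1.0, 0.0, 1.0, 0.0]], [[1.0, 0.0, -1.0, 0.0]])
```

When every pair shares the same relative phase, the complex terms add constructively and the score approaches 1. When the phases disagree, the terms cancel and the score falls toward 0.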

Novel Contributions

Phase-induced coherence-gated gradient descent (PIC-GD)
Pseudo-complex latent pairing via adjacent channels
Detached coherence-based gradient gate
Tokenizer-agnostic val_bpb evaluation
Int8 plus zlib roundtrip export path
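The tokenizer-agnostic val_bpb item in the list above refers to bits per byte, which divides total loss by the number of UTF-8 bytes in the text rather than by the number of tokens. A minimal sketch of the standard conversion, assuming per-token losses are in nats; the function name `bits_per_byte` is illustrative.

```python
import math


def bits_per_byte(token_nll_nats, text):
    """Convert summed per-token negative log-likelihood (nats) to bits per byte."""
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must contain at least one byte")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / n_bytes


# Two tokens costing 2 bits each over a 4-byte string give 1 bit per byte.
bpb = bits_per_byte([2 * math.log(2), 2 * math.log(2)], "abcd")
```

Because the denominator depends only on the raw text, models with different vocabularies can be compared on the same scale.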