PR #1022

open

Non-Record: McGilchrist Register Token — causal cumulative mean + FiLM global context pathway

by aramdov
val_bpb
1.1646
Architecture
Transformer
Optimizer
Muon
Artifact Size
16,855,808 bytes

Training Techniques

Architecture
FiLM
Adds a causal cumulative-mean global context pathway in each transformer block, projecting to FiLM gamma/beta parameters to modulate block outputs.
parameters: {"register_scale":0.01,"bottleneck_dim":8,"model_dim":512}
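A minimal NumPy sketch of the described pathway, using the listed parameters (`register_scale=0.01`, `bottleneck_dim=8`, `model_dim=512`); the weight names and the exact modulation form `x * (1 + s*gamma) + s*beta` are assumptions, not taken from the PR:

```python
import numpy as np

def film_register(x, W_down, W_up_gamma, W_up_beta, register_scale=0.01):
    """Causal cumulative-mean register with FiLM modulation (illustrative
    sketch; weight names and exact form are assumed, not from the PR).

    x: (T, D) block output for one sequence.
    """
    T, D = x.shape
    # Causal cumulative mean: position t sees the mean of x[0..t] only.
    ctx = np.cumsum(x, axis=0) / np.arange(1, T + 1)[:, None]
    # Bottleneck projection (D -> bottleneck_dim), then up to gamma/beta.
    h = ctx @ W_down                 # (T, bottleneck_dim)
    gamma = h @ W_up_gamma           # (T, D)
    beta = h @ W_up_beta             # (T, D)
    # FiLM modulation; reduces to the identity when gamma = beta = 0.
    return x * (1.0 + register_scale * gamma) + register_scale * beta
```

With the up-projections zero-initialized, the output equals `x` exactly, consistent with the PR's claim that the register mechanism starts identically to the baseline.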
weight tying
The untied output embedding head is cited as the cause of the artifact-size overage; the base SOTA uses tied embeddings.
parameters: null
Quantization
int8
bits: 8
scope: model weights
Compression
zstd
level: 22
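A sketch of the listed artifact pipeline's quantization step: symmetric per-tensor int8 is one common scheme matching "bits: 8, scope: model weights", but the PR's exact quantizer is an assumption here. The int8 bytes would then be zstd-compressed at level 22 (via the third-party `zstandard` package; not shown):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization (assumed scheme)."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover float weights; error is bounded by half a quantization step."""
    return q.astype(np.float32) * scale
```

This shrinks float32 weights 4x before entropy coding; zstd level 22 then squeezes the remaining redundancy in the int8 byte stream.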
Optimizer
Muon
weight_decay: 0.04
momentum: null
other_params: null

Novel Contributions

  • Causal cumulative-mean register pathway for global context in each transformer block
  • FiLM-based modulation of block outputs using gamma/beta from a bottleneck projection
  • Zero-initialized register mechanism designed to start identically to the baseline
  • Per-step mechanism confirmed to learn without destabilizing training
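The causality claim in the first bullet can be checked directly: a cumulative mean at position t depends only on positions 0..t, so perturbing a later token must leave earlier context vectors unchanged. A self-contained check (illustrative, not PR code):

```python
import numpy as np

def causal_cummean(x):
    # Running mean over the sequence axis: position t averages x[0..t].
    return np.cumsum(x, axis=0) / np.arange(1, x.shape[0] + 1)[:, None]

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 4))
ctx = causal_cummean(x)

x2 = x.copy()
x2[5] += 10.0                    # perturb only the last position
ctx2 = causal_cummean(x2)

# Earlier positions are unaffected: no future information leaks backward.
assert np.allclose(ctx[:5], ctx2[:5])
assert not np.allclose(ctx[5], ctx2[5])
```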