PR #2056

open

Non-record 10min/16MB: BRLM SAGE Chunk4 causal gate (1xH100, val_bpb 1.26461)

by FF-GardenFnView on GitHub
val_bpb: 1.2646
Architecture: Hybrid
Optimizer: (none listed)
Artifact Size: 13,181,861 B

Training Techniques

Architecture
depth recurrence
Bifurcated recurrent LM (BRLM) with chunk-level recurrent/broadcast side-channel gating (SAGE-bus) over local hidden states.
parameters: {"chunk":4,"dimensions":384}
other
Causal within-chunk routing gate: a cumulative maximum (cummax) over the route signals ensures that each position depends only on route signals from earlier positions in the same chunk.
parameters: {"gate_type":"cummax-based within-chunk gate"}
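A minimal sketch of how a cummax-based gate can stay causal within a chunk, using plain Python lists in place of tensors. The function name `causal_chunk_gate`, the scalar route signals, and the choice of an inclusive running max (each position also sees its own signal) are assumptions for illustration; the PR summary does not give the actual implementation.

```python
from itertools import accumulate

def causal_chunk_gate(route, chunk=4):
    """Within-chunk running max of route signals (hypothetical helper).

    Output i depends only on route[j] for chunk_start <= j <= i, so no
    position sees a signal from later in its chunk or from another chunk.
    """
    gate = []
    for start in range(0, len(route), chunk):
        # accumulate(..., max) is a cumulative max; it restarts per chunk.
        gate.extend(accumulate(route[start:start + chunk], max))
    return gate

route = [0.1, 0.7, 0.3, 0.2, 0.5, 0.4, 0.9, 0.0]
gate = causal_chunk_gate(route, chunk=4)

# Causality check: changing the last position of the first chunk must not
# change the gate at any earlier position.
perturbed = list(route)
perturbed[3] = 5.0
assert causal_chunk_gate(perturbed, chunk=4)[:3] == gate[:3]
```

A non-causal variant would instead take the max over the whole chunk, which lets position 0 see signals from positions 1 to 3. The running max avoids that leak while still resetting at every chunk boundary.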
Quantization
int4
bits: 4
scope: all
Compression
zlib
level: null
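A rough sketch of what an int4-plus-zlib artifact pipeline can look like, in pure Python. Everything beyond "4 bits" and "zlib" is an assumption: the symmetric per-tensor scale, the [-8, 7] range, the low-nibble-first packing, and the helper names are hypothetical, and since the compression level is listed as null the sketch uses zlib's default.

```python
import zlib

def quantize_int4(weights):
    """Symmetric per-tensor quantization to integers in [-8, 7] (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def pack_nibbles(q):
    """Pack two signed 4-bit values per byte, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    return bytes((lo & 0xF) | ((hi & 0xF) << 4) for lo, hi in zip(q[0::2], q[1::2]))

def unpack_nibbles(data, count):
    """Inverse of pack_nibbles; returns the first `count` signed values."""
    vals = []
    for b in data:
        for nib in (b & 0xF, b >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals[:count]

weights = [1.0, -0.5, 0.25, 0.0, -1.0]
q, scale = quantize_int4(weights)
packed = pack_nibbles(q)
blob = zlib.compress(packed)  # default compression level

# Round trip: decompress, unpack, and rescale to approximate weights.
restored_q = unpack_nibbles(zlib.decompress(blob), len(q))
restored = [v * scale for v in restored_q]
```

Packing halves the byte count before compression; how much zlib saves on top of that depends on how repetitive the packed nibbles are.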

Novel Contributions

  • BRLM SAGE chunk4 architecture with deterministic chunk-level side-channel routing
  • Causal within-chunk gate, fixed via cummax-based gating
  • Legalized (causal) version of a previously non-causal chunk4 SAGE prototype
  • val_bpb of 1.26460519 under the 16MB artifact cap
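As a quick check of the size claim, the reported artifact size can be compared against the cap. The summary does not say whether "16MB" means decimal or binary megabytes, so this sketch computes the headroom under both readings; the constant names are illustrative.

```python
ARTIFACT_BYTES = 13_181_861  # artifact size reported above

# The PR does not say which convention "16MB" follows, so check both.
CAP_DECIMAL = 16 * 1000**2   # 16,000,000 B
CAP_BINARY = 16 * 1024**2    # 16,777,216 B

headroom_decimal = CAP_DECIMAL - ARTIFACT_BYTES
headroom_binary = CAP_BINARY - ARTIFACT_BYTES

assert headroom_decimal > 0 and headroom_binary > 0
```

Under either reading the artifact fits, with roughly 2.8 MB to spare in the stricter decimal case.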