PR #2056
open
Non-record 10min/16MB: BRLM SAGE Chunk4 causal gate (1xH100, val_bpb 1.26461)
by FF-GardenFnView on GitHub
val_bpb
1.2646
Architecture
Hybrid
Optimizer
—
Artifact Size
13,181,861 B
Training Techniques
Architecture
depth recurrence
Bifurcated recurrent LM with chunk-level recurrent/broadcast side-channel gating (SAGE-bus) over local hidden states.
parameters: {"chunk":4,"dimensions":384}
other
Causal within-chunk routing gate using cummax so each position only depends on route signals from earlier positions in the same chunk.
parameters: {"gate_type":"cummax-based within-chunk gate"}
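The cummax-based gate described above can be illustrated with a small sketch. This is not the PR's code: the function names, the plain-list representation of route signals, and the choice to include the current position in the running maximum are all assumptions made for illustration. It contrasts a running maximum that resets at each chunk boundary (causal) with a chunk-wide maximum (non-causal, because early positions see later route signals).

```python
from itertools import accumulate


def causal_chunk_gate(route, chunk=4):
    """Running max of route signals, restarted at every chunk boundary.

    Hypothetical helper: position t only sees route values from the start
    of its own chunk up to and including t (inclusive is an assumption).
    """
    gate = []
    for start in range(0, len(route), chunk):
        gate.extend(accumulate(route[start:start + chunk], max))
    return gate


def noncausal_chunk_gate(route, chunk=4):
    """Chunk-wide max: every position sees the whole chunk, including the future."""
    gate = []
    for start in range(0, len(route), chunk):
        block = route[start:start + chunk]
        gate.extend([max(block)] * len(block))
    return gate


route = [0.1, 0.7, 0.3, 0.9, 0.5, 0.2, 0.8, 0.4]
print(causal_chunk_gate(route))     # [0.1, 0.7, 0.7, 0.9, 0.5, 0.5, 0.8, 0.8]
print(noncausal_chunk_gate(route))  # [0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8]
```

In the causal variant, changing the route signal at the last position of a chunk leaves the gate at earlier positions unchanged; in the chunk-wide variant it does not. A strictly exclusive version (current position excluded) would shift the running maximum by one position within each chunk.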
Quantization
int4
bits: 4
scope: all
Compression
zlib
level: null
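A rough sketch of how int4 quantization followed by zlib compression can work in general. The PR does not state its quantization scheme, so the symmetric per-tensor scale, the nibble-packing layout, and all function names here are assumptions; `level: null` is taken to mean zlib's default compression level, which is also an assumption.

```python
import zlib


def quantize_int4(weights):
    """Symmetric per-tensor int4 (assumed scheme): floats -> integers in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid a zero scale
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def pack_nibbles(q):
    """Pack two 4-bit values per byte, offsetting by 8 so each fits in 0..15."""
    u = [v + 8 for v in q]
    if len(u) % 2:
        u.append(8)  # pad with an encoded zero
    return bytes((u[i] << 4) | u[i + 1] for i in range(0, len(u), 2))


def unpack_nibbles(data, n):
    """Inverse of pack_nibbles; n is the original number of values."""
    out = []
    for b in data:
        out.append((b >> 4) - 8)
        out.append((b & 0x0F) - 8)
    return out[:n]


weights = [0.7, -0.3, 0.0, 0.1, -0.7]
q, scale = quantize_int4(weights)
packed = pack_nibbles(q)
blob = zlib.compress(packed)  # default level, assumed from "level: null"

# Round trip: decompress, unpack, and rescale back to approximate floats.
restored = [v * scale for v in unpack_nibbles(zlib.decompress(blob), len(q))]
```

Packing halves the byte count relative to int8 storage before zlib runs; how much zlib saves on top of that depends on the weight distribution and is not something this sketch predicts.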
Novel Contributions
- BRLM SAGE chunk4 architecture with deterministic chunk-level side-channel routing
- Fixing the within-chunk gate to be causal via cummax-based gating
- Legalized version of a previously non-causal chunk4 SAGE prototype
- Achieved 1.26460519 val_bpb under the 16MB artifact cap
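For readers unfamiliar with the metric, bits per byte is conventionally derived from the mean token-level cross-entropy by converting nats to bits and rescaling by the token-to-byte ratio of the validation text. The sketch below shows that conversion; the function name and arguments are hypothetical, and the PR's actual evaluation code may compute it differently.

```python
import math


def bits_per_byte(mean_nll_nats, n_tokens, n_bytes):
    """Convert mean per-token negative log-likelihood (nats) to bits per byte."""
    bits_per_token = mean_nll_nats / math.log(2)  # nats -> bits
    return bits_per_token * n_tokens / n_bytes    # tokens -> bytes


# Sanity check: one bit per token and one token per byte gives 1.0 bpb.
print(bits_per_byte(math.log(2), n_tokens=100, n_bytes=100))
```

Because the result is normalized by bytes rather than tokens, scores stay comparable across submissions that use different tokenizers.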