PR #1622
open
Record: GDN-Hybrid + Sliding Window Attention (cold-cache, 1.01710 BPB)
by joshkmartinez
val_bpb
1.0171
Architecture
Hybrid
Optimizer
AdamW
Artifact Size
15,981,262 bytes
Training Techniques
Architecture
GDN
GDN-hybrid backbone with repeated GDN blocks and sliding-window attention side path
parameters: {"layers":5}
sliding window eval
Sliding-window attention side path present in the model
parameters: null
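The listing gives no parameters for the sliding-window path, so the following is only a minimal sketch of what a causal sliding-window attention mask generally looks like. The function name, `seq_len`, and `window` are hypothetical and are not taken from the PR.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key j.

    Assumption (not from the PR): attention is causal, and each query sees
    itself plus the previous `window - 1` positions.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Illustrative sizes only; the real window length is not stated in the listing.
mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row marks the keys visible to one query position, so the band of `x` characters shifts right by one per row once the window is full.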
Weight Averaging
EMA
parameters: {"decay":0.997}
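As a rough illustration of weight averaging with the listed decay of 0.997, the sketch below applies the standard exponential-moving-average update to a flat list of parameters. The function name and the toy values are hypothetical; the PR's actual implementation is not shown in this listing.

```python
def ema_update(shadow: list[float], params: list[float],
               decay: float = 0.997) -> list[float]:
    """Blend the current parameters into the running average.

    Standard EMA rule: shadow <- decay * shadow + (1 - decay) * params.
    """
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


# Toy example: the averaged copy drifts slowly toward the live weights.
shadow = [0.0, 1.0]
for _ in range(3):
    params = [1.0, 1.0]  # stand-in for the weights after an optimizer step
    shadow = ema_update(shadow, params)
print(shadow)
```

With a decay this close to 1, each step moves the averaged weights only 0.3% of the way toward the current weights, which is why the averaged copy is smoother than the live one.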
Quantization
late QAT
bits: 6
scope: all
GPTQ
bits: 6
scope: all
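To make the 6-bit setting concrete, here is a minimal sketch of symmetric round-to-nearest quantization with a single per-tensor scale. This is an assumption for illustration only: it is neither GPTQ (which compensates rounding error using second-order information) nor quantization-aware training, and the helper names are hypothetical.

```python
def quantize(weights: list[float], bits: int = 6) -> tuple[list[int], float]:
    """Map floats to signed integer codes in [-qmax, qmax] with one scale."""
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0] * len(weights), 1.0  # all-zero tensor: any scale works
    scale = max_abs / qmax
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.8, -0.33, 0.0, 0.07]  # toy values, not from the PR
codes, scale = quantize(weights)
restored = dequantize(codes, scale)
print(codes, scale)
```

Under this scheme the reconstruction error of each weight is at most half a quantization step, that is, `scale / 2`.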
Compression
zstd
level: 22
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"variant":"MuonEq-R"}
AdamW
weight_decay: null
momentum: null
other_params: null
Novel Contributions
- Restages the strongest still-authoritative clean SAFE_SUBMISSION artifact after a later audit invalidated a stale score
- Uses pulled TensorPool artifacts as authority for the reported metrics
- Combines a GDN-hybrid backbone with sliding-window attention in a cold-cache 3-seed confirmation run
- Confirms a SAFE_SUBMISSION fixed-predictor no-TTT Track-A lane under the 16 MB cap
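The size claim can be checked with simple arithmetic. The sketch below compares the reported artifact size against the 16 MB cap; whether the cap is decimal (16,000,000 bytes) or binary (16 × 2^20 bytes) is not stated in this listing, so both readings are treated as assumptions.

```python
ARTIFACT_BYTES = 15_981_262      # reported artifact size
CAP_DECIMAL = 16_000_000         # assumption: "16 MB" means 16 * 10**6 bytes
CAP_BINARY = 16 * 2 ** 20        # assumption: "16 MB" means 16 MiB


def headroom(artifact_bytes: int, cap_bytes: int) -> int:
    """Bytes left under the cap; a negative value means the cap is exceeded."""
    return cap_bytes - artifact_bytes


decimal_headroom = headroom(ARTIFACT_BYTES, CAP_DECIMAL)
binary_headroom = headroom(ARTIFACT_BYTES, CAP_BINARY)
print(decimal_headroom, binary_headroom)
```

Under either reading the artifact fits, though the decimal interpretation leaves far less slack.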