PR #1622

open

Record: GDN-Hybrid + Sliding Window Attention (cold-cache, 1.01710 BPB)

by joshkmartinez

val_bpb

1.0171

Architecture

Hybrid

Optimizer

AdamW

Artifact Size

15,981,262 bytes

Training Techniques

Architecture

GDN

GDN-hybrid backbone with repeated GDN blocks and sliding-window attention side path

parameters: {"layers":5}

sliding window eval

Sliding-window attention side path present in the model

parameters: null
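
For readers unfamiliar with the term, a sliding-window attention path restricts each position to a fixed number of recent positions. The sketch below builds such a causal mask in plain Python. The function name `sliding_window_mask` and the window size of 3 are illustrative assumptions; the PR metadata does not state the window size or how the mask is implemented.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j], True when query position i may attend to key position j.

    Causal sliding window: position i sees itself and the (window - 1)
    positions immediately before it. The window size is a hypothetical
    value chosen for illustration, not taken from the PR.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    # 'x' marks an allowed (query, key) pair, '.' a masked one.
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed entries, so attention cost grows linearly with sequence length rather than quadratically.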

Weight Averaging

EMA

parameters: {"decay":0.997}
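
As a minimal sketch of what EMA weight averaging with this decay does, the update below keeps a shadow copy of the weights and blends in the live weights after each step. Only the decay value 0.997 comes from the metadata; the function name `ema_update`, the flat-list weight representation, and the constant toy weights are assumptions for illustration.

```python
def ema_update(ema: list[float], weights: list[float], decay: float = 0.997) -> list[float]:
    """One EMA step: shadow = decay * shadow + (1 - decay) * live weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


# Toy setup (hypothetical): the shadow starts at zero and the live weights
# stay constant, so the shadow converges toward them geometrically.
live_weights = [1.0, -2.0]
ema = [0.0, 0.0]
for _ in range(1000):
    ema = ema_update(ema, live_weights)

print(ema)
```

With a decay of 0.997 the average has an effective horizon of roughly 1 / (1 - 0.997), or about 333 steps, so the shadow weights mostly reflect the last few hundred updates.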

Quantization

late QAT

bits: 6

scope: all

GPTQ

bits: 6

scope: all
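
To make the 6-bit setting concrete, here is a sketch of plain symmetric round-to-nearest quantization to 6 bits. It is not GPTQ, which additionally compensates rounding error using second-order information, and it is not the PR's QAT procedure. The per-tensor scale, the function names, and the sample weights are all assumptions.

```python
def quantize_symmetric(values: list[float], bits: int = 6) -> tuple[list[int], float]:
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


weights = [0.50, -0.25, 0.125, -1.0, 0.0]  # hypothetical sample tensor
codes, scale = quantize_symmetric(weights)
restored = dequantize(codes, scale)

for w, c, r in zip(weights, codes, restored):
    print(f"{w:+.4f} -> code {c:+3d} -> {r:+.4f}")
```

The round-trip error of each value is bounded by half the scale, which is why the choice of scale granularity (per tensor, per channel, per group) matters for accuracy at low bit widths.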

Compression

zstd

level: 22

Optimizer

Muon

weight_decay: null

momentum: null

other_params: {"variant":"MuonEq-R"}

AdamW

weight_decay: null

momentum: null

other_params: null

Restages the strongest still-authoritative clean SAFE_SUBMISSION artifact after a later audit invalidated a stale score
Uses pulled TensorPool artifacts as authority for the reported metrics
Combines a GDN-hybrid backbone with sliding-window attention in a cold-cache 3-seed confirmation run
Confirms a SAFE_SUBMISSION fixed-predictor no-TTT Track-A lane under the 16 MB cap
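
The size claim can be checked with simple arithmetic. The sketch below compares the reported artifact size against a 16 MB cap under both common readings of "MB". Which reading applies is not stated here, so both constants are assumptions, as is the helper name `headroom`.

```python
ARTIFACT_BYTES = 15_981_262        # reported artifact size

CAP_DECIMAL = 16 * 1000 ** 2       # assumption: 16 MB = 16,000,000 bytes
CAP_BINARY = 16 * 1024 ** 2        # assumption: 16 MiB = 16,777,216 bytes


def headroom(artifact_bytes: int, cap_bytes: int) -> int:
    """Return the bytes remaining under the cap, or raise if the cap is exceeded."""
    if artifact_bytes > cap_bytes:
        raise ValueError(f"artifact exceeds cap by {artifact_bytes - cap_bytes} bytes")
    return cap_bytes - artifact_bytes


print("decimal headroom:", headroom(ARTIFACT_BYTES, CAP_DECIMAL))
print("binary headroom:", headroom(ARTIFACT_BYTES, CAP_BINARY))
```

Under the stricter decimal reading the artifact fits with 18,738 bytes to spare, so the claim holds under either interpretation.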