PR #448
Status: open

Add SmearGate+BigramHash context-repair submission (1.2006 BPB, 15.0MB)

by handemanai
val_bpb: 1.2006
Architecture: Dense 9-layer 512-dim GQA Transformer
Optimizer:
Artifact Size: 14,999,691 bytes (15.0 MB)

Training Techniques

Architecture
SmearGate
Learned gating over shifted input features to repair carried-context failures under sliding-window evaluation.
parameters: null
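SmearGate, as described above, gates between each position's own features and the previous position's shifted features. A minimal NumPy sketch, assuming a per-feature sigmoid gate (the gate form and parameter shapes are assumptions, not taken from the PR):

```python
import numpy as np

def smear_gate(x, w_gate, b_gate):
    """Blend each position's features with the previous position's
    ("smeared") features via a learned sigmoid gate (hypothetical form).
    x: (seq_len, dim); w_gate: (dim, dim); b_gate: (dim,)."""
    prev = np.roll(x, 1, axis=0)   # carry features forward by one position
    prev[0] = 0.0                  # the first position has no carried context
    g = 1.0 / (1.0 + np.exp(-(x @ w_gate + b_gate)))  # gate in (0, 1)
    return g * x + (1.0 - g) * prev
```

With `w_gate` and `b_gate` at zero, the gate is 0.5 everywhere, i.e. an even blend of current and shifted features; training would move it toward repairing positions whose carried context is unreliable.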
BigramHash
Hashed bigram embeddings providing explicit local context features.
parameters: {"bigram_vocab_size":4096,"bigram_dim":128}
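A hashed bigram embedding of this shape maps each (previous, current) token pair into a fixed table of 4096 ids, each indexing a 128-dim vector. A sketch using the submission's sizes; the multiplicative hash itself is a placeholder, since the PR does not specify one:

```python
import numpy as np

BIGRAM_VOCAB = 4096  # bigram_vocab_size from the submission
BIGRAM_DIM = 128     # bigram_dim from the submission

def bigram_hash_ids(tokens, mult=1000003):
    """Map each (prev, cur) token pair to a bucket in the bigram table.
    The multiplicative hash is a hypothetical choice, not the PR's."""
    ids, prev = [], 0  # treat position 0 as preceded by token 0
    for t in tokens:
        ids.append((prev * mult + t) % BIGRAM_VOCAB)
        prev = t
    return ids

# the ids then index a learned table of shape (BIGRAM_VOCAB, BIGRAM_DIM)
table = np.zeros((BIGRAM_VOCAB, BIGRAM_DIM))
features = table[bigram_hash_ids([5, 17, 17, 3])]  # (4, 128) local-context features
```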
MLP3x
Increased MLP capacity using MLP_MULT=3.
parameters: {"mlp_mult":3}
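MLP_MULT=3 sets the feed-forward hidden width to 3×d_model rather than the conventional 4×; for this 512-dim model that is a 1536-wide hidden layer. The resulting per-block weight count (biases and activation choice omitted, as they are not specified here):

```python
DIM = 512      # model width from the submission
MLP_MULT = 3   # mlp_mult parameter

hidden = DIM * MLP_MULT        # 1536-wide MLP hidden layer
# two projections per MLP block: in (DIM x hidden) and out (hidden x DIM)
mlp_params = 2 * DIM * hidden  # weights only
```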
KV head count
Grouped-query attention with fewer KV heads than attention heads.
parameters: {"num_heads":8,"num_kv_heads":4}
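With num_heads=8 and num_kv_heads=4, each KV head serves a group of two query heads, halving the KV projection and cache size. The sharing can be sketched as repeating KV heads up to the query-head count before attention:

```python
import numpy as np

NUM_HEADS, NUM_KV_HEADS = 8, 4  # from the submission's parameters

def expand_kv_heads(kv):
    """Repeat each KV head so consecutive query heads share it.
    kv: (num_kv_heads, seq_len, head_dim) -> (num_heads, seq_len, head_dim)."""
    group = NUM_HEADS // NUM_KV_HEADS  # 2 query heads per KV head
    return np.repeat(kv, group, axis=0)
```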
Quantization
int6
bits: 6
scope: all weights with fp16 embedding passthrough
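A minimal symmetric int6 round-trip for the weight path, with embeddings passed through as fp16 instead of being quantized; per-tensor scaling is an assumption here, as the PR may use finer granularity:

```python
import numpy as np

def quantize_int6(w):
    """Symmetric per-tensor quantization to the signed 6-bit range [-32, 31]
    (sketch; the PR's exact scheme is not specified)."""
    amax = np.abs(w).max()
    scale = amax / 31.0 if amax > 0 else 1.0
    q = np.clip(np.round(w / scale), -32, 31).astype(np.int8)  # 6-bit values
    return q, scale

def dequantize_int6(q, scale):
    return q.astype(np.float32) * scale
```

Embedding tables would skip this path entirely and be stored as fp16, matching the "fp16 embedding passthrough" scope above; the quantized tensors are then zstd-compressed for the final artifact.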
Compression
zstd
level: null
Evaluation
sliding window eval
parameters: {"stride":512}
manual logits-only exact evaluation
parameters: null
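The "manual logits-only exact evaluation" can be sketched as computing token losses directly from raw logits with a numerically stable log-softmax, then converting summed nats to bits per byte. Under sliding-window evaluation with stride 512, each window scores only its final 512 tokens, so every scored token keeps left context from earlier in the window; the windowing loop is omitted here and the function names are illustrative:

```python
import numpy as np

def manual_cross_entropy_nats(logits, targets):
    """Sum of exact token losses (nats) from raw logits via stable log-softmax.
    logits: (n_tokens, vocab); targets: (n_tokens,)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].sum()

def val_bpb(total_nats, total_bytes):
    """Bits per byte: total loss in nats -> bits, normalized by raw byte count."""
    return total_nats / np.log(2.0) / total_bytes
```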
Other
Context-feature repair targeting carried-context failures under sliding-window evaluation.
parameters: null

Novel Contributions

  • SmearGate context feature for shifted-input repair
  • BigramHash explicit local context embeddings
  • Context-repair approach that improves carried-context sliding-window behavior
  • int6+zstd export with fp16 embedding passthrough
  • Corrected evaluation path using forward_logits plus manual cross-entropy