PR #769

open

PROTEUS+STYX — val_bpb 0.8508 (3-seed mean) — LeakyReLU(0.9)² + 5-gram Eval Cache

by MatoTeziTanka
val_bpb
0.8508
Architecture
Transformer
Optimizer
Artifact Size
<16MB

Training Techniques

Quantization
int6
bits: 6
scope: all
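A minimal sketch of symmetric per-tensor int6 quantization, to make the "bits: 6, scope: all" entry concrete. The function names, the max-magnitude scaling, and the rounding scheme are my assumptions; the PR only states that all weights are quantized to 6 bits.

```python
def quantize_int6(values):
    """Symmetric per-tensor int6 quantization (hypothetical sketch).

    int6 covers [-32, 31]; we map the largest magnitude to 31 so that
    dequantization is simply q * scale.
    """
    # `or 1.0` guards the degenerate all-zero tensor against division by zero
    scale = (max(abs(v) for v in values) or 1.0) / 31.0
    quantized = [max(-32, min(31, round(v / scale))) for v in values]
    return quantized, scale

def dequantize_int6(quantized, scale):
    """Recover approximate float values from int6 codes."""
    return [q * scale for q in quantized]
```

The quantized codes fit in 6 bits each, which is what makes the sub-16MB zstd-compressed artifact plausible.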
Architecture
LeakyReLU(0.9)²
Replaces the standard activation with F.leaky_relu(x, 0.9).square().
parameters: {"slope":0.9}
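A scalar pure-Python sketch of the activation, mirroring the PR's `F.leaky_relu(x, 0.9).square()` (the real code operates on PyTorch tensors; this standalone version is for illustration only):

```python
def leaky_relu_squared(x, slope=0.9):
    """LeakyReLU with slope 0.9, then squared.

    Note that squaring makes the output non-negative even for negative
    inputs, so the nonlinearity is U-shaped rather than ReLU-like.
    """
    y = x if x > 0 else slope * x
    return y * y
```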
tied embeddings
Uses tied input/output embeddings.
parameters: null
GQA
Grouped-query attention with 4 KV heads out of 8 total heads.
parameters: {"heads":8,"kv_heads":4}
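With 8 query heads over 4 KV heads, GQA shares each KV head across a group of 2 consecutive query heads. A sketch of the head mapping (helper name is mine, not from the PR):

```python
def kv_head_for(q_head, n_heads=8, n_kv_heads=4):
    """Map a query head index to the KV head it attends with under GQA.

    Group size is n_heads // n_kv_heads (here 2), so query heads
    0-1 share KV head 0, query heads 2-3 share KV head 1, and so on.
    """
    group_size = n_heads // n_kv_heads
    return q_head // group_size
```

Halving the KV heads halves the KV cache size at inference with little quality loss, which is the usual motivation for GQA.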
Evaluation
sliding window eval
Sliding-window evaluation: 2048-token windows advanced 64 tokens at a time, so consecutive windows overlap by 1984 tokens.
parameters: {"stride":64,"seq_len":2048}
stride-based eval
Non-overlapping evaluation with stride equal to the sequence length (2048).
parameters: {"stride":2048,"seq_len":2048}
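Both evaluation configs can be described by the same windowing scheme; only the stride differs. A sketch of how stride controls overlap (the helper and its return shape are my assumptions, not the PR's code):

```python
def eval_spans(n_tokens, seq_len=2048, stride=64):
    """Plan sliding-window evaluation over a token stream.

    Each window covers [begin, end); only tokens in [score_start, end)
    are newly scored, the rest serve as context already scored by an
    earlier window. stride == seq_len yields non-overlapping windows,
    which is the PR's zero-overlap configuration.
    """
    spans, prev_end = [], 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + seq_len, n_tokens)
        spans.append((begin, end, prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return spans
```

Every token is scored exactly once regardless of stride; a smaller stride only gives each scored token more preceding context.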
Other
other
Backward-looking 5-gram evaluation cache with fixed-alpha blending of model and cache probabilities.
parameters: {"ngram":5,"buckets":4194304,"alpha_model":0.8,"alpha_cache":0.2}
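A minimal sketch of the backward-looking cache. Class and method names are mine, and Python's built-in `hash` stands in for whatever bucketing hash the PR actually uses; the key property preserved is that counts come only from tokens already scored, so each prediction sees no future information.

```python
from collections import defaultdict

class NgramEvalCache:
    """Backward-looking n-gram cache with fixed-alpha blending (sketch)."""

    def __init__(self, n=5, buckets=4_194_304, alpha_model=0.8, alpha_cache=0.2):
        self.n = n
        self.buckets = buckets
        self.alpha_model = alpha_model
        self.alpha_cache = alpha_cache
        # hashed (n-1)-gram context bucket -> next-token counts
        self.counts = defaultdict(lambda: defaultdict(int))

    def _bucket(self, context):
        return hash(tuple(context[-(self.n - 1):])) % self.buckets

    def blended_prob(self, context, token, p_model):
        """Blend the model's probability with the cache's empirical one."""
        cached = self.counts.get(self._bucket(context))
        if not cached:
            return p_model  # cache has nothing for this context yet
        p_cache = cached.get(token, 0) / sum(cached.values())
        return self.alpha_model * p_model + self.alpha_cache * p_cache

    def update(self, context, token):
        """Record an already-scored token so later positions can reuse it."""
        self.counts[self._bucket(context)][token] += 1
```

At each position the evaluator would call `blended_prob` first and `update` after, so the cache strictly trails the scoring.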
other
Verified cache effectiveness at zero overlap to rule out overlap artifacts.
parameters: {"stride":2048,"overlap":0}
Compression
zstd
level: null

Novel Contributions

  • LeakyReLU(0.9)² activation replacing the standard activation
  • Backward-looking 5-gram evaluation cache built from already-scored tokens
  • Fixed-alpha blending between model and cache probabilities
  • Zero-overlap verification showing the cache improvement is not just an overlap artifact
  • INT6 quantized model with zstd-compressed artifact