PR #769

open

PROTEUS+STYX — val_bpb 0.8508 (3-seed mean) — LeakyReLU(0.9)² + 5-gram Eval Cache

by MatoTeziTanka
val_bpb
0.8508
Architecture
Transformer
Optimizer
Artifact Size
<16MB

Training Techniques

Quantization
int6
bits: 6
scope: all
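A minimal sketch of symmetric per-tensor int6 quantization, to make the "bits: 6, scope: all" entry concrete. The function names, the max-magnitude scaling, and the rounding scheme are my assumptions; the PR only states that all weights are quantized to 6 bits.

```python
def quantize_int6(values):
    """Symmetric per-tensor int6 quantization (hypothetical sketch).

    int6 covers [-32, 31]; we map the largest magnitude to 31 so that
    dequantization is simply q * scale.
    """
    # `or 1.0` guards the degenerate all-zero tensor against division by zero
    scale = (max(abs(v) for v in values) or 1.0) / 31.0
    quantized = [max(-32, min(31, round(v / scale))) for v in values]
    return quantized, scale

def dequantize_int6(quantized, scale):
    """Recover approximate float values from int6 codes."""
    return [q * scale for q in quantized]
```

The quantized codes fit in 6 bits each, which is what makes the sub-16MB zstd-compressed artifact plausible.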
Architecture
LeakyReLU(0.9)²
Replaces the standard activation with F.leaky_relu(x, 0.9).square().
parameters: {"slope":0.9}
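A scalar pure-Python sketch of the activation, mirroring the PR's `F.leaky_relu(x, 0.9).square()` (the real code operates on PyTorch tensors; this standalone version is for illustration only):

```python
def leaky_relu_squared(x, slope=0.9):
    """LeakyReLU with slope 0.9, then squared.

    Note that squaring makes the output non-negative even for negative
    inputs, so the nonlinearity is U-shaped rather than ReLU-like.
    """
    y = x if x > 0 else slope * x
    return y * y
```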
tied embeddings
Uses tied input/output embeddings.
parameters: null
GQA
Grouped-query attention with 4 KV heads out of 8 total heads.
parameters: {"heads":8,"kv_heads":4}
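With 8 query heads over 4 KV heads, GQA shares each KV head across a group of 2 consecutive query heads. A sketch of the head mapping (helper name is mine, not from the PR):

```python
def kv_head_for(q_head, n_heads=8, n_kv_heads=4):
    """Map a query head index to the KV head it attends with under GQA.

    Group size is n_heads // n_kv_heads (here 2), so query heads
    0-1 share KV head 0, query heads 2-3 share KV head 1, and so on.
    """
    group_size = n_heads // n_kv_heads
    return q_head // group_size
```

Halving the KV heads halves the KV cache size at inference with little quality loss, which is the usual motivation for GQA.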
Evaluation
sliding window eval
Sliding-window evaluation: 2048-token windows advanced 64 tokens at a time, so consecutive windows overlap by 1984 tokens.
parameters: {"stride":64,"seq_len":2048}
stride-based eval
Non-overlapping evaluation with stride equal to the sequence length (2048).
parameters: {"stride":2048,"seq_len":2048}
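Both evaluation configs can be described by the same windowing scheme; only the stride differs. A sketch of how stride controls overlap (the helper and its return shape are my assumptions, not the PR's code):

```python
def eval_spans(n_tokens, seq_len=2048, stride=64):
    """Plan sliding-window evaluation over a token stream.

    Each window covers [begin, end); only tokens in [score_start, end)
    are newly scored, the rest serve as context already scored by an
    earlier window. stride == seq_len yields non-overlapping windows,
    which is the PR's zero-overlap configuration.
    """
    spans, prev_end = [], 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + seq_len, n_tokens)
        spans.append((begin, end, prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return spans
```

Every token is scored exactly once regardless of stride; a smaller stride only gives each scored token more preceding context.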
Other
other
Backward-looking 5-gram evaluation cache with fixed-alpha blending of model and cache probabilities.
parameters: {"ngram":5,"buckets":4194304,"alpha_model":0.8,"alpha_cache":0.2}
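A minimal sketch of the backward-looking cache. Class and method names are mine, and Python's built-in `hash` stands in for whatever bucketing hash the PR actually uses; the key property preserved is that counts come only from tokens already scored, so each prediction sees no future information.

```python
from collections import defaultdict

class NgramEvalCache:
    """Backward-looking n-gram cache with fixed-alpha blending (sketch)."""

    def __init__(self, n=5, buckets=4_194_304, alpha_model=0.8, alpha_cache=0.2):
        self.n = n
        self.buckets = buckets
        self.alpha_model = alpha_model
        self.alpha_cache = alpha_cache
        # hashed (n-1)-gram context bucket -> next-token counts
        self.counts = defaultdict(lambda: defaultdict(int))

    def _bucket(self, context):
        return hash(tuple(context[-(self.n - 1):])) % self.buckets

    def blended_prob(self, context, token, p_model):
        """Blend the model's probability with the cache's empirical one."""
        cached = self.counts.get(self._bucket(context))
        if not cached:
            return p_model  # cache has nothing for this context yet
        p_cache = cached.get(token, 0) / sum(cached.values())
        return self.alpha_model * p_model + self.alpha_cache * p_cache

    def update(self, context, token):
        """Record an already-scored token so later positions can reuse it."""
        self.counts[self._bucket(context)][token] += 1
```

At each position the evaluator would call `blended_prob` first and `update` after, so the cache strictly trails the scoring.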
other
Verified cache effectiveness at zero overlap to rule out overlap artifacts.
parameters: {"stride":2048,"overlap":0}
Compression
zstd
level: null

Novel Contributions

  • LeakyReLU(0.9)² activation replacing the standard activation
  • Backward-looking 5-gram evaluation cache built from already-scored tokens
  • Fixed-alpha blending between model and cache probabilities
  • Zero-overlap verification showing the cache improvement is not just an overlap artifact
  • INT6 quantized model with zstd-compressed artifact