PR #982

open

Record: Fort Knox — Legal Packed Training Cache, Zero Val Adaptation (val_bpb 0.0638, 3-seed)

by haikosys
val_bpb: 0.0638
Architecture: Transformer
Optimizer:
Artifact Size: ~8.1 MB

Training Techniques

Architecture
  • GQA: the Transformer uses 4 query heads grouped over 2 shared KV heads. parameters: {"heads":4,"kv_heads":2}
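A minimal NumPy sketch of the recorded GQA configuration (4 query heads sharing 2 KV heads). The function name, projection shapes, and scaled-dot-product details are illustrative assumptions, not the submission's actual code:

```python
import numpy as np

def gqa_attention(x, wq, wk, wv, n_heads=4, n_kv_heads=2):
    """Grouped-query attention sketch: n_heads query heads share
    n_kv_heads key/value heads (here 4 and 2, per the record)."""
    seq, d_model = x.shape
    hd = d_model // n_heads                      # per-head dimension
    q = (x @ wq).reshape(seq, n_heads, hd)       # (seq, 4, hd)
    k = (x @ wk).reshape(seq, n_kv_heads, hd)    # (seq, 2, hd)
    v = (x @ wv).reshape(seq, n_kv_heads, hd)
    group = n_heads // n_kv_heads                # 2 query heads per KV head
    k = np.repeat(k, group, axis=1)              # expand KV to all 4 heads
    v = np.repeat(v, group, axis=1)
    scores = np.einsum('qhd,khd->hqk', q, k) / np.sqrt(hd)
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)        # softmax over keys
    out = np.einsum('hqk,khd->qhd', probs, v)
    return out.reshape(seq, d_model)
```

Because the two KV heads are repeated across the four query heads, the KV projections (and any KV cache) are half the size of standard multi-head attention.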
  • MLP3x: uses a 3x hidden expansion in the MLP block. parameters: {"hidden_multiplier":3}
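The 3x expansion means the feed-forward hidden width is three times the model width. A sketch under that assumption; the activation (ReLU here) and function name are illustrative:

```python
import numpy as np

def mlp3x(x, w1, w2):
    """Feed-forward block with hidden width = 3 * d_model.
    w1: (d_model, 3*d_model), w2: (3*d_model, d_model)."""
    h = np.maximum(x @ w1, 0.0)   # expand to 3x width, ReLU (assumed)
    return h @ w2                 # project back to d_model
```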
Compression
  • lzma (level: null)
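A sketch of artifact packing with Python's stdlib `lzma`, taking the null level to mean the library's default preset; the helper names and pickle serialization are assumptions for illustration:

```python
import lzma
import pickle

def pack_artifact(obj, path):
    # level: null in the record -> rely on lzma's default preset
    with lzma.open(path, "wb") as f:
        pickle.dump(obj, f)

def load_artifact(path):
    with lzma.open(path, "rb") as f:
        return pickle.load(f)
```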
Other
  • Packed training n-gram frequency table built from the training data and serialized into the artifact. parameters: {"buckets":32000,"order_min":2,"order_max":9}
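One common way to pack variable-order n-gram counts into a fixed budget is a hashed bucket table. A sketch using the recorded parameters (32000 buckets, orders 2 through 9); the blake2b hashing scheme is an assumption, not necessarily what the submission uses:

```python
import hashlib

BUCKETS = 32000            # from the recorded parameters
ORDER_MIN, ORDER_MAX = 2, 9

def bucket(ngram: bytes) -> int:
    # Stable hash so a frozen table survives serialization and reload.
    h = hashlib.blake2b(ngram, digest_size=8).digest()
    return int.from_bytes(h, "big") % BUCKETS

def build_table(data: bytes) -> list[int]:
    """Count every n-gram of order 2..9 into 32000 hashed buckets."""
    counts = [0] * BUCKETS
    for n in range(ORDER_MIN, ORDER_MAX + 1):
        for i in range(len(data) - n + 1):
            counts[bucket(data[i:i + n])] += 1
    return counts
```

Hash collisions merge counts across distinct n-grams, which is the usual trade-off for keeping the table small enough to serialize into an ~8 MB artifact.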
Evaluation
  • single-pass eval (parameters: null)
Regularization
  • temperature sharpening. parameters: {"temperature":0.85}
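Temperature sharpening divides logits by a temperature below 1 (here 0.85) before the softmax, concentrating probability mass on high-scoring tokens. A minimal sketch; the function name is illustrative:

```python
import numpy as np

def sharpen(logits, temperature=0.85):
    """Softmax with T < 1: sharper (lower-entropy) distribution."""
    z = logits / temperature
    z = z - z.max()                 # stabilize before exponentiating
    p = np.exp(z)
    return p / p.sum()
```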

Novel Contributions

  • Packed training n-gram cache stored in the artifact
  • Zero validation-data adaptation
  • Single-pass frozen evaluation with no val cache, phrase cache, TTT, or alpha calibration
  • Blend of neural model scores with frozen training n-gram statistics
  • Legality-focused conservative baseline submission
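The blending contribution above can be sketched as a fixed-weight interpolation of neural and n-gram probabilities. The weight `alpha` here is illustrative only: the record states no alpha calibration happens at eval time, so any mixing weight would have to be frozen before evaluation:

```python
import math

def blended_logprob(p_neural, p_ngram, alpha=0.5):
    """Fixed linear interpolation of a neural probability with a
    frozen training n-gram probability; alpha is a hypothetical
    constant, not a value taken from the submission."""
    return math.log(alpha * p_neural + (1 - alpha) * p_ngram)
```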