PR #982
openRecord: Fort Knox — Legal Packed Training Cache, Zero Val Adaptation (val_bpb 0.0638, 3-seed)
by haikosys
val_bpb: 0.0638
Architecture: Transformer
Optimizer: —
Artifact Size: ~8.1 MB
Training Techniques
Architecture
GQA
Grouped-query attention: the transformer uses 4 query heads that share 2 KV heads.
parameters: {"heads":4,"kv_heads":2}
MLP3x
Uses a 3x MLP expansion.
parameters: {"hidden_multiplier":3}
Compression
lzma
parameters: {"level":null}
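Reading level: null as the library default, the artifact packing reduces to the standard-library calls below; the helper names are hypothetical:

```python
import lzma

def pack_artifact(raw: bytes) -> bytes:
    # level: null is read as "no explicit preset", i.e. lzma's default (6).
    return lzma.compress(raw)

def unpack_artifact(blob: bytes) -> bytes:
    return lzma.decompress(blob)
```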
Other
other
A packed n-gram frequency table is built from the training data and serialized into the artifact.
parameters: {"buckets":32000,"order_min":2,"order_max":9}
Evaluation
single-pass eval
parameters: null
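Single-pass here means one frozen scoring pass over the validation bytes with no adaptation, so bpb reduces to total negative log-likelihood in bits divided by the byte count. The model_logprob hook below is a hypothetical stand-in for the frozen scorer:

```python
import math

def val_bpb(model_logprob, val: bytes) -> float:
    """Bits per byte from one frozen pass; model_logprob(context, byte)
    returns a natural-log probability and is assumed for this sketch."""
    nll_nats = 0.0
    for i in range(len(val)):
        nll_nats -= model_logprob(val[:i], val[i])
    return nll_nats / (math.log(2) * len(val))
```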
Regularization
temperature sharpening
parameters: {"temperature":0.85}
Novel Contributions
- Packed training n-gram cache stored in the artifact
- Zero validation-data adaptation
- Single-pass frozen evaluation with no val cache, phrase cache, test-time training (TTT), or alpha calibration
- Blend of neural model scores with frozen training n-gram statistics (see the sketch after this list)
- Legality-focused conservative baseline submission
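A minimal sketch of the blend, assuming a linear mixture in probability space; the weight w is a fixed constant (consistent with "no alpha calibration"), and its value here is illustrative:

```python
import math

def blended_logprob(p_neural: float, p_ngram: float, w: float = 0.9) -> float:
    """Mix a neural byte probability with a frozen training n-gram
    probability; w = 0.9 is illustrative, not from the submission."""
    return math.log(w * p_neural + (1.0 - w) * p_ngram)
```

Since both components are frozen at evaluation time, the mixture stays a single deterministic pass over the validation data.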