Summary
- val_bpb: 1.0461
- Architecture: 11L/512d U-Net
- Optimizer: —
- Artifact Size: 15.64 MB
Training Techniques

Quantization
- GPTQ (bits: 6, scope: all)
- QAT (bits: 6, scope: all)
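Both GPTQ and QAT here target the same uniform int6 grid (63–64 representable levels per tensor). As a minimal sketch of that grid, the fake-quantization step below rounds weights to 6 bits and dequantizes them, as a QAT forward pass would; GPTQ proper additionally uses Hessian-aware error compensation, which is omitted here. The function name and per-tensor symmetric scaling are illustrative assumptions, not the card's exact implementation.

```python
import numpy as np

def fake_quant_int6(w, bits=6):
    """Symmetric per-tensor fake quantization (sketch): snap weights to a
    6-bit integer grid, then dequantize back to floats."""
    qmax = 2 ** (bits - 1) - 1            # 31 for int6
    scale = np.max(np.abs(w)) / qmax      # per-tensor scale (an assumption)
    if scale == 0:
        return w.copy()
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                      # dequantized int6 approximation
```

In QAT the rounding is applied during training with a straight-through gradient, so the network learns weights that survive the int6 snap.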
Architecture
- XSA: used in the last 4 layers. parameters: {"last_n":4}
- BigramHash: vocabulary/feature augmentation. parameters: {"vocab_size":1536}
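A plausible reading of BigramHash with vocab_size 1536 is that each consecutive token pair is hashed into one of 1536 extra feature ids appended past the base vocabulary. The hash function, the `base` offset, and the function name below are all illustrative assumptions; only the 1536-bucket size comes from the card.

```python
def bigram_hash_ids(tokens, vocab_size=1536, base=50257):
    """Map each consecutive token pair to a hashed bigram feature id in
    [base, base + vocab_size). Hash and base offset are assumptions."""
    ids = []
    for a, b in zip(tokens, tokens[1:]):
        h = (a * 1000003 + b) % vocab_size   # simple multiplicative hash
        ids.append(base + h)                 # offset past the base vocab
    return ids
```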
- RoPE: rotary position embeddings applied to 24 dimensions per head. parameters: {"dims":24}
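With dims 24, only the first 24 channels of each head get the rotary treatment and the rest pass through unchanged, a common partial-RoPE setup. The sketch below assumes the standard paired-rotation formulation and a base of 10000; the pairing layout is one convention among several.

```python
import numpy as np

def rope_partial(x, dims=24, base=10000.0):
    """Apply rotary position embeddings to the first `dims` channels of
    x (shape [seq, head_dim]); remaining channels are untouched."""
    seq, head_dim = x.shape
    half = dims // 2
    inv_freq = base ** (-np.arange(half) / half)   # per-pair frequencies
    ang = np.arange(seq)[:, None] * inv_freq[None, :]   # [seq, half]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, :half], x[:, half:dims]
    out = x.copy()
    out[:, :half] = x1 * cos - x2 * sin            # 2D rotation per pair
    out[:, half:dims] = x1 * sin + x2 * cos
    return out
```

Because each pair is rotated by a position-dependent angle, position 0 is left unchanged and the rotated block keeps its norm.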
Other
- LeakyReLU squared activation with slope 0.5. parameters: {"activation":"leaky_relu_sq","slope":0.5}
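The card does not define "LeakyReLU squared" exactly. A plain square of the LeakyReLU output would be non-monotonic on the negative side, so the sketch below assumes a sign-preserving variant, y·|y| applied to the LeakyReLU output; treat both the definition and the function name as assumptions.

```python
def leaky_relu_sq(x, slope=0.5):
    """Sign-preserving 'LeakyReLU squared' (assumed form): square the
    LeakyReLU output while keeping its sign, so the map stays monotonic."""
    y = x if x >= 0 else slope * x
    return y * abs(y)
```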
Compression
- zstd (level: null)
Evaluation
- sliding window eval
- parameters: {"order":5,"alpha":0.2,"min_count":2,"buckets":4194304,"interpolation":"hashed 5-gram score-first backward-looking mixing"}
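One plausible reading of the score-first, backward-looking scheme: each token's model probability is mixed with a hashed 5-gram estimate built only from tokens scored earlier, and the token is added to the n-gram cache only after it has been scored, so no future information leaks in. The sketch below uses Python's built-in `hash` and 2^22 buckets for brevity (the card uses 4194304); the function name and the `model_probs` dict format are illustrative assumptions.

```python
from collections import defaultdict

def eval_with_ngram_mix(model_probs, tokens, order=5, alpha=0.2,
                        min_count=2, buckets=1 << 22):
    """Backward-looking hashed 5-gram mixing (sketch): score each token
    first, then update the cache, so the cache never sees future tokens.
    model_probs[t] is assumed to be a dict {token_id: model probability}."""
    context_counts = defaultdict(int)   # hashed context  -> count
    joint_counts = defaultdict(int)     # hashed (ctx, tok) -> count
    mixed = []
    for t, tok in enumerate(tokens):
        ctx = tuple(tokens[max(0, t - order + 1):t])   # up to 4 prior tokens
        hc = hash(ctx) % buckets
        hj = hash((ctx, tok)) % buckets
        p_model = model_probs[t].get(tok, 0.0)
        # Score first: consult only counts accumulated from earlier tokens.
        if context_counts[hc] >= min_count:
            p_ngram = joint_counts[hj] / context_counts[hc]
            p = (1 - alpha) * p_model + alpha * p_ngram  # fixed-weight mix
        else:
            p = p_model                                  # no gating beyond min_count
        mixed.append(p)
        # ...then update the cache with the now-scored token.
        context_counts[hc] += 1
        joint_counts[hj] += 1
    return mixed
```

With alpha fixed at 0.2 and no target-aware gating, the mix is a plain linear interpolation whenever the hashed context has been seen at least `min_count` times.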
Novel Contributions
- Legal score-first hashed 5-gram interpolation during sliding window evaluation
- Fixed-weight linear mixing with alpha=0.20 and no target-aware gating
- Cache built only from already-scored tokens for strictly backward-looking evaluation
- Combination of XSA, BigramHash, GPTQ int6, and late QAT in an 11-layer U-Net