val_bpb
1.3069
Architecture
—
Optimizer
—
Artifact Size
—
Training Techniques
Architecture
LeakyReLU
Applies a leaky ReLU with negative slope 0.5 and squares the output; the squaring is inferred from the submission title.
parameters: {"slope":0.5,"squared":true}
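A minimal sketch of what these parameters could mean, assuming the squaring is applied after the leaky ReLU. The order of operations and the function name `leaky_relu_squared` are assumptions, not taken from the submission. Under this reading the output is always non-negative, so the sign of negative inputs is lost.

```python
def leaky_relu_squared(x: float, slope: float = 0.5) -> float:
    """Leaky ReLU followed by squaring (assumed order of operations)."""
    # Leaky ReLU: pass non-negative inputs through, scale negative ones by `slope`.
    y = x if x >= 0.0 else slope * x
    # "squared": true is read here as squaring the activation output.
    return y * y


print(leaky_relu_squared(2.0))   # positive branch
print(leaky_relu_squared(-2.0))  # negative branch, scaled by 0.5 before squaring
```

A sign-preserving variant (multiplying by the absolute value instead of squaring) is also plausible. The card does not say which one was used.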
BigramHash
Uses hashed bigram embeddings to incorporate token-pair information.
parameters: null
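Since no parameters are recorded, the following is only an illustrative sketch of one common way to hash bigrams into an embedding table. The hash function, bucket count, embedding width, and padding id are all hypothetical choices, not details from the submission.

```python
import random


def bigram_bucket(prev_id: int, cur_id: int, num_buckets: int) -> int:
    """Map a (previous, current) token-id pair to a bucket index."""
    # Hypothetical mixing hash: the multiplier is an arbitrary odd constant.
    return ((prev_id * 1000003) ^ cur_id) % num_buckets


def bigram_features(token_ids, table, pad_id: int = 0):
    """Look up one hashed-bigram embedding per position."""
    num_buckets = len(table)
    features = []
    prev = pad_id  # assumed: the first token is paired with a padding id
    for cur in token_ids:
        features.append(table[bigram_bucket(prev, cur, num_buckets)])
        prev = cur
    return features


rng = random.Random(0)
NUM_BUCKETS, DIM = 16, 4  # illustrative sizes only
table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]

# Positions 1 and 3 both see the bigram (5, 9), so they share an embedding.
features = bigram_features([5, 9, 5, 9], table)
```

In a model these vectors would typically be added to, or concatenated with, the ordinary token embeddings. Different bigrams can collide in the same bucket, which is the usual trade-off for a fixed-size table.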
Quantization
GPTQ
bits: null
scope: null
Weight Averaging
EMA
parameters: null
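As a rough illustration of EMA weight averaging, the sketch below keeps a running average of the weights and blends in the current weights after each step. The decay value and the zero initialization are assumptions made for the example; the card records no EMA parameters.

```python
def ema_update(avg, weights, decay: float = 0.999):
    """Return the exponential moving average after one update."""
    # new_avg = decay * old_avg + (1 - decay) * current_weights, elementwise
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg, weights)]


# Toy run with a large step size (decay=0.5) so the effect is visible.
avg = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0]):
    avg = ema_update(avg, step_weights, decay=0.5)

print(avg)  # moves toward [1.0, 2.0] with each update
```

In practice the average is often initialized from the model's starting weights rather than zeros, and the averaged weights are the ones evaluated or exported.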
Novel Contributions
- Squared LeakyReLU activation (slope 0.5)
- GPTQ quantization
- EMA weight averaging
- BigramHash architecture component