PR #1137

open

leakyRelu0.5^2 + GPTQ + EMA + BigramHash(1.3069)

by Hieuabssy
val_bpb: 1.3069
Architecture
Optimizer
Artifact Size

Training Techniques

Architecture
LeakyReLU
Applies a leaky ReLU with a negative slope of 0.5 and squares the output; the squaring is inferred from the title ("leakyRelu0.5^2").
parameters: {"slope":0.5,"squared":true}
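A minimal sketch of what this activation might compute, assuming "squared" means the LeakyReLU output is simply multiplied by itself. The function name leaky_relu_squared is hypothetical and is not taken from the PR.

```python
def leaky_relu_squared(x: float, slope: float = 0.5) -> float:
    """LeakyReLU followed by squaring (assumed reading of 'leakyRelu0.5^2')."""
    # LeakyReLU: pass positive inputs through, scale negative inputs by `slope`.
    y = x if x >= 0.0 else slope * x
    # Squaring makes the result non-negative, so the sign of x is not preserved.
    return y * y


if __name__ == "__main__":
    for value in (-2.0, 0.0, 2.0):
        print(value, leaky_relu_squared(value))
```

Under this reading, negative inputs still contribute (scaled by slope squared, 0.25 here) instead of being zeroed as they would be with a squared plain ReLU.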
BigramHash
Uses hashed bigram embeddings to incorporate token-pair information.
parameters: null
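The PR lists no parameters for this component, so the following is only a sketch of how a hashed bigram lookup is commonly built. The hash function, the bucket count, the pad_id convention, and the names bigram_bucket and bigram_ids are all assumptions for illustration, not the PR's implementation.

```python
from typing import List, Sequence


def bigram_bucket(prev_id: int, cur_id: int, num_buckets: int) -> int:
    """Map a (previous token, current token) pair to a bucket index.

    The multiplier is an arbitrary large prime (hypothetical choice).
    """
    return (prev_id * 1_000_003 + cur_id) % num_buckets


def bigram_ids(tokens: Sequence[int], num_buckets: int, pad_id: int = 0) -> List[int]:
    """Return one bucket index per position; the first position pairs with pad_id."""
    prev = [pad_id] + list(tokens[:-1])
    return [bigram_bucket(p, c, num_buckets) for p, c in zip(prev, tokens)]


if __name__ == "__main__":
    ids = bigram_ids([5, 7, 5, 7], num_buckets=1024)
    # Positions 1 and 3 both see the pair (5, 7), so they share a bucket.
    print(ids)
```

In a model, each index would typically select a row of a learned embedding table that is added to the ordinary token embedding. Distinct pairs can collide in the same bucket, which is the trade-off for a fixed table size.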
Quantization
GPTQ
bits: null
scope: null
Weight Averaging
EMA
parameters: null
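Since no EMA parameters are recorded, the sketch below shows only the generic exponential-moving-average update over a flat list of weights. The decay value of 0.999 and the name ema_update are assumptions for illustration.

```python
from typing import List, Sequence


def ema_update(shadow: Sequence[float], params: Sequence[float], decay: float = 0.999) -> List[float]:
    """Return the new averaged weights: decay * shadow + (1 - decay) * params."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


if __name__ == "__main__":
    shadow = [1.0, 2.0]          # running average of the weights
    live = [0.0, 4.0]            # weights after the latest optimizer step
    shadow = ema_update(shadow, live, decay=0.9)
    print(shadow)
```

The averaged copy is usually updated after every optimizer step and used in place of the live weights at evaluation time.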

Novel Contributions

  • LeakyReLU with squared activation variant
  • GPTQ quantization
  • EMA weight averaging
  • BigramHash architecture component