PR #626

open

Record: Full GPTQ + LeakyReLU² + Parallel Muon (3-seed mean 1.1180)

by kshitizz36
val_bpb: 1.1180
Architecture:
Optimizer: Parallel Muon
Artifact Size: 15.93 MB

Training Techniques

Quantization
GPTQ
bits: null
scope: null
Architecture
LeakyReLU²
Use of squared LeakyReLU activation function
parameters: null
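
The card does not give parameters for LeakyReLU², so the following is only a minimal sketch of what a squared LeakyReLU computes for a single scalar. The `negative_slope` default of 0.5 is an assumption chosen for illustration and is not taken from this PR.

```python
def leaky_relu_squared(x, negative_slope=0.5):
    """Squared LeakyReLU for one scalar: (leaky_relu(x)) ** 2.

    negative_slope is a hypothetical default; the PR card lists
    `parameters: null` for this activation.
    """
    y = x if x >= 0 else negative_slope * x
    return y * y


if __name__ == "__main__":
    for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(v, leaky_relu_squared(v))
```

Because the output is squared, negative inputs also map to non-negative values, only scaled down by `negative_slope ** 2` relative to positive inputs of the same magnitude.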
BigramHash
Bigram hashing component with parameters (3072,80)
parameters: {"hash_dim":3072,"hash_buckets":80}
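
As a rough illustration of the bigram hashing idea, the sketch below maps each (previous token, current token) pair to a bucket id, which would then index a learned embedding table. The hash function, the `multiplier` constant, and the choice of id 0 as the predecessor of the first token are all assumptions for illustration; the card only states the two numbers (3072, 80) and does not describe the hash itself.

```python
def bigram_bucket(prev_token, token, num_buckets, multiplier=1_000_003):
    """Hash a token pair into [0, num_buckets).

    The multiplicative hash and its constant are hypothetical,
    not the PR's actual hash function.
    """
    return (prev_token * multiplier + token) % num_buckets


def bigram_bucket_ids(token_ids, num_buckets):
    """Return one bucket id per position in the sequence."""
    ids = []
    prev = 0  # assumption: the first position uses 0 as its predecessor
    for tok in token_ids:
        ids.append(bigram_bucket(prev, tok, num_buckets))
        prev = tok
    return ids


if __name__ == "__main__":
    # num_buckets=80 follows the card's "hash_buckets":80 field.
    print(bigram_bucket_ids([5, 7, 5, 7], num_buckets=80))
```

Identical bigrams always land in the same bucket, while distinct bigrams may collide; the embedding lookup for each bucket id is omitted here.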
Optimizer
Parallel Muon
weight_decay: null
momentum: null
other_params: null
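
The card gives no details on the "Parallel" variant, so the sketch below covers only the orthogonalization step that standard Muon applies to a 2-D gradient or momentum matrix: a quintic Newton-Schulz iteration. The coefficients are those of the widely used reference Muon implementation; the function names, the pure-Python list-of-lists representation, and the step count are assumptions for illustration and say nothing about how this PR parallelizes the optimizer.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def newton_schulz(grad, steps=5, eps=1e-7):
    """Approximately orthogonalize `grad` (push singular values toward 1)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    norm = math.sqrt(sum(v * v for row in grad for v in row))
    x = [[v / (norm + eps) for v in row] for row in grad]
    tall = len(x) > len(x[0])
    if tall:  # iterate on the wide orientation so x @ x.T is the small Gram matrix
        x = transpose(x)
    for _ in range(steps):
        gram = matmul(x, transpose(x))
        gram2 = matmul(gram, gram)
        poly = [[b * g1 + c * g2 for g1, g2 in zip(r1, r2)]
                for r1, r2 in zip(gram, gram2)]
        px = matmul(poly, x)
        x = [[a * v + p for v, p in zip(rx, rp)] for rx, rp in zip(x, px)]
    return transpose(x) if tall else x


if __name__ == "__main__":
    for row in newton_schulz([[3.0, 0.0, 0.0], [0.0, 2.0, 0.0]]):
        print([round(v, 3) for v in row])
```

The iteration does not converge to exactly orthogonal output; it leaves singular values in a band around 1, which is the behavior the reference implementation accepts in exchange for a small, fixed number of matrix multiplications.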
Evaluation
stride-based eval
parameters: {"stride":64,"mode":"sliding"}
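
A sliding-window evaluation with a fixed stride can be sketched as follows: each window re-reads earlier tokens as context, but only tokens not yet scored contribute to the loss, so every token is counted exactly once. The function names and the `seq_len` argument are hypothetical (the card states only the stride of 64), and `bits_per_byte` assumes the summed loss is in nats.

```python
import math


def sliding_windows(num_tokens, seq_len, stride=64):
    """Return (begin, end, score_from) triples covering tokens [0, num_tokens).

    The model sees tokens [begin, end); only tokens [score_from, end)
    are scored, so no token is counted twice.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("stride must be in (0, seq_len]")
    windows = []
    scored_upto = 0
    begin = 0
    while scored_upto < num_tokens:
        end = min(begin + seq_len, num_tokens)
        windows.append((begin, end, scored_upto))
        scored_upto = end
        begin += stride
    return windows


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


if __name__ == "__main__":
    # Small numbers for readability; the card's setting is stride=64.
    for window in sliding_windows(num_tokens=10, seq_len=4, stride=2):
        print(window)
```

After the first window, each step scores only `stride` new tokens while giving them at least `seq_len - stride` tokens of context, which is why a smaller stride costs more compute.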
Test-Time Training
No TTT
parameters: null

Novel Contributions

  • Full GPTQ quantization applied
  • Use of LeakyReLU squared activation function
  • Parallel Muon optimizer technique
  • BigramHash component with parameters (3072,80)
  • Result reported as the mean over 3 independent seeds, evaluated with a sliding window (stride=64)
  • No test-time training (TTT) used