PR #626
open
Record: Full GPTQ + LeakyReLU² + Parallel Muon (3-seed mean 1.1180)
by kshitizz36
val_bpb
1.1180
Architecture
—
Optimizer
Parallel Muon
Artifact Size
15.93MB
Training Techniques
Quantization
GPTQ
bits: null
scope: null
Architecture
LeakyReLU²
Use of squared LeakyReLU activation function
parameters: null
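For readers unfamiliar with the activation, here is a minimal sketch of one common reading of LeakyReLU²: apply LeakyReLU, then square the result. The summary lists no parameters, so `negative_slope` and its default of 0.01 are assumptions for illustration, not values taken from the PR.

```python
def leaky_relu_squared(x: float, negative_slope: float = 0.01) -> float:
    """Apply LeakyReLU, then square the result.

    negative_slope is a hypothetical default; the PR summary does not
    state which slope the submission uses.
    """
    # LeakyReLU: identity for non-negative inputs, scaled for negative ones.
    y = x if x >= 0.0 else negative_slope * x
    # Squaring makes every output non-negative, including for negative inputs.
    return y * y


if __name__ == "__main__":
    for value in (-2.0, 0.0, 3.0):
        print(value, leaky_relu_squared(value, negative_slope=0.5))
```

Under this reading, negative inputs still produce a small positive output, which is the main difference from a squared plain ReLU.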
BigramHash
Bigram hashing component with parameters (3072,80)
parameters: {"hash_dim":3072,"hash_buckets":80}
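As a rough illustration of bigram hashing in general, the sketch below maps each (previous token, current token) pair to a bucket index, which would then select a row of a learned embedding table. The hash function, the mixing constant, and the padding id are all hypothetical. The bucket count is taken from the `hash_buckets` field as listed above. The PR's actual hashing scheme is not shown in this summary.

```python
from typing import List

# Hypothetical mixing constant; any large odd multiplier would serve here.
MIX = 1_000_003


def bigram_bucket(prev_token: int, token: int, num_buckets: int) -> int:
    """Hash a (prev_token, token) pair of non-negative ids into a bucket."""
    return ((prev_token * MIX) ^ token) % num_buckets


def bigram_bucket_ids(tokens: List[int], num_buckets: int, pad_id: int = 0) -> List[int]:
    """Return one bucket index per position.

    pad_id stands in for the missing predecessor of the first token
    (an assumption for this sketch).
    """
    ids = []
    prev = pad_id
    for tok in tokens:
        ids.append(bigram_bucket(prev, tok, num_buckets))
        prev = tok
    return ids


if __name__ == "__main__":
    # 80 follows the "hash_buckets" field in the summary, taken at face value.
    print(bigram_bucket_ids([5, 7, 5, 7], num_buckets=80))
```

Identical bigrams always land in the same bucket, and distinct bigrams may collide. That is the usual trade-off of a hashed table against a full vocabulary-squared bigram table.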
Optimizer
Parallel Muon
weight_decay: null
momentum: null
other_params: null
Evaluation
stride-based eval
parameters: {"stride":64,"mode":"sliding"}
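A sliding, stride-based evaluation is typically arranged so that each window advances by `stride` tokens and only the tokens not already scored contribute to the loss. The sketch below computes such window boundaries. `context_len` and the function name are hypothetical, since the summary gives only the stride (64) and the mode.

```python
from typing import List, Tuple


def sliding_windows(num_tokens: int, context_len: int, stride: int) -> List[Tuple[int, int, int]]:
    """Return (window_start, window_end, score_start) triples.

    Tokens in [score_start, window_end) are scored in that window; tokens
    in [window_start, score_start) serve only as context.
    """
    if not 0 < stride <= context_len:
        raise ValueError("stride must be in (0, context_len]")
    windows = []
    scored_upto = 0  # every token before this index has already been scored
    start = 0
    while scored_upto < num_tokens:
        end = min(start + context_len, num_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return windows


if __name__ == "__main__":
    # Small numbers for readability; the submission reports stride=64.
    for window in sliding_windows(num_tokens=10, context_len=4, stride=2):
        print(window)
```

With this layout every token is scored exactly once, and every window after the first scores at most `stride` new tokens, each with at least `context_len - stride` tokens of preceding context.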
Test-Time Training
No TTT
parameters: null
Novel Contributions
- Full GPTQ quantization applied
- Use of LeakyReLU squared activation function
- Parallel Muon optimizer technique
- BigramHash component with parameters (3072,80)
- Independent 3-seed evaluation with sliding window stride=64
- No test-time training (TTT) used