PR #2118
open
Record: 1.0435 (Gated XSA + token-only n-gram tilt + LQER top-1 + AWQ-lite + AsymLogit) with GPTQ_RESERVE_SECONDS=2.0 and corrected CaseOps data preparation
by aquariouseworkman
val_bpb
1.0435
Architecture
Transformer
Optimizer
—
Artifact Size
—
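For readers unfamiliar with the headline metric, the sketch below shows how a bits-per-byte score is commonly computed, assuming val_bpb denotes validation bits per byte. The function name and arguments are hypothetical and are not taken from the PR; the PR's own evaluation code may differ in how it counts bytes.

```python
import math

def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed cross-entropy loss (in nats) into bits per byte.

    total_nll_nats: sum of per-token negative log-likelihoods, in nats.
    total_bytes: number of raw bytes in the evaluated text.
    Both names are illustrative assumptions, not identifiers from the PR.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes per byte, so the score does not depend on the tokenizer.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes rather than tokens is what makes scores comparable across runs that tokenize the validation text differently.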
Training Techniques
Architecture
XSA
Uses gated XSA in the model architecture.
parameters: null
BigramHash
Applies token-only n-gram tilt, implying n-gram-based token interaction features.
parameters: null
Quantization
GPTQ
bits: null
scope: null
GPTQ-lite
bits: null
scope: null
bitsandbytes
bits: null
scope: null
Regularization
logit softcap
parameters: null
Other
other
LQER top-1 selection.
parameters: null
other
AsymLogit output transformation.
parameters: null
other
Corrected CaseOps data preparation.
parameters: null
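The logit softcap entry under Regularization lists no parameters, so the following is only a minimal sketch of the commonly used tanh form of the technique. The function name and the cap value of 30.0 are assumptions chosen for illustration, not settings from this PR.

```python
import math

def softcap_logits(logits, cap=30.0):
    """Smoothly squash each logit to within +/-cap using cap * tanh(x / cap).

    The default cap of 30.0 is a hypothetical value, not one reported in the PR.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    # Near zero tanh is almost linear, so small logits pass through nearly
    # unchanged; large magnitudes saturate toward +/-cap.
    return [cap * math.tanh(x / cap) for x in logits]
```

Unlike hard clipping, this form keeps a nonzero gradient for large logits, which is the usual reason for preferring it.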
Novel Contributions
- Gated XSA
- token-only n-gram tilt
- LQER top-1
- AWQ-lite
- AsymLogit
- GPTQ_RESERVE_SECONDS=2.0
- corrected CaseOps data preparation