PR #2118

open

Record: 1.0435 (Gated XSA + token-only n-gram tilt + LQER top-1 + AWQ-lite + AsymLogit) with GPTQ_RESERVE_SECONDS=2.0 and corrected CaseOps data preparation

by aquariouseworkman
val_bpb: 1.0435
Architecture: Transformer
Optimizer
Artifact Size

Training Techniques

Architecture
XSA
Uses gated XSA in the model architecture.
parameters: null
BigramHash
Applies a token-only n-gram tilt, which suggests n-gram-based token interaction features.
parameters: null
Quantization
GPTQ
bits: null
scope: null
GPTQ-lite
bits: null
scope: null
bitsandbytes
bits: null
scope: null
Regularization
logit softcap
parameters: null
Other
other
LQER top-1 selection.
parameters: null
other
AsymLogit output transformation.
parameters: null
other
Corrected CaseOps data preparation.
parameters: null
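
The logit softcap entry above lists no parameters. As a rough illustration only, the common formulation bounds each logit smoothly with `cap * tanh(x / cap)`. The sketch below assumes that formulation; the function name `softcap` and the cap value of 30.0 are hypothetical and are not taken from this PR.

```python
import math


def softcap(logits, cap):
    """Smoothly bound each logit to the open interval (-cap, cap).

    Assumes the common form cap * tanh(x / cap); the PR does not state
    which variant or cap value it uses.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(x / cap) for x in logits]


# Hypothetical cap of 30.0, chosen only for illustration.
capped = softcap([-100.0, -1.0, 0.0, 1.0, 100.0], cap=30.0)
```

Small logits pass through almost unchanged, while large ones saturate just below the cap, which keeps the output layer from producing extreme values.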

Novel Contributions

  • Gated XSA
  • token-only n-gram tilt
  • LQER top-1
  • AWQ-lite
  • AsymLogit
  • GPTQ_RESERVE_SECONDS=2.0
  • corrected CaseOps data preparation
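
The headline metric, val_bpb, is presumably validation bits per byte; the PR summary does not define it. Under that assumption, the usual conversion divides the summed cross-entropy (in nats) by ln 2 and by the number of bytes in the evaluated text. The helper name `bits_per_byte` and the numbers in the example are hypothetical.

```python
import math


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte.

    Assumes the loss is summed over all predicted tokens and that
    total_bytes counts the raw bytes of the same text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


# Illustrative numbers only: 1000 bits of total loss over 500 bytes.
example_bpb = bits_per_byte(1000 * math.log(2), 500)
```

Because the denominator is bytes rather than tokens, the metric stays comparable across runs that use different tokenizers.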