PR #1558

open

[Architectural Proof-of-Concept] Saliency-Boosted GPTQ & High-Entropy Routing

by Subramanyam6
val_bpb: 1.4500
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 8.15 MB

Training Techniques

Quantization
GPTQ
bits: 6
scope: all
Architecture
XSA
Cross-Sequence Attention layers used in the model stack.
parameters: {"layers":11}
Attention Modifications
High-Entropy Routing via an aggressive QK gain that saturates the softmax and forces sharp attention distributions.
parameters: {"qk_gain":4}
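The QK-gain trick amounts to one extra scalar multiply on the attention logits before the softmax. A minimal NumPy sketch (the function name and shapes are assumptions, not the PR's actual code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def high_entropy_routing_attention(q, k, v, qk_gain=4.0):
    # Hypothetical sketch of the described routing: amplify the scaled
    # QK logits by an aggressive gain so the softmax saturates and each
    # query attends sharply to a few keys.
    d = q.shape[-1]
    logits = q @ k.swapaxes(-2, -1) / np.sqrt(d)
    attn = softmax(qk_gain * logits, axis=-1)  # qk_gain=4 sharpens each row
    return attn @ v
```

With `qk_gain=1.0` this reduces to standard scaled dot-product attention.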
RMSNorm
Dedicated global SLOT bias inserted immediately before final RMSNorm.
parameters: {"slot_parameters":512}
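The SLOT bias can be read as a single learned vector, shared across all positions, added to the hidden state just before the last normalization. A minimal sketch assuming a 512-wide model (function and argument names are hypothetical):

```python
import numpy as np

def slot_biased_rmsnorm(x, slot_bias, weight, eps=1e-6):
    # Hypothetical sketch: inject a dedicated global SLOT bias into the
    # hidden state immediately before the final RMSNorm, then normalize.
    x = x + slot_bias                                        # the 512 SLOT parameters
    inv_rms = 1.0 / np.sqrt((x ** 2).mean(-1, keepdims=True) + eps)
    return x * inv_rms * weight                              # standard RMSNorm gain
```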
Optimizer
AdamW
weight_decay: null
momentum: null
other_params: {"v_t_saliency":true}
Other
Saliency-Boosted GPTQ using the AdamW second-moment buffer (v_t) to bias the GPTQ Hessian diagonal toward high-saliency columns.
parameters: {"hessian_diagonal_boost":0.1}
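The saliency boost can be expressed as a per-column rescaling of the Hessian diagonal before the GPTQ solve. A minimal NumPy sketch (the normalization scheme and function name are assumptions; the PR's implementation may differ):

```python
import numpy as np

def saliency_boosted_hessian(H, v_t, boost=0.1):
    # Hypothetical sketch: bias the GPTQ Hessian diagonal toward columns
    # that AdamW's second-moment buffer v_t flags as high-saliency, so the
    # quantizer steers its error budget away from those weights.
    saliency = v_t / (v_t.max() + 1e-12)          # normalize v_t to [0, 1]
    H = H.copy()
    H[np.diag_indices_from(H)] *= 1.0 + boost * saliency
    return H
```

A larger diagonal entry makes GPTQ treat that column's quantization error as more costly, which is the stated intent of the 0.1 boost.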
Custom Latent-PABU CUDA kernel proposed to handle dynamic outlier clamping during quantization export.
parameters: null
Throughput-recovery effort to restore Flash Attention 3 and increase the training step count under the 600-second budget.
parameters: {"step_count_current":700,"step_count_sota":6922}

Novel Contributions

  • Saliency-Boosted GPTQ using AdamW v_t as a saliency signal
  • High-Entropy Routing with QK-Gain 4.0
  • Dedicated SLOT bias before final RMSNorm
  • Latent-PABU CUDA kernel for dynamic outlier clamping
  • Throughput-starvation diagnosis and corrected BPB accounting