PR #1558 (open)
[Architectural Proof-of-Concept] Saliency-Boosted GPTQ & High-Entropy Routing
by Subramanyam6
val_bpb: 1.4500
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 8.15 MB
Training Techniques

Quantization: GPTQ
bits: 6
scope: all
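For context on the baseline technique, here is a minimal sketch of 6-bit round-to-nearest quantization, the per-column step that GPTQ refines with Hessian-guided error compensation. The function name is illustrative, and the error-propagation loop of full GPTQ is omitted.

```python
import torch

def quantize_column(w: torch.Tensor, bits: int = 6) -> torch.Tensor:
    """Symmetric round-to-nearest quantization of one weight column.

    GPTQ proper also propagates each column's rounding error into the
    not-yet-quantized columns via the inverse Hessian; that loop is
    omitted here.
    """
    qmax = 2 ** (bits - 1) - 1                       # 31 for signed 6-bit
    scale = w.abs().max().clamp_min(1e-12) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                                 # dequantized weights

w = torch.randn(512)
print((w - quantize_column(w, bits=6)).abs().max())  # quantization error
```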
Architecture: XSA
Cross-Sequence Attention layers used in the model stack.
parameters: {"layers": 11}
Attention modifications: High-Entropy Routing
Aggressive QK gain saturates the softmax and forces sharp attention distributions.
parameters: {"qk_gain": 4}
RMSNorm: SLOT bias
A dedicated global SLOT bias is inserted immediately before the final RMSNorm.
parameters: {"slot_parameters": 512}
Optimizer: AdamW
weight_decay: null
momentum: null
other_params: {"v_t_saliency": true}
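For context, the v_t buffer referenced by v_t_saliency is AdamW's second-moment estimate, which PyTorch exposes in the optimizer state under the key exp_avg_sq. A minimal retrieval example using the standard API:

```python
import torch

model = torch.nn.Linear(128, 256)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# one training step so the optimizer state is populated
loss = model(torch.randn(32, 128)).pow(2).mean()
loss.backward()
opt.step()

# AdamW stores the second-moment buffer under 'exp_avg_sq'
v_t = opt.state[model.weight]["exp_avg_sq"]
print(v_t.shape)  # same shape as the weight: (256, 128)
```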
Other: Saliency-Boosted GPTQ
The AdamW second-moment buffer (v_t) is used to bias the GPTQ Hessian diagonal toward high-saliency columns.
parameters: {"hessian_diagonal_boost": 0.1}
Other: Latent-PABU CUDA kernel
A custom Latent-PABU CUDA kernel is proposed to handle dynamic outlier clamping during quantization export.
parameters: null
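The kernel itself is only proposed, so the sketch below shows the behavior it would presumably fuse, expressed in plain PyTorch: clamping weight magnitudes to a per-tensor quantile before computing quantization scales. The quantile choice and all names are hypothetical.

```python
import torch

def clamp_outliers(w: torch.Tensor, q: float = 0.999) -> torch.Tensor:
    """Reference behavior for dynamic outlier clamping at export time.

    Clamps weight magnitudes to a per-tensor quantile so a handful of
    outliers do not dominate the quantization scale. A fused CUDA
    kernel would do this without materializing intermediates.
    """
    thresh = w.abs().flatten().quantile(q)
    return w.clamp(-thresh, thresh)

# synthetic weights with a few extreme outliers
w = torch.randn(4096) * torch.rand(4096).pow(8).mul(100).add(1)
print(w.abs().max(), clamp_outliers(w).abs().max())
```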
Other: Throughput recovery
Effort to restore Flash Attention 3 and raise the training step count under the 600-second budget.
parameters: {"step_count_current": 700, "step_count_sota": 6922}
Novel Contributions
- Saliency-Boosted GPTQ using AdamW v_t as a saliency signal
- High-Entropy Routing with QK-Gain 4.0
- Dedicated SLOT bias before final RMSNorm
- Latent-PABU CUDA kernel for dynamic outlier clamping
- Throughput-starvation diagnosis and corrected BPB accounting