PR #1558

open

[Architectural Proof-of-Concept] Saliency-Boosted GPTQ & High-Entropy Routing

by Subramanyam6
val_bpb: 1.4500
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 8.15 MB

Training Techniques

Quantization
GPTQ
bits: 6
scope: all
Architecture
XSA
Cross-Sequence Attention layers used in the model stack.
parameters: {"layers":11}
Attention Modifications
High-Entropy Routing via an aggressive QK gain that saturates the softmax and forces sharp attention distributions.
parameters: {"qk_gain":4}
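The QK-gain trick amounts to one extra scalar multiply on the attention logits before the softmax. A minimal NumPy sketch (the function name and shapes are assumptions, not the PR's actual code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def high_entropy_routing_attention(q, k, v, qk_gain=4.0):
    # Hypothetical sketch of the described routing: amplify the scaled
    # QK logits by an aggressive gain so the softmax saturates and each
    # query attends sharply to a few keys.
    d = q.shape[-1]
    logits = q @ k.swapaxes(-2, -1) / np.sqrt(d)
    attn = softmax(qk_gain * logits, axis=-1)  # qk_gain=4 sharpens each row
    return attn @ v
```

With `qk_gain=1.0` this reduces to standard scaled dot-product attention.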
RMSNorm
Dedicated global SLOT bias inserted immediately before final RMSNorm.
parameters: {"slot_parameters":512}
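The SLOT bias can be read as a single learned vector, shared across all positions, added to the hidden state just before the last normalization. A minimal sketch assuming a 512-wide model (function and argument names are hypothetical):

```python
import numpy as np

def slot_biased_rmsnorm(x, slot_bias, weight, eps=1e-6):
    # Hypothetical sketch: inject a dedicated global SLOT bias into the
    # hidden state immediately before the final RMSNorm, then normalize.
    x = x + slot_bias                                        # the 512 SLOT parameters
    inv_rms = 1.0 / np.sqrt((x ** 2).mean(-1, keepdims=True) + eps)
    return x * inv_rms * weight                              # standard RMSNorm gain
```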
Optimizer
AdamW
weight_decay: null
momentum: null
other_params: {"v_t_saliency":true}
Other
Saliency-Boosted GPTQ using the AdamW second-moment buffer (v_t) to bias the GPTQ Hessian diagonal toward high-saliency columns.
parameters: {"hessian_diagonal_boost":0.1}
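The saliency boost can be expressed as a per-column rescaling of the Hessian diagonal before the GPTQ solve. A minimal NumPy sketch (the normalization scheme and function name are assumptions; the PR's implementation may differ):

```python
import numpy as np

def saliency_boosted_hessian(H, v_t, boost=0.1):
    # Hypothetical sketch: bias the GPTQ Hessian diagonal toward columns
    # that AdamW's second-moment buffer v_t flags as high-saliency, so the
    # quantizer steers its error budget away from those weights.
    saliency = v_t / (v_t.max() + 1e-12)          # normalize v_t to [0, 1]
    H = H.copy()
    H[np.diag_indices_from(H)] *= 1.0 + boost * saliency
    return H
```

A larger diagonal entry makes GPTQ treat that column's quantization error as more costly, which is the stated intent of the 0.1 boost.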
Custom Latent-PABU CUDA kernel proposed to handle dynamic outlier clamping during quantization export.
parameters: null
Throughput-recovery effort to restore Flash Attention 3 and increase the training step count under the 600-second budget.
parameters: {"step_count_current":700,"step_count_sota":6922}

Novel Contributions

  • Saliency-Boosted GPTQ using AdamW v_t as a saliency signal
  • High-Entropy Routing with QK-Gain 4.0
  • Dedicated SLOT bias before final RMSNorm
  • Latent-PABU CUDA kernel for dynamic outlier clamping
  • Throughput-starvation diagnosis and corrected BPB accounting