PR #2157

open

Record candidate: PR #1797 + AWQ-lite top3 + LQER 60k on b180-tlr56 — val_bpb 1.06043 (seed=0)

val_bpb

1.0604

Architecture

Transformer

Optimizer

—

Artifact Size

15,982,182 bytes

Training Techniques

Quantization

GPTQ-lite

bits: null

scope: block weights

mixed int7/int8

bits: 8

scope: embeddings and selected groups

int8

bits: 8

scope: top-3 salient 64-col groups
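The "top-3 salient 64-col groups" entry can be read as a selection step: split a weight matrix's columns into groups of 64, score each group, and keep the three highest-scoring groups at int8. The sketch below illustrates only that selection. The saliency measure (a per-column score summed over each group) and the function name `top_salient_groups` are assumptions for illustration, not taken from the PR.

```python
def top_salient_groups(col_saliency, group_size=64, k=3):
    """Return indices of the k column groups with the largest summed saliency.

    col_saliency: one non-negative score per weight column (hypothetical input,
    e.g. a mean activation magnitude; the PR does not state the measure).
    Trailing columns that do not fill a whole group are ignored in this sketch.
    """
    n_groups = len(col_saliency) // group_size
    scores = [
        sum(col_saliency[g * group_size:(g + 1) * group_size])
        for g in range(n_groups)
    ]
    # Rank groups from most to least salient and keep the first k.
    ranked = sorted(range(n_groups), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:k])


# Toy example: 5 groups of 64 columns, with groups 1, 3 and 4 made more salient.
saliency = [1.0] * (5 * 64)
for group, value in ((1, 5.0), (3, 3.0), (4, 4.0)):
    for col in range(group * 64, (group + 1) * 64):
        saliency[col] = value

promoted = top_salient_groups(saliency)
```

In this reading, the groups in `promoted` would be stored at int8 while the remaining groups use the lower-precision format.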

Test-Time Training

LoRA TTT

parameters: {"rank":56}

Regularization

dropout

parameters: {"drop_M":true}

logit softcap

parameters: null
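A logit softcap is commonly written as `cap * tanh(logit / cap)`, which keeps every logit inside `[-cap, cap]` while leaving small values almost unchanged. The sketch below shows that common form. The card lists no parameters for this entry, so the cap value of 30.0 is a placeholder assumption, and the PR may use a different variant.

```python
import math


def softcap(logits, cap):
    """Smoothly bound each logit to [-cap, cap] using a scaled tanh."""
    return [cap * math.tanh(x / cap) for x in logits]


# cap=30.0 is an illustrative value only; the source does not specify one.
capped = softcap([-1000.0, 0.0, 5.0, 1000.0], 30.0)
```

Small logits pass through nearly unchanged, while extreme logits saturate at the cap instead of growing without bound.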

Other

other

LQER budget reduction to 60k bytes, using cap margin freed by AWQ-lite promotions

parameters: {"budget_bytes":60000}

other

SparseAttnGate lineage contribution

parameters: null
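For the LQER entry above (`budget_bytes: 60000`), a byte budget translates into a limit on how much low-rank correction can be stored. The sketch below assumes the correction for a single matrix is stored as two factors of shapes `rows x r` and `r x cols`, and computes the largest rank `r` that fits. The matrix shape, the 2-byte storage width, and the function name `max_lqer_rank` are all hypothetical; the PR does not say how the 60k bytes are split across matrices.

```python
def max_lqer_rank(rows, cols, budget_bytes, bytes_per_value=2):
    """Largest rank r such that factors (rows x r) and (r x cols) fit the budget.

    Each unit of rank adds one column to the first factor and one row to the
    second, i.e. (rows + cols) stored values.
    """
    bytes_per_rank = (rows + cols) * bytes_per_value
    return budget_bytes // bytes_per_rank


# Hypothetical 512 x 512 matrix with 2-byte values under a 60,000-byte budget.
rank = max_lqer_rank(512, 512, 60000)
used_bytes = rank * (512 + 512) * 2
```

Under these assumptions, lowering the budget directly lowers the affordable rank, which is the trade-off the entry describes when it reallocates cap margin.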