val_bpb: 1.0668
Architecture: Transformer
Optimizer: —
Artifact Size: 16,415,938 bytes
Training Techniques
Architecture
- GQA: grouped-query attention used in the transformer stack (no parameters reported; see the sketch below).
- Depth recurrence: included in the model design (no parameters reported).
- SmearGate: smear gate used as part of the attention/activation design (no parameters reported).
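The architecture entries above are reported without parameters. As an illustration only, here is a minimal grouped-query attention block in PyTorch; the head counts, model width, and causal masking are assumptions rather than values from the submission, and depth recurrence and SmearGate are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    # Illustrative sizes; the submission does not report its own head counts.
    def __init__(self, d_model=512, n_q_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.hd = d_model // n_q_heads
        self.n_q, self.n_kv = n_q_heads, n_kv_heads
        self.wq = nn.Linear(d_model, n_q_heads * self.hd, bias=False)
        self.wk = nn.Linear(d_model, n_kv_heads * self.hd, bias=False)
        self.wv = nn.Linear(d_model, n_kv_heads * self.hd, bias=False)
        self.wo = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.wq(x).view(B, T, self.n_q, self.hd).transpose(1, 2)
        k = self.wk(x).view(B, T, self.n_kv, self.hd).transpose(1, 2)
        v = self.wv(x).view(B, T, self.n_kv, self.hd).transpose(1, 2)
        # Each key/value head is shared by a group of query heads.
        rep = self.n_q // self.n_kv
        k = k.repeat_interleave(rep, dim=1)
        v = v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(B, T, -1))

x = torch.randn(2, 16, 512)
print(GroupedQueryAttention()(x).shape)  # torch.Size([2, 16, 512])
```

Sharing each key/value head across a group of query heads shrinks the KV projections (and the KV cache) relative to standard multi-head attention, which is the usual motivation for GQA.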
Quantization
- GPTQ: bits not reported; scope: mixed.
- int5: bits: 5; scope: export-only fallback (see the sketch below).
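The record lists GPTQ at an unspecified bit width plus an int5 export-only fallback. The sketch below only makes the int5 range concrete with a simple symmetric per-channel round-trip; GPTQ's error-compensated rounding and the actual export packing format are not reproduced and the per-channel choice is an assumption.

```python
import torch

def quantize_int5(w: torch.Tensor):
    """Symmetric per-output-channel 5-bit quantization (illustrative only)."""
    qmax = 15  # int5 range is [-16, 15]
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.round(w / scale).clamp(-16, 15).to(torch.int8)
    return q, scale

def dequantize_int5(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(256, 512)
q, s = quantize_int5(w)
print((dequantize_int5(q, s) - w).abs().mean())  # mean reconstruction error
```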
Weight Averaging
- EMA: no parameters reported (see the sketch below).
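EMA weight averaging is listed without parameters. A minimal sketch follows; the decay constant of 0.999 and the update-after-each-optimizer-step usage are assumptions.

```python
import copy
import torch

class EMAWeights:
    """Exponential moving average of model parameters (decay is an assumed value)."""
    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

# Typical use: call ema.update(model) after each optimizer step,
# then evaluate or export ema.shadow instead of the live weights.
```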
Test-Time Training
- score-first TTT: parameters {"chunk_size": 48, "lora_rank": 80, "phases": 3} (see the sketch below).
LR Schedule
- warmdown: parameters {"warmdown_frac": 0.85} (see the sketch below).
Sequence Length
- train_length: 8192; eval_length: not reported.
Other
- Sparse attention gating used during training and evaluation: parameters {"gate_window": 12}.
- LQER asymmetric correction enabled with low-rank factorization: parameters {"rank": 4, "factor_bits": 4} (see the sketch below).
Novel Contributions
- Non-record evidence submission showing a strong BPB result despite the artifact exceeding the 16,000,000-byte cap.
- SP8192 apex stack run with grouped-query attention, depth recurrence, sparse attention gating, SmearGate, and LQER asymmetric correction.
- Documentation of lossless packaging attempts that failed to bring the artifact below the cap.
- Under-cap int5 fallback export demonstrating the architecture can be packaged within the limit, albeit with worse quality.