PR #2082

open

Add SP8192 apex over-cap evidence submission

by PrzemyslaV88
val_bpb: 1.0668
Architecture: Transformer
Optimizer:
Artifact Size: 16,415,938 bytes

Training Techniques

Architecture
GQA
Grouped-query attention used in the transformer stack.
parameters: null
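The GQA entry above can be sketched in a few lines of NumPy: several query heads share a single key/value head. All shapes, head counts, and weights below are illustrative assumptions, since the PR records no parameters for this technique.

```python
import numpy as np

def gqa(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Minimal grouped-query attention: n_q_heads query heads share
    n_kv_heads key/value heads (n_q_heads must divide evenly)."""
    T, _ = x.shape
    d_head = wq.shape[1] // n_q_heads
    group = n_q_heads // n_kv_heads               # query heads per KV head
    q = (x @ wq).reshape(T, n_q_heads, d_head)
    k = (x @ wk).reshape(T, n_kv_heads, d_head)
    v = (x @ wv).reshape(T, n_kv_heads, d_head)
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                           # shared KV head for this query head
        scores = q[:, h] @ k[:, kv].T / np.sqrt(d_head)
        scores += np.triu(np.full((T, T), -1e9), k=1)   # causal mask
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        out[:, h] = w @ v[:, kv]
    return out.reshape(T, -1)
```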
depth recurrence
Depth recurrence included in the model design.
parameters: null
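Depth recurrence, as listed above, reuses one block across depth rather than stacking distinct layers. A minimal sketch; the recurrence count is a placeholder, since the PR records no parameters:

```python
def depth_recurrent_forward(x, block, n_recurrences=4):
    # Depth recurrence: apply the same block repeatedly instead of stacking
    # distinct layers. n_recurrences=4 is a placeholder (parameters: null).
    for _ in range(n_recurrences):
        x = block(x)
    return x
```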
SmearGate
Smear gate used as part of the attention/activation design.
parameters: null
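"SmearGate" is not a standard technique name and the PR records no parameters, so the following is only one plausible reading, labeled as an assumption: each position receives a sigmoid-gated copy of its predecessor's activation.

```python
import numpy as np

def smear_gate(x, gate_logit=0.0):
    # Hypothetical smear gate: blend each position with the previous
    # position via a learned scalar gate. The exact form is an assumption.
    g = 1.0 / (1.0 + np.exp(-gate_logit))   # scalar gate in (0, 1)
    out = x.copy()
    out[1:] = x[1:] + g * x[:-1]
    return out
```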
Quantization
GPTQ
bits: null
scope: mixed
int5
bits: 5
scope: export-only fallback
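The int5 entry above describes the under-cap fallback export at 5 bits per weight. A minimal symmetric per-tensor sketch, assuming round-to-nearest with signed 5-bit codes; the PR does not describe the actual export format:

```python
import numpy as np

def quantize_int5(w):
    # Symmetric per-tensor 5-bit quantization: signed codes in [-16, 15].
    # Per-tensor scaling and round-to-nearest are assumptions.
    scale = float(np.abs(w).max()) / 15.0
    q = np.clip(np.round(w / scale), -16, 15).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale
```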
Weight Averaging
EMA
parameters: null
Test-Time Training
score-first TTT
parameters: {"chunk_size":48,"lora_rank":80,"phases":3}
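The score-first TTT entry above can be illustrated on a toy linear model: each chunk is scored with the current weights before the LoRA adapter is trained on it, so no chunk is evaluated after being seen. chunk_size, lora_rank, and phases mirror the PR's recorded parameters; the linear model, MSE loss, and learning rate are placeholder assumptions.

```python
import numpy as np

def score_first_ttt(x, y, w, chunk_size=48, lora_rank=80, phases=3, lr=1e-2):
    """Score-first test-time training sketch: score each chunk, then run
    `phases` adaptation steps on a LoRA adapter (a @ b) over frozen w."""
    d = w.shape[0]
    rng = np.random.default_rng(0)
    a = rng.standard_normal((d, lora_rank)) * 0.01
    b = np.zeros((lora_rank, d))   # standard LoRA init: adapter starts at zero
    losses = []
    for s in range(0, len(x), chunk_size):
        xc, yc = x[s:s + chunk_size], y[s:s + chunk_size]
        pred = xc @ (w + a @ b)
        losses.append(float(np.mean((pred - yc) ** 2)))  # score first...
        for _ in range(phases):                          # ...then adapt
            err = xc @ (w + a @ b) - yc
            grad = xc.T @ err / len(xc)   # MSE gradient w.r.t. (w + a @ b), up to a constant
            a -= lr * (grad @ b.T)
            b -= lr * (a.T @ grad)
    return losses
```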
LR Schedule
warmdown
parameters: {"warmdown_frac":0.85}
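A warmdown schedule is commonly read as "hold the base rate, then decay linearly to zero over the final fraction of training." The sketch below uses that reading with the PR's recorded warmdown_frac=0.85; the exact shape is not confirmed by the PR.

```python
def warmdown_lr(step, total_steps, base_lr, warmdown_frac=0.85):
    # Hold base_lr, then decay linearly to zero over the last warmdown_frac
    # of training. warmdown_frac=0.85 is the PR's value; the "constant then
    # linear-to-zero" shape is a common reading, not confirmed by the PR.
    start = (1.0 - warmdown_frac) * total_steps
    if step < start:
        return base_lr
    return base_lr * (total_steps - step) / (total_steps - start)
```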
Sequence Length
sequence_length
train_length: 8192
eval_length: null
Other
other
Sparse attention gating used during training and evaluation.
parameters: {"gate_window":12}
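The PR does not spell out the gating mechanism; one plausible reading of gate_window=12 is a causal attention mask restricted to the 12 most recent positions, sketched below as an assumption.

```python
import numpy as np

def gate_mask(seq_len, gate_window=12):
    # Hypothetical sparse attention gate: boolean causal mask limited to the
    # gate_window most recent positions (gate_window=12 per the PR).
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < gate_window)
```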
other
LQER asymmetric correction enabled with low-rank factorization.
parameters: {"rank":4,"factor_bits":4}
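The LQER entry above pairs a quantized weight with a low-rank reconstruction of the quantization residual. A minimal sketch using a truncated SVD with the PR's rank=4; the 4-bit quantization of the factors (factor_bits=4) and the asymmetric weighting are omitted here.

```python
import numpy as np

def lqer_correction(w, quantize, rank=4):
    # LQER-style low-rank error reconstruction: approximate the residual
    # W - Q(W) with a rank-`rank` factorization a @ b via truncated SVD.
    # rank=4 matches the PR; factor quantization and the asymmetric
    # weighting of the full method are omitted from this sketch.
    q = quantize(w)
    u, s, vt = np.linalg.svd(w - q, full_matrices=False)
    a = u[:, :rank] * s[:rank]
    b = vt[:rank]
    return q, a, b   # effective weight: q + a @ b
```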

Novel Contributions

  • Non-record evidence submission showing a strong BPB result while remaining over the 16,000,000 byte cap.
  • SP8192 apex stack run with grouped-query attention, depth recurrence, sparse attention gating, SmearGate, and LQER asymmetric correction.
  • Documentation of lossless packaging rescue attempts that failed to bring the artifact below the cap.
  • Under-cap int5 fallback export demonstrating the architecture can be packaged within the limit, albeit with worse quality.