PR #2121

open

Record candidate: StageB v2 CaseOps TTT seed42 1.06099764

val_bpb

1.0610
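
The PR card does not say how val_bpb is computed. As a minimal sketch, assuming the usual bits-per-byte definition (summed negative log-likelihood in nats, converted to bits and divided by the number of raw bytes scored), the metric could be derived as below. The function name `bits_per_byte` and its arguments are hypothetical and are not taken from the PR.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (nats) into bits per byte.

    Assumption: the loss is summed over every scored token and total_bytes
    counts the raw bytes those tokens cover.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


# The title reports 1.06099764; the card shows it rounded to four decimals.
reported = 1.06099764
rounded = f"{reported:.4f}"
```

Under this definition, a lower value means fewer bits are needed per byte of validation text.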

Architecture

Transformer

Optimizer

—

Artifact Size

15,995,233 bytes

Training Techniques

Quantization

GPTQ

bits: null

scope: model scales, skip gates, residual mixes, scalar tensors

mixed int4/int8

bits: null

scope: scalar/control tensors
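
To illustrate what a mixed int4/int8 scheme trades off, here is a minimal sketch of symmetric round-to-nearest quantization with a single scale per tensor. This is deliberately not GPTQ (which uses second-order information to choose rounding) and it is not the PR's code; the names `quantize_symmetric` and `dequantize` are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric(values: Sequence[float], bits: int) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric quantization with one scale per tensor.

    Assumption: integers lie in [-qmax, qmax] with qmax = 2**(bits - 1) - 1.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.0, 1.0, -1.0, 0.25]
q8, scale8 = quantize_symmetric(weights, bits=8)  # finer grid, more bytes
q4, scale4 = quantize_symmetric(weights, bits=4)  # coarser grid, fewer bytes
```

A mixed scheme would pick the bit width per tensor, for example keeping sensitive tensors at int8 and the rest at int4; which tensors the PR assigns to which width is not stated beyond the scopes listed above.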

Architecture

SmearGate

Enables the SmearGate and sparse attention gate variants in the model stack.

parameters: {"sparse_attn_gate_scale":0.75}

Gated Attention

Attention path uses gated attention with quantized gate control.

parameters: {"gated_attn_quant_gate":1}

Test-Time Training

LoRA TTT

parameters: {"rank":80,"prefix_docs":2500,"beta2":0.99,"weight_decay":0.5,"chunk_size":48,"phased":true,"score_first":true}
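
The `score_first` flag suggests that each chunk is scored before the adapter is updated on it, so no token is evaluated by weights that have already trained on that token. A minimal sketch of that ordering follows, assuming a chunk size of 48 as in the parameters above. The names `score_first_ttt`, `score_fn`, and `update_fn` are hypothetical, and the `phased` and `prefix_docs` options are not modeled because the card does not define them.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

State = TypeVar("State")


def chunked(tokens: Sequence[int], chunk_size: int) -> List[Sequence[int]]:
    """Split a token sequence into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def score_first_ttt(
    tokens: Sequence[int],
    state: State,
    score_fn: Callable[[State, Sequence[int]], float],
    update_fn: Callable[[State, Sequence[int]], State],
    chunk_size: int = 48,
) -> Tuple[float, State]:
    """Score every chunk with the current adapter state, then adapt on it."""
    total_loss = 0.0
    for chunk in chunked(tokens, chunk_size):
        total_loss += score_fn(state, chunk)  # evaluate before any update
        state = update_fn(state, chunk)       # adapt only after scoring
    return total_loss, state


# Toy stand-ins: the "state" is just the number of tokens adapted on so far.
toy_loss, toy_state = score_first_ttt(
    tokens=list(range(100)),
    state=0,
    score_fn=lambda seen, chunk: float(seen),
    update_fn=lambda seen, chunk: seen + len(chunk),
)
```

In a real run, `state` would hold the LoRA adapter weights and optimizer state, `score_fn` would return the summed loss on the chunk, and `update_fn` would take one or more gradient steps on it.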

Regularization

weight decay

parameters: {"value":0.5}

LR Schedule

warmdown

parameters: {"warmdown_frac":0.82}
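
One common reading of `warmdown_frac` is that the learning rate holds at its base value and then decays linearly to zero over the final fraction of training. The sketch below assumes that shape; the PR card does not confirm the decay curve or whether a warmup phase precedes it, and `warmdown_lr_scale` is a hypothetical name.

```python
def warmdown_lr_scale(step: int, total_steps: int, warmdown_frac: float = 0.82) -> float:
    """Return a multiplier in [0, 1] to apply to the base learning rate.

    Assumption: constant at 1.0, then linear decay to 0.0 over the last
    warmdown_frac of the run.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    if warmdown_frac <= 0 or progress < 1.0 - warmdown_frac:
        return 1.0
    return (1.0 - progress) / warmdown_frac
```

With `warmdown_frac` at 0.82, most of the run is spent in the decay phase, and only roughly the first 18% of steps use the full base learning rate.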

Other

other

CaseOps pipeline with Brotli-only self-contained compression path and phased score-first TTT stack.

parameters: null

StageB v2 CaseOps + phased score-first LoRA TTT record candidate
Brotli-only self-contained compression path without lrzip or apt-get
Scalar/control quantization with LQER top-1 selection
Phased score-first LoRA TTT with rank 80 and prefix-doc adaptation
NGRAM_MIX_ALPHA=0 with no byte PPM or validation-time n-gram cache
Official-reference and auxiliary multi-seed confirmations with detailed timing caveats