PR #2121

open

Record candidate: StageB v2 CaseOps TTT seed42 1.06099764

by KbediakoView on GitHub
val_bpb
1.0610
Architecture
Transformer
Optimizer
Artifact Size
15,995,233 bytes

Training Techniques

Quantization
GPTQ
bits: null
scope: model scales, skip gates, residual mixes, scalar tensors
mixed int4/int8
bits: null
scope: scalar/control tensors
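To make the mixed int4/int8 entry concrete, here is a minimal sketch of symmetric round-to-nearest quantization for a small scalar tensor. It is not GPTQ and not the PR's code: the function names, the per-tensor scale, and the choice of which tensors get 4 or 8 bits are all assumptions for illustration (the record itself lists bits as null).

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric quantization with one scale per tensor.

    Hypothetical helper: a signed `bits`-wide grid in [-qmax, qmax].
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    max_abs = max((abs(v) for v in values), default=0.0)
    # An all-zero tensor gets a dummy scale so division stays defined.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map integer codes back to floats."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    tensor = [-1.0, -0.25, 0.0, 0.5, 1.0]
    for bits in (8, 4):
        codes, scale = quantize_symmetric(tensor, bits)
        restored = dequantize(codes, scale)
        worst = max(abs(a - b) for a, b in zip(tensor, restored))
        print(f"int{bits}: codes={codes} worst_abs_error={worst:.5f}")
```

The reconstruction error per element is bounded by half the scale, so int4 trades roughly 18 times the worst-case error of int8 for half the storage on the same tensor.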
Architecture
SmearGate
Enables the smear gate and sparse attention gate variants in the model stack.
parameters: {"sparse_attn_gate_scale":0.75}
Gated Attention
Attention path uses gated attention with quantized gate control.
parameters: {"gated_attn_quant_gate":1}
Test-Time Training
LoRA TTT
parameters: {"rank":80,"prefix_docs":2500,"beta2":0.99,"weight_decay":0.5,"chunk_size":48,"phased":true,"score_first":true}
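The score_first flag and chunk_size of 48 suggest that each chunk is scored before the model adapts on it, so no token is evaluated by weights that have already seen it. The sketch below shows only that ordering. It assumes a toy count-based model in place of the LoRA adapters and gradient steps; `ToyAdaptiveModel` and `score_first_ttt` are hypothetical names, not the PR's code.

```python
import math
from collections import Counter
from typing import Iterator, List, Sequence


def iter_chunks(tokens: Sequence[int], chunk_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive chunks of at most chunk_size tokens."""
    for start in range(0, len(tokens), chunk_size):
        yield tokens[start:start + chunk_size]


class ToyAdaptiveModel:
    """Stand-in for an adaptable model: Laplace-smoothed unigram counts."""

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size
        self.counts: Counter = Counter()
        self.total = 0

    def nll_bits(self, chunk: Sequence[int]) -> float:
        """Total negative log2-likelihood of a chunk under the current state."""
        denom = self.total + self.vocab_size
        return sum(-math.log2((self.counts[t] + 1) / denom) for t in chunk)

    def adapt(self, chunk: Sequence[int]) -> None:
        """Update the state on a chunk (the analogue of a TTT update step)."""
        self.counts.update(chunk)
        self.total += len(chunk)


def score_first_ttt(tokens: List[int], vocab_size: int, chunk_size: int = 48) -> float:
    """Return average bits per token with score-then-adapt ordering."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    model = ToyAdaptiveModel(vocab_size)
    total_bits = 0.0
    for chunk in iter_chunks(tokens, chunk_size):
        total_bits += model.nll_bits(chunk)  # score with state that has not seen the chunk
        model.adapt(chunk)                   # only afterwards adapt on it
    return total_bits / len(tokens)


if __name__ == "__main__":
    print(score_first_ttt([0] * 96, vocab_size=2, chunk_size=48))
```

Swapping the two lines inside the loop would let the model adapt on a chunk before scoring it, which is the leakage that a score-first ordering rules out.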
Regularization
weight decay
parameters: {"value":0.5}
LR Schedule
warmdown
parameters: {"warmdown_frac":0.82}
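As a rough illustration of what warmdown_frac 0.82 could mean, the sketch below holds the learning rate constant and then decays it linearly to zero over the final 82% of steps. The linear shape, the zero floor, and the function name `warmdown_lr` are assumptions; the record states only the fraction.

```python
def warmdown_lr(step: int, total_steps: int, base_lr: float,
                warmdown_frac: float = 0.82) -> float:
    """Constant learning rate followed by a linear decay to zero.

    Assumed schedule shape: the last `warmdown_frac` of training is the
    warmdown phase.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    if not 0.0 <= warmdown_frac <= 1.0:
        raise ValueError("warmdown_frac must lie in [0, 1]")
    warmdown_steps = round(total_steps * warmdown_frac)
    warmdown_start = total_steps - warmdown_steps
    if warmdown_steps == 0 or step < warmdown_start:
        return base_lr
    remaining = total_steps - step
    return base_lr * remaining / warmdown_steps


if __name__ == "__main__":
    for step in (0, 180, 590, 1000):
        print(step, warmdown_lr(step, total_steps=1000, base_lr=1.0))
```

With 1000 total steps, this schedule spends only the first 180 steps at the full rate, so most of training happens while the rate is shrinking.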
Other
other
CaseOps pipeline with Brotli-only self-contained compression path and phased score-first TTT stack.
parameters: null

Novel Contributions

  • StageB v2 CaseOps + phased score-first LoRA TTT record candidate
  • Brotli-only self-contained compression path without lrzip or apt-get
  • Scalar/control quantization with LQER top-1 selection
  • Phased score-first LoRA TTT with rank 80 and prefix-doc adaptation
  • NGRAM_MIX_ALPHA=0 with no byte PPM or validation-time n-gram cache
  • Official-reference and auxiliary multi-seed confirmations with detailed timing caveats
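The NGRAM_MIX_ALPHA=0 item can be read as switching off any n-gram contribution to the predicted distribution. The sketch below assumes a linear interpolation between model and n-gram probabilities, which the record does not confirm; `mix_probs` is a hypothetical helper. Under that assumption, an alpha of 0 returns the model distribution unchanged.

```python
from typing import List


def mix_probs(model_p: List[float], ngram_p: List[float], alpha: float) -> List[float]:
    """Linear interpolation: (1 - alpha) * model + alpha * n-gram.

    Assumed mixing rule, shown only to illustrate the alpha = 0 case.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(model_p) != len(ngram_p):
        raise ValueError("distributions must have the same length")
    return [(1.0 - alpha) * m + alpha * n for m, n in zip(model_p, ngram_p)]


if __name__ == "__main__":
    model_p = [0.7, 0.3]
    ngram_p = [0.1, 0.9]
    print(mix_probs(model_p, ngram_p, alpha=0.0))  # n-gram term contributes nothing
    print(mix_probs(model_p, ngram_p, alpha=0.5))
```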