PR #1883

open

Record: PR #1854 neural stack — budget-compliant 1.06777 (3-seed mean)

by robbiebusinessacc
val_bpb
1.0678
Architecture
Transformer
Optimizer
Artifact Size
15,951,074 bytes
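
The artifact size is reported as a byte count, and lzma is one of the compressors listed for this submission. As a rough sketch of how a compressed size could be measured, the snippet below uses Python's standard-library lzma module. The function name, the preset, and the payload are illustrative assumptions, not the PR's actual packaging code.

```python
import lzma


def compressed_size_bytes(payload: bytes, preset: int = 9) -> int:
    """Return the size in bytes of `payload` after lzma compression."""
    return len(lzma.compress(payload, preset=preset))


# Hypothetical stand-in for serialized, quantized model weights.
payload = bytes(range(256)) * 64
size = compressed_size_bytes(payload)
print(f"raw: {len(payload):,} bytes, lzma: {size:,} bytes")
```

In practice the number that matters is the size of the final file on disk, so any container or header overhead would need to be counted as well.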

Training Techniques

Test-Time Training
score-first TTT
parameters: {"phased":true,"prefix_docs":1500,"num_phases":3}
Architecture
SmearGate
Smear gate used in the neural stack.
parameters: {"window":12}
SparseAttnGate
Sparse attention gating in the model.
parameters: null
LQER
Low-rank quantization error reconstruction with asymmetric grouping.
parameters: {"rank":4,"top_k":3,"factor_bits":4,"asym_group":64}
Quantization
GPTQ
bits: 6
scope: model
Compression
brotli
level: null
lzma
level: null
Evaluation
sliding window eval
parameters: null
Sequence Length
sequence_length
train_length: null
eval_length: null
LR Schedule
warmdown
parameters: null
Regularization
weight decay
parameters: null
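
The listing names sliding window eval but gives no parameters for it. The sketch below shows one common way such an evaluation can be laid out: each window supplies up to `window` tokens of context, and every token is scored exactly once. The function name and the `window` and `stride` values are hypothetical, chosen only for illustration, and may not match the PR's settings.

```python
from typing import Iterator, Tuple


def sliding_windows(
    n_tokens: int, window: int, stride: int
) -> Iterator[Tuple[int, int, int]]:
    """Yield (ctx_start, score_start, end) index triples.

    Tokens in [ctx_start, score_start) serve as context only;
    tokens in [score_start, end) are the ones that get scored.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    score_start = 0
    while score_start < n_tokens:
        # The first window scores everything it sees; later ones score `stride` tokens.
        step = window if score_start == 0 else stride
        end = min(score_start + step, n_tokens)
        ctx_start = max(0, end - window)
        yield ctx_start, score_start, end
        score_start = end


for ctx_start, score_start, end in sliding_windows(10, window=4, stride=2):
    print(ctx_start, score_start, end)
```

A smaller stride gives each scored token more left context at the cost of more forward passes, which matters when evaluation runs under a time budget.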

Novel Contributions

  • Budget-compliant 3-seed reproduction of PR #1854's neural stack with PHASED_TTT_PREFIX_DOCS reduced from 2000 to 1500.
  • Reported 3-seed mean val_bpb of 1.06777 under the 600s evaluation budget.
  • CaseOps byte accounting via sidecar-based original UTF-8 byte counts.
  • Included, but did not claim, a multibin-lambda byte-PPM mixer refinement, for reproducibility.
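
The val_bpb figure and the sidecar byte accounting mentioned above can be illustrated with a minimal sketch. It assumes that preprocessing may change the byte length of the text, so the denominator is taken from separately stored original UTF-8 byte counts rather than from the transformed text. All function names and sample inputs here are hypothetical, and the per-seed values are toy numbers, not results from this PR.

```python
import math
from typing import Sequence


def total_original_bytes(sidecar_counts: Sequence[int]) -> int:
    """Sum per-document original UTF-8 byte counts read from a sidecar."""
    if any(c < 0 for c in sidecar_counts):
        raise ValueError("byte counts must be non-negative")
    return sum(sidecar_counts)


def val_bpb(token_nll_nats: Sequence[float], n_bytes: int) -> float:
    """Bits per byte: total NLL converted from nats to bits, over byte count."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    return sum(token_nll_nats) / math.log(2) / n_bytes


def seed_mean(values: Sequence[float]) -> float:
    """Arithmetic mean of per-seed scores."""
    return sum(values) / len(values)


# Hypothetical sidecar: byte counts taken from the untransformed documents.
docs = ["Hello World", "Naïve café"]
sidecar = [len(d.encode("utf-8")) for d in docs]
n_bytes = total_original_bytes(sidecar)

# Toy per-token losses in nats (not real model output).
toy_nll = [0.7, 0.9, 0.6, 0.8]
print(val_bpb(toy_nll, n_bytes))
print(seed_mean([1.0, 2.0, 3.0]))
```

Dividing by the original byte count keeps scores comparable across submissions that tokenize or normalize text differently.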