PR #1883
Record: PR #1854 neural stack — budget-compliant 1.06777 (3-seed mean)
by robbiebusinessacc
val_bpb
1.0678
Architecture
Transformer
Optimizer
—
Artifact Size
15,951,074 bytes
Training Techniques
Test-Time Training
score-first TTT
parameters: {"phased":true,"prefix_docs":1500,"num_phases":3}
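The phased, score-first TTT parameters suggest a two-step procedure: score candidate documents first, keep the best `prefix_docs` of them, then fine-tune in `num_phases` passes. The sketch below is an assumption about that control flow; `score_doc` and `finetune_step` are illustrative names, not functions from the PR.

```python
def phased_ttt(model, candidates, prefix_docs=1500, num_phases=3,
               score_doc=None, finetune_step=None):
    """Score-first phased test-time training (hypothetical sketch)."""
    # Score every candidate document and keep the top `prefix_docs`.
    scored = sorted(candidates, key=score_doc, reverse=True)[:prefix_docs]
    # Split the selected docs into `num_phases` contiguous phases and
    # fine-tune on each phase in turn; the last phase takes any remainder.
    per_phase = max(1, len(scored) // num_phases)
    for p in range(num_phases):
        lo = p * per_phase
        hi = len(scored) if p == num_phases - 1 else lo + per_phase
        for doc in scored[lo:hi]:
            finetune_step(model, doc)
    return model
```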
Architecture
SmearGate
Smear gate used in the neural stack.
parameters: {"window":12}
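The record only gives the smear gate a `window` of 12. One plausible reading is a gated causal average: each position mixes its own activation with the mean of the preceding `window` positions, with a learned per-channel gate. The gating form below is an assumption for illustration.

```python
import numpy as np

def smear_gate(x, gate, window=12):
    """Gated causal smearing over the previous `window` positions (sketch).

    x: (seq, dim) activations; gate: (dim,) mixing weights in [0, 1].
    """
    T, _ = x.shape
    out = np.empty_like(x)
    for t in range(T):
        lo = max(0, t - window)
        smear = x[lo:t + 1].mean(axis=0)          # causal window average
        out[t] = gate * smear + (1.0 - gate) * x[t]
    return out
```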
SparseAttnGate
Sparse attention gating in the model.
parameters: null
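No parameters are recorded for SparseAttnGate. One plausible form of sparse attention gating keeps only each query's top-k attention scores and zeroes the rest before the softmax; the sketch below uses that reading, with `top_k` as an illustrative choice rather than a value from the PR.

```python
import numpy as np

def sparse_attention(q, k, v, top_k=4):
    """Attention with per-query top-k score gating (hypothetical sketch)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (Tq, Tk)
    # Threshold each row at its k-th largest score; mask everything below.
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]
    scores = np.where(scores >= kth, scores, -np.inf)
    # Numerically stable softmax over the surviving links.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```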
LQER
Low-rank quantization/error refinement with asymmetric grouping.
parameters: {"rank":4,"top_k":3,"factor_bits":4,"asym_group":64}
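LQER-style refinement stores quantized weights plus a low-rank factorization of the quantization error, so the weights reconstruct as `Wq + A @ B`. The sketch below pairs group-wise asymmetric quantization (`asym_group=64`) with a rank-4 SVD of the residual; the `top_k` and `factor_bits` handling from the record is omitted here for simplicity.

```python
import numpy as np

def lqer(W, rank=4, bits=4, group=64):
    """Quantize W per-group, then fit a low-rank correction to the error."""
    # Asymmetric per-group quantization: each group of `group` weights gets
    # its own (min, scale) pair.
    T = W.reshape(-1, group)
    lo, hi = T.min(1, keepdims=True), T.max(1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale[scale == 0] = 1.0
    q = np.round((T - lo) / scale)
    Wq = (q * scale + lo).reshape(W.shape)             # dequantized weights
    # Best rank-`rank` approximation of the quantization error via SVD.
    U, S, Vt = np.linalg.svd(W - Wq, full_matrices=False)
    A = U[:, :rank] * S[:rank]
    B = Vt[:rank]
    return Wq, A, B                                    # reconstruct: Wq + A @ B
```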
Quantization
GPTQ
bits: 6
scope: model
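GPTQ quantizes weight columns one at a time and compensates each column's rounding error by updating the not-yet-quantized columns through the inverse Hessian. The sketch below shows that column loop in simplified form; real GPTQ works from a Cholesky factorization with lazy batched updates, and the Hessian here is just a caller-supplied proxy.

```python
import numpy as np

def quant_rtn(w, bits=6):
    """Symmetric round-to-nearest quantization to 2**bits levels."""
    s = np.max(np.abs(w)) / (2**(bits - 1) - 1)
    if s == 0:
        s = 1.0
    return np.clip(np.round(w / s), -(2**(bits - 1)), 2**(bits - 1) - 1) * s

def gptq_like(W, H, bits=6):
    """Simplified GPTQ-style column loop with error propagation (sketch).

    W: (rows, cols) weights; H: (cols, cols) proxy Hessian, e.g. X.T @ X + damping.
    """
    W = W.astype(float).copy()
    Hinv = np.linalg.inv(H)
    Q = np.zeros_like(W)
    for j in range(W.shape[1]):
        Q[:, j] = quant_rtn(W[:, j], bits)
        # Push this column's rounding error onto the remaining columns.
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Q
```

With an identity Hessian the error term vanishes and the loop reduces to plain per-column round-to-nearest, which is a useful sanity check.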
Compression
brotli
level: null
lzma
level: null
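The record lists both brotli and lzma with unspecified levels; a plausible scheme is to compress the artifact with each codec and ship whichever output is smaller. In the sketch below, lzma comes from the standard library while brotli requires the third-party `brotli` package, so it is treated as optional.

```python
import lzma

def best_compress(data: bytes):
    """Compress with lzma (and brotli if available); return the smaller blob."""
    candidates = {"lzma": lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)}
    try:
        import brotli
        candidates["brotli"] = brotli.compress(data, quality=11)
    except ImportError:
        pass  # brotli not installed; fall back to lzma alone
    name = min(candidates, key=lambda k: len(candidates[k]))
    return name, candidates[name]
```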
Evaluation
sliding window eval
parameters: null
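Sliding-window evaluation slides a fixed context window over the token stream with a stride and scores only the tokens past the overlap, so every token is predicted with substantial left context while being counted exactly once. The window and stride values below are illustrative; the record gives no parameters.

```python
def sliding_windows(n_tokens, window=1024, stride=512):
    """Return (begin, end, score_from) spans: score tokens [score_from, end)."""
    spans = []
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        spans.append((begin, end, prev_end))  # overlap [begin, prev_end) is
        prev_end = end                        # context only, not scored again
        if end == n_tokens:
            break
    return spans
```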
Sequence Length
sequence_length
train_length: null
eval_length: null
LR Schedule
warmdown
parameters: null
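"Warmdown" is assumed here to mean the schedule popularized by the speedrun-style training recipes: hold the peak learning rate, then decay linearly to zero over the final fraction of steps. The record gives no parameters, so `warmdown_frac` below is an illustrative choice.

```python
def warmdown_lr(step, total_steps, peak_lr=1e-3, warmdown_frac=0.2):
    """Constant LR, then linear decay to 0 over the last warmdown_frac of steps."""
    start = int(total_steps * (1 - warmdown_frac))
    if step < start:
        return peak_lr
    # Linear ramp from peak_lr at `start` down to 0 at `total_steps`.
    frac = (total_steps - step) / max(1, total_steps - start)
    return peak_lr * frac
```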
Regularization
weight decay
parameters: null
Novel Contributions
- Budget-compliant 3-seed reproduction of PR #1854's neural stack with PHASED_TTT_PREFIX_DOCS reduced from 2000 to 1500.
- Reported 3-seed mean val_bpb of 1.06777 under the 600s evaluation budget.
- CaseOps byte accounting based on sidecar files that record the original UTF-8 byte counts.
- Includes, for reproducibility, a multibin-lambda byte-PPM mixer refinement that is not claimed as a contribution.
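The headline metric, val_bpb, is bits per byte: the summed cross-entropy in nats normalized by the original UTF-8 byte count rather than the token count, which is why the sidecar byte accounting above matters. A minimal sketch of the conversion, assuming natural-log losses:

```python
import math

def bits_per_byte(total_nll_nats, text):
    """Convert a summed negative log-likelihood (nats) to bits per byte."""
    n_bytes = len(text.encode("utf-8"))   # original UTF-8 byte count
    return total_nll_nats / (math.log(2) * n_bytes)
```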