PR #1883

open

Record: PR #1854 neural stack — budget-compliant 1.06777 (3-seed mean)

by robbiebusinessacc

val_bpb

1.0678

Architecture

Transformer

Optimizer

not specified

Artifact Size

15,951,074 bytes

Training Techniques

Test-Time Training

score-first TTT

parameters: {"phased":true,"prefix_docs":1500,"num_phases":3}
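In score-first TTT, every document is scored with the model state as it stood before that document could influence any update, so adaptation never sees the text it is graded on. The record gives `phased`, `prefix_docs`, and `num_phases` but not their exact semantics. The sketch below is a minimal illustration under stated assumptions: only the first `prefix_docs` documents feed adaptation, `num_phases` splits them into batched updates, and an add-one-smoothed byte-unigram counter stands in for the transformer. `phased_score_first_ttt` is a hypothetical name.

```python
import math
from collections import Counter


def phased_score_first_ttt(docs, prefix_docs=1500, num_phases=3, vocab=256):
    """Return bits per byte of `docs` (a list of bytes objects) under toy phased TTT.

    Hypothetical sketch: a smoothed byte-unigram counter stands in for the
    neural model, and a "phase" is one batched update over a slice of the
    first `prefix_docs` documents.
    """
    counts = Counter()   # committed model state
    seen = 0             # total bytes folded into `counts`
    pending = Counter()  # adaptation data queued during the current phase
    phase_len = max(1, math.ceil(prefix_docs / num_phases))
    total_bits, total_bytes = 0.0, 0

    for i, doc in enumerate(docs):
        # 1) Score first: only the already-committed state is used.
        for b in doc:
            total_bits -= math.log2((counts[b] + 1) / (seen + vocab))
        total_bytes += len(doc)

        # 2) Only afterwards may this document feed adaptation.
        if i < prefix_docs:
            pending.update(doc)
            if (i + 1) % phase_len == 0 or i + 1 == prefix_docs:
                counts.update(pending)  # commit one phase
                seen += sum(pending.values())
                pending.clear()

    return total_bits / max(total_bytes, 1)
```

Setting `prefix_docs=0` disables adaptation and gives a baseline; lowering `prefix_docs` (as this record does, from 2000 to 1500) shrinks the adaptation workload.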

Architecture

SmearGate

Smear gate used in the neural stack.

parameters: {"window":12}
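The only recorded parameter is `window: 12`, and its role is not stated. The sketch below assumes one formulation of a smear gate: each token after the first adds a gated copy of the previous token's vector, with a scalar sigmoid gate computed from the first `window` channels of the current token. `smear`, `gate_w`, `gate_b`, and `lam` are hypothetical names, and plain lists stand in for tensors.

```python
import math


def smear(x, gate_w, gate_b=0.0, lam=1.0, window=12):
    """Causally smear each token vector with its predecessor (hypothetical sketch).

    x: list of token vectors (seq_len x d). gate_w: `window` gate weights.
    Token t receives lam * sigmoid(gate_w . x[t][:window] + gate_b) * x[t-1].
    """
    if not x:
        return []
    out = [list(x[0])]  # the first token has no predecessor
    for t in range(1, len(x)):
        z = sum(w * v for w, v in zip(gate_w, x[t][:window])) + gate_b
        g = lam / (1.0 + math.exp(-z))  # scalar gate in (0, lam)
        out.append([cur + g * prev for cur, prev in zip(x[t], x[t - 1])])
    return out
```

Because token t only reads x[t] and x[t-1], the operation stays causal and adds just `window + 1` parameters.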

SparseAttnGate

Sparse attention gating in the model.

parameters: null
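No parameters are recorded for this component. A minimal sketch of one plausible reading, offered as an assumption: each attention head's output is scaled by its own sigmoid gate, computed from only a small slice of the token's input channels (`k_channels`), so each head's gate needs just a few weights. `gate_heads`, `gate_w`, and `k_channels` are hypothetical.

```python
import math


def gate_heads(head_outputs, x, gate_w, k_channels=4):
    """Scale each head's output vector by a per-head sigmoid gate (hypothetical sketch).

    head_outputs: list (n_heads) of per-head output vectors for one token.
    x: the token's input vector; only its first `k_channels` entries feed the gates.
    gate_w: list (n_heads) of weight vectors, each of length `k_channels`.
    """
    xs = x[:k_channels]
    gated = []
    for out_h, w_h in zip(head_outputs, gate_w):
        g = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w_h, xs))))
        gated.append([g * v for v in out_h])
    return gated
```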

LQER

Low-rank quantization error reconstruction (LQER) with asymmetric group-wise quantization.

parameters: {"rank":4,"top_k":3,"factor_bits":4,"asym_group":64}
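A pure-Python sketch of the LQER idea, under stated assumptions: the quantization error `E = W - Q(W)` is approximated by a rank-`rank` product found by power iteration with deflation; the factors are stored with `factor_bits`-bit asymmetric min/max quantization in groups of `asym_group` values; and `top_k` is read (hypothetically) as applying the correction only to the `top_k` matrices with the largest error. All function names are hypothetical.

```python
import random


def quant_asym(vals, bits, group):
    """Asymmetric min/max round-to-nearest quantization, one scale/offset per group."""
    levels = 2**bits - 1
    out = []
    for s in range(0, len(vals), group):
        g = vals[s:s + group]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / levels or 1.0  # flat group: avoid dividing by zero
        out.extend(lo + round((v - lo) / scale) * scale for v in g)
    return out


def low_rank_error(E, rank, iters=50, seed=0):
    """Top-`rank` factors of E via power iteration + deflation: E ~ sum_r a_r b_r^T."""
    rng = random.Random(seed)
    m, n = len(E), len(E[0])
    R = [row[:] for row in E]  # residual after removing found components
    factors = []
    for _ in range(rank):
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            a = [sum(R[i][j] * b[j] for j in range(n)) for i in range(m)]
            b = [sum(R[i][j] * a[i] for i in range(m)) for j in range(n)]
            norm = sum(t * t for t in b) ** 0.5
            if norm == 0.0:
                break
            b = [t / norm for t in b]  # unit right singular vector estimate
        a = [sum(R[i][j] * b[j] for j in range(n)) for i in range(m)]  # sigma * u
        factors.append((a, b))
        R = [[R[i][j] - a[i] * b[j] for j in range(n)] for i in range(m)]
    return factors


def lqer_correction(W, Wq, rank=4, factor_bits=4, asym_group=64):
    """Quantized low-rank correction C (same shape as W) so that Wq + C ~ W."""
    E = [[w - q for w, q in zip(rw, rq)] for rw, rq in zip(W, Wq)]
    m, n = len(W), len(W[0])
    C = [[0.0] * n for _ in range(m)]
    for a, b in low_rank_error(E, rank):
        aq = quant_asym(a, factor_bits, asym_group)
        bq = quant_asym(b, factor_bits, asym_group)
        for i in range(m):
            for j in range(n):
                C[i][j] += aq[i] * bq[j]
    return C


def pick_top_k(errors, top_k=3):
    """Names of the `top_k` matrices with the largest quantization error."""
    return sorted(errors, key=errors.get, reverse=True)[:top_k]
```

The correction costs only `rank * (rows + cols)` low-bit values per corrected matrix, which is why it can fit inside a tight artifact budget.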

Quantization

GPTQ

bits: 6

scope: model
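GPTQ rounds weights onto a low-bit grid while compensating each rounding error with second-order (Hessian) information from calibration inputs. The pure-Python sketch below quantizes one weight row onto the 6-bit grid recorded here. It performs the sequential inverse-Hessian update that GPTQ is built on, but omits GPTQ's Cholesky reformulation and lazy batched updates; the per-row scale, the symmetric grid `-31..31`, and the dampening value are assumptions, not details from the PR. `invert` and `gptq_row` are hypothetical names.

```python
def invert(M):
    """Gauss-Jordan inverse of a small symmetric positive-definite matrix."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = A[c][c]  # SPD input: pivots stay positive, so no row swaps are needed
        A[c] = [t / p for t in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def gptq_row(w, H, bits=6, damp=0.01):
    """Quantize one weight row GPTQ-style onto a symmetric `bits`-bit grid (sketch).

    H is the Hessian proxy X @ X.T over calibration inputs. After each weight is
    rounded, its error is pushed onto the not-yet-quantized weights via the inverse
    Hessian, which is then shrunk (Schur complement) to drop that column.
    """
    n = len(w)
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits (assumed symmetric grid)
    scale = max(abs(t) for t in w) / qmax or 1.0
    mean_diag = sum(H[i][i] for i in range(n)) / n
    Hinv = invert([[H[i][j] + (damp * mean_diag if i == j else 0.0) for j in range(n)]
                   for i in range(n)])
    w = list(w)
    q = []
    for j in range(n):
        qj = max(-qmax, min(qmax, round(w[j] / scale))) * scale
        q.append(qj)
        err = (w[j] - qj) / Hinv[j][j]
        for k in range(j + 1, n):
            w[k] -= err * Hinv[j][k]  # error compensation on later columns
        for a in range(j + 1, n):     # remove column j from the inverse Hessian
            for b in range(j + 1, n):
                Hinv[a][b] -= Hinv[a][j] * Hinv[j][b] / Hinv[j][j]
    return q
```

With an identity Hessian there is nothing to compensate, so the result reduces to plain round-to-nearest; correlated inputs are where the error feedback matters.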

Compression

brotli

level: null

lzma

level: null
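Both brotli and lzma are listed, with no levels and no indication of which parts of the artifact each codec handles. One plausible use, sketched here as an assumption, is to compress the serialized artifact with every available codec and keep the smallest payload. The presets below (`preset=9 | lzma.PRESET_EXTREME`, brotli `quality=11`) are assumed, and `brotli` is the third-party package, treated as optional.

```python
import lzma

try:
    import brotli  # third-party package ("pip install brotli"); optional here
except ImportError:
    brotli = None


def compress_smallest(blob: bytes):
    """Return (codec_name, payload) for the smallest available compression of `blob`."""
    candidates = {"lzma": lzma.compress(blob, preset=9 | lzma.PRESET_EXTREME)}
    if brotli is not None:
        candidates["brotli"] = brotli.compress(blob, quality=11)
    name = min(candidates, key=lambda k: len(candidates[k]))
    return name, candidates[name]


def decompress(name: str, payload: bytes) -> bytes:
    """Invert `compress_smallest` given the recorded codec name."""
    if name == "lzma":
        return lzma.decompress(payload)
    if name == "brotli" and brotli is not None:
        return brotli.decompress(payload)
    raise ValueError(f"unknown or unavailable codec: {name}")
```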

Evaluation

sliding window eval

parameters: null
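Sliding-window evaluation scores a long token stream in overlapping windows, so most tokens are predicted with nearly a full context instead of the short context at the start of a fresh chunk. The record gives no window or stride, so `seq_len` and `stride` below are hypothetical parameters; the sketch only plans which positions each forward pass reads and scores.

```python
def sliding_windows(n_tokens, seq_len, stride):
    """Plan sliding-window evaluation (hypothetical sketch).

    Returns (ctx_start, end, score_start) triples: the model reads tokens
    [ctx_start, end) and only positions [score_start, end) are scored, so every
    position is scored exactly once with up to `seq_len` tokens of context.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("need 0 < stride <= seq_len")
    plan, pos = [], 0
    while pos < n_tokens:
        end = min(pos + stride, n_tokens)
        plan.append((max(0, end - seq_len), end, pos))
        pos = end
    return plan
```

A smaller stride gives each scored token more context but needs more forward passes, which matters under a fixed evaluation time budget.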

Sequence Length

sequence_length

train_length: null

eval_length: null

LR Schedule

warmdown

parameters: null
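A warmdown schedule holds the learning rate constant and then decays it toward zero at the end of training. No schedule parameters are recorded, so the sketch assumes a linear decay to zero over the final `warmdown_steps` steps; a wall-clock-based variant would substitute elapsed time for steps. `warmdown_lr` is a hypothetical name.

```python
def warmdown_lr(step, total_steps, base_lr, warmdown_steps):
    """Hold `base_lr`, then decay linearly to 0 over the last `warmdown_steps` steps.

    Hypothetical sketch: the linear-to-zero shape and step-based timing are assumptions.
    """
    if warmdown_steps <= 0:
        return base_lr
    remaining = total_steps - step
    if remaining >= warmdown_steps:
        return base_lr
    return base_lr * max(0.0, remaining / warmdown_steps)
```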

Regularization

weight decay

parameters: null

Novel Contributions

Budget-compliant 3-seed reproduction of PR #1854's neural stack with PHASED_TTT_PREFIX_DOCS reduced from 2000 to 1500.
Reported 3-seed mean val_bpb of 1.06777 under the 600s evaluation budget.
CaseOps byte accounting: bits per byte are computed against each document's original UTF-8 byte count, read from a sidecar, rather than against the transformed text.
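If a case transform inserts marker characters, the transformed text has more bytes than the original, and dividing by those bytes would understate val_bpb. The sketch below shows the accounting under stated assumptions: `toy_caseops` is a hypothetical stand-in for CaseOps (it marks each uppercase letter with `\u2191` and lowercases it), and the sidecar is assumed to be a plain list of per-document original byte counts.

```python
import math

UPPER_MARK = "\u2191"  # hypothetical case-operator marker (3 bytes in UTF-8)


def toy_caseops(text: str) -> str:
    """Hypothetical CaseOps-style transform: mark each uppercase letter, then lowercase it."""
    return "".join(UPPER_MARK + ch.lower() if ch.isupper() else ch for ch in text)


def make_sidecar(docs):
    """Record each document's ORIGINAL UTF-8 byte count before any transform."""
    return [len(d.encode("utf-8")) for d in docs]


def bits_per_byte(total_nll_nats: float, sidecar) -> float:
    """Convert summed NLL (nats) over the transformed docs into bits per ORIGINAL byte."""
    return total_nll_nats / (math.log(2) * sum(sidecar))
```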
Included a multibin-lambda byte-PPM mixer refinement for reproducibility, but did not claim it as part of this record.
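The source gives no details of the mixer. A minimal sketch of one plausible reading, stated as an assumption: the neural and PPM next-byte distributions are mixed linearly, and the mixing weight `lambda` comes from a bin selected by the neural model's top probability. `mix_byte_probs`, `lambdas`, and `edges` are hypothetical names.

```python
import bisect


def mix_byte_probs(p_nn, p_ppm, lambdas, edges):
    """Linearly mix two 256-way byte distributions with a confidence-binned weight.

    Hypothetical sketch: the bin is chosen by the neural model's top probability,
    `edges` are sorted bin boundaries, and `lambdas[b]` is the neural weight for
    bin b (so len(lambdas) == len(edges) + 1).
    """
    b = bisect.bisect_right(edges, max(p_nn))
    lam = lambdas[b]
    return [lam * a + (1.0 - lam) * c for a, c in zip(p_nn, p_ppm)]
```

Because each weight lies in [0, 1], the mixture is still a valid distribution, and the per-bin weights let the mixer trust the neural model more when it is confident.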