PR #1909

open

Record val_bpb 1.06996: Independent 3-seed reproduction of PR #1874 + TTT_LORA_RANK=192

by GodlyDonuts

val_bpb

1.0700
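val_bpb is validation bits per byte, and lower is better. The minimal sketch below shows how such a figure is usually derived. It assumes the per-token cross-entropy (in nats) is summed over the validation set and normalized by the UTF-8 byte length of the same text. The function name is hypothetical and is not taken from the PR.

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (nats) into bits per byte.

    Assumption: total_nll_nats is the sum of per-token cross-entropy over the
    validation text, and total_bytes is that text's UTF-8 byte length.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Divide by ln(2) to turn nats into bits, then normalize per byte, not per token.
    return total_nll_nats / (math.log(2) * total_bytes)

# Hypothetical numbers: 1,000 tokens averaging 2.5 nats each, covering 3,400 bytes.
print(round(bits_per_byte(2.5 * 1000, 3400), 4))
```

Normalizing per byte rather than per token keeps the score comparable across tokenizers with different vocabularies.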

Architecture

Transformer

Optimizer

SGD

Artifact Size

~76 MB

Training Techniques

Quantization

GPTQ

bits: 6

scope: all
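The card lists 6-bit GPTQ applied to all weights. GPTQ picks the rounding of each column in turn and uses second-order (Hessian-based) error compensation. The sketch below does not implement that. It only shows the 6-bit grid itself, using plain round-to-nearest with one symmetric scale per weight row. The per-row symmetric layout is an assumption, not something the PR states.

```python
def quantize_row_int6(row):
    """Symmetric per-row round-to-nearest quantization to signed 6-bit codes.

    Returns (codes, scale) with every code in [-31, 31]. This is only the
    quantization grid; GPTQ would choose the roundings with error feedback.
    """
    qmax = 2 ** (6 - 1) - 1  # 31: largest magnitude on a symmetric signed 6-bit grid
    max_abs = max(abs(x) for x in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest grid point and clamp to the legal range.
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return codes, scale

def dequantize_row(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]

codes, scale = quantize_row_int6([0.5, -1.0, 0.25, 0.0])
print(codes, dequantize_row(codes, scale))
```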

Architecture

SmearGate

Part of the reproduced PR #1874 stack.

parameters: null

Gated Attention

Gated-attention component (AttnOutGate) from the reproduced stack.

parameters: {"width":36}

Test-Time Training

LoRA TTT

parameters: {"rank":192}
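The only change from PR #1874 is TTT_LORA_RANK=192, up from 128. The sketch below follows the standard LoRA formulation, y = W x + (alpha / r) B A x, and shows what the rank controls. It assumes a hypothetical 512 x 512 projection. The PR does not state which matrices are adapted or what alpha is used, so both are placeholders.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

def lora_forward(W, A, B, x, alpha: float, rank: int):
    """y = W x + (alpha / rank) * B (A x), using nested lists as matrices.

    W stays frozen; only A and B would be updated during test-time training.
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))  # low-rank update through a rank-sized bottleneck
    s = alpha / rank
    return [b + s * d for b, d in zip(base, delta)]

# Hypothetical 512 x 512 projection: adapter size at rank 128 vs 192.
for r in (128, 192):
    print(r, lora_param_count(512, 512, r))
```

Adapter parameters grow linearly with rank, so moving from 128 to 192 gives each adapted matrix 1.5x as many trainable TTT parameters.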

score-first TTT

parameters: null
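Score-first TTT means that each evaluation chunk is scored with the current weights before the model adapts on it. No token is therefore scored by a model that has already trained on it. The sketch below shows only that ordering. It uses a hypothetical stand-in "model", an add-one-smoothed byte unigram. The real stack instead updates LoRA weights with gradient steps.

```python
import math
from collections import Counter

def score_first_ttt(chunks, vocab_size=256):
    """Return the bits spent on each chunk under a score-then-adapt loop.

    Stand-in model (hypothetical): add-one-smoothed unigram byte counts.
    """
    counts = Counter()
    total = 0
    bits_per_chunk = []
    for chunk in chunks:
        # 1) Score: uses only statistics from chunks that were already scored.
        bits = 0.0
        for b in chunk:
            p = (counts[b] + 1) / (total + vocab_size)
            bits -= math.log2(p)
        bits_per_chunk.append(bits)
        # 2) Adapt: happens only after this chunk's score is recorded.
        counts.update(chunk)
        total += len(chunk)
    return bits_per_chunk

print(score_first_ttt([b"abab", b"abab", b"abab"]))
```

Swapping steps 1 and 2 would leak each chunk into its own score and overstate the improvement.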

Evaluation

sliding window eval

parameters: null
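In sliding-window evaluation, the model reads overlapping windows and counts the loss only on positions that no earlier window has scored. Every token is then scored exactly once, and most tokens are scored with long context. The sketch below only plans the spans. The window length and stride are hypothetical, because the PR does not report them.

```python
def sliding_windows(n_tokens: int, seq_len: int, stride: int):
    """Plan (start, end, score_from) spans over a token stream.

    The model reads tokens[start:end]; only positions in [score_from, end)
    contribute to the loss, so each token is scored exactly once.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("need 0 < stride <= seq_len")
    spans = []
    scored_until = 0
    start = 0
    while scored_until < n_tokens:
        end = min(start + seq_len, n_tokens)
        spans.append((start, end, scored_until))
        scored_until = end
        start += stride
    return spans

print(sliding_windows(10, seq_len=4, stride=2))
```

A smaller stride gives each scored token more context at the cost of more forward passes. When stride equals seq_len, the spans reduce to ordinary non-overlapping chunks.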

Compression

lzma

level: null

brotli

level: null
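The card lists lzma and brotli with no compression level, and it later claims headroom under a 16 MB cap. The sketch below uses only the standard-library lzma module, since brotli is a third-party Python package, to measure a compressed artifact against such a cap. It rests on several assumptions. The cap is read as 16,000,000 decimal bytes. The artifact is taken to be compressed weights plus source bytes. Preset 9 stands in for the unreported level. Storing one 6-bit code per byte is a simplification, and a real packer might bit-pack instead.

```python
import lzma

BYTE_CAP = 16_000_000  # assumption: "16 MB" read as decimal bytes

def compressed_artifact_size(weight_codes, code_bytes: int, preset: int = 9) -> int:
    """Bytes for lzma-compressed 6-bit weight codes plus the source-code bytes.

    weight_codes: ints in [-31, 31], shifted to 0..62 and stored one per byte.
    """
    raw = bytes(c + 31 for c in weight_codes)
    blob = lzma.compress(raw, preset=preset)
    return len(blob) + code_bytes

# Hypothetical artifact: 40,000 highly repetitive codes plus 50 kB of code.
size = compressed_artifact_size([0, 1, -1, 5] * 10_000, code_bytes=50_000)
print(size, "bytes; headroom:", BYTE_CAP - size)
```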

Regularization

weight decay

parameters: null

LR Schedule

warmdown

parameters: null
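A warmdown schedule usually holds the learning rate at its peak and then decays it, often linearly to zero, over the final stretch of training. The PR reports no schedule parameters, so the sketch below assumes a linear decay. The step counts and peak LR in it are hypothetical.

```python
def warmdown_lr(step: int, total_steps: int, warmdown_steps: int, peak_lr: float) -> float:
    """Constant LR, then linear decay to 0 over the last warmdown_steps steps."""
    if warmdown_steps <= 0:
        return peak_lr
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return peak_lr
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / warmdown_steps

# Hypothetical run: 100 steps, last 20 are warmdown, peak LR 0.1.
print([round(warmdown_lr(s, 100, 20, 0.1), 3) for s in (0, 80, 90, 100)])
```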

Optimizer

SGD

weight_decay: null

momentum: null

other_params: null
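The card lists SGD with weight decay but reports no momentum or decay values. The sketch below shows one step of SGD with heavy-ball momentum and L2 weight decay folded into the gradient. This matches the update rule of torch.optim.SGD with dampening=0 and nesterov=False. Whether the reproduced stack uses exactly this variant is an assumption, and all hyperparameter values shown are hypothetical.

```python
def sgd_momentum_step(params, grads, bufs, lr, momentum, weight_decay):
    """One in-place SGD step with momentum and L2 weight decay.

    params, grads, bufs are equal-length lists of floats; bufs start at 0.0.
    """
    for i, (p, g) in enumerate(zip(params, grads)):
        d_p = g + weight_decay * p           # L2 penalty added to the gradient
        bufs[i] = momentum * bufs[i] + d_p   # momentum buffer accumulates past steps
        params[i] = p - lr * bufs[i]
    return params, bufs

params, bufs = [1.0], [0.0]
for _ in range(2):
    sgd_momentum_step(params, [0.5], bufs, lr=0.1, momentum=0.9, weight_decay=0.0)
print(params, bufs)
```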

Novel Contributions

Independent end-to-end 3-seed reproduction of PR #1874 on separate hardware
Single hyperparameter change raising TTT LoRA rank from 128 to 192
Reload-ready quantized artifacts and unedited training logs provided
Byte-budget compliance verification with reported headroom under the 16 MB cap
Statistical comparison against current merged SOTA with reported significance
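The card does not say which significance test was used. One common choice for comparing two small sets of seeds is Welch's t-test, which does not assume equal variances. The sketch below computes Welch's t-statistic and the Welch-Satterthwaite degrees of freedom with only the standard library. The per-seed values are invented for illustration and are not the PR's numbers. For a p-value, scipy.stats.ttest_ind(a, b, equal_var=False) runs the same test.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t-statistic and Welch-Satterthwaite df for mean(a) - mean(b)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical per-seed val_bpb values (NOT taken from the PR).
new = [1.0695, 1.0700, 1.0704]
old = [1.0742, 1.0748, 1.0751]
t, df = welch_t(new, old)
print(f"t = {t:.2f}, df = {df:.1f}")  # negative t: the new runs have lower (better) bpb
```

With only three seeds per side the degrees of freedom are small, so the test has limited power and the spread across seeds matters as much as the mean gap.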