PR #1909

open

Record val_bpb 1.06996: Independent 3-seed reproduction of PR #1874 + TTT_LORA_RANK=192

by GodlyDonuts
val_bpb
1.0700
Architecture
Transformer
Optimizer
SGD
Artifact Size
~76 MB

Training Techniques

Quantization
GPTQ
bits: 6
scope: all
Architecture
SmearGate
Part of the reproduced PR #1874 stack.
parameters: null
Gated Attention
AttnOutGate / gated attention component from the reproduced stack.
parameters: {"width":36}
Test-Time Training
LoRA TTT
parameters: {"rank":192}
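To give a sense of what a LoRA rank of 192 costs compared with 128, here is a minimal sketch of the adapter parameter count. It assumes the standard LoRA factorization (a `rank x d_in` matrix and a `d_out x rank` matrix per adapted layer). The layer width `d_model = 512` and the function name are hypothetical, chosen only for illustration, and are not taken from the PR.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter.

    Standard LoRA adds A (rank x d_in) and B (d_out x rank),
    so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# Hypothetical layer width, for illustration only (not from the PR).
d_model = 512

params_rank_128 = lora_param_count(d_model, d_model, 128)
params_rank_192 = lora_param_count(d_model, d_model, 192)

# The adapter size scales linearly with rank, so 128 -> 192 is a 1.5x increase.
ratio = params_rank_192 / params_rank_128
print(params_rank_128, params_rank_192, ratio)
```

Because the count is linear in the rank, the 1.5x ratio holds for any layer shape under this assumption.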
score-first TTT
parameters: null
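The ordering constraint implied by "score-first" can be sketched as follows: each chunk is scored with the current model state, and only afterwards is the model allowed to adapt on that chunk. The toy "model" below is a Laplace-smoothed byte-count table standing in for the LoRA update. It is an assumption made purely to keep the example self-contained, not the PR's method.

```python
import math


def score_first_ttt(chunks):
    """Return bits per byte when each chunk is scored before adapting on it.

    The adaptive byte-count table is a toy stand-in for a real
    test-time-training update (an illustrative assumption).
    """
    counts = [1] * 256  # Laplace-smoothed byte counts
    total = 256
    total_bits = 0.0
    total_bytes = 0
    for chunk in chunks:
        # 1) Score first: the state has not seen this chunk yet.
        for b in chunk:
            total_bits += -math.log2(counts[b] / total)
        total_bytes += len(chunk)
        # 2) Adapt afterwards: only now may the chunk change the state.
        for b in chunk:
            counts[b] += 1
        total += len(chunk)
    return total_bits / total_bytes


first_only = score_first_ttt([b"aaaa"])
with_adaptation = score_first_ttt([b"aaaa", b"aaaa"])
```

In this toy setup the first chunk is always scored at the uniform 8 bits per byte, and only later chunks benefit from adaptation, which is the property the ordering is meant to guarantee.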
Evaluation
sliding window eval
parameters: null
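One common way to run sliding window evaluation is to advance a fixed-length context window by a stride and score only the tokens that no earlier window has scored. The sketch below computes those spans; the function name, the `(start, end, score_from)` layout, and the example sizes are illustrative assumptions, not details from the PR.

```python
def sliding_windows(n_tokens: int, window: int, stride: int):
    """Return (start, end, score_from) spans covering n_tokens.

    Each window feeds tokens [start, end) to the model, but only
    tokens in [score_from, end) contribute to the loss, so every
    token is scored exactly once.
    """
    if not 1 <= stride <= window:
        raise ValueError("stride must satisfy 1 <= stride <= window")
    spans = []
    scored_upto = 0
    start = 0
    while scored_upto < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return spans


spans = sliding_windows(n_tokens=10, window=4, stride=2)
# Count how many times each token position is scored.
times_scored = [0] * 10
for _, end, score_from in spans:
    for i in range(score_from, end):
        times_scored[i] += 1
```

A smaller stride gives each scored token more left context at the cost of more forward passes.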
Compression
lzma
level: null
brotli
level: null
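A byte-budget check against the 16 MB cap mentioned in this PR could look roughly like the sketch below, which uses only the standard-library `lzma` module. The cap is read here as 16,000,000 bytes, which is an assumption; the preset, the function name, and the dummy payload are also illustrative. The brotli path is omitted because it needs a third-party package.

```python
import lzma

# Assumption: "16 MB" interpreted as decimal megabytes.
CAP_BYTES = 16_000_000


def compressed_headroom(payload: bytes, cap: int = CAP_BYTES):
    """Compress payload with lzma and return (compressed_size, headroom).

    A negative headroom means the compressed artifact exceeds the cap.
    """
    blob = lzma.compress(payload, preset=9)
    # Round-trip check: the artifact must reload to the same bytes.
    assert lzma.decompress(blob) == payload
    return len(blob), cap - len(blob)


# Dummy payload standing in for serialized quantized weights.
dummy_payload = bytes(1_000_000)
size, headroom = compressed_headroom(dummy_payload)
print(size, headroom)
```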
Regularization
weight decay
parameters: null
LR Schedule
warmdown
parameters: null
Optimizer
SGD
weight_decay: null
momentum: null
other_params: null

Novel Contributions

  • Independent end-to-end 3-seed reproduction of PR #1874 on separate hardware
  • Single hyperparameter change raising TTT LoRA rank from 128 to 192
  • Provision of reload-ready quantized artifacts and unedited training logs
  • Byte-budget compliance verification with reported headroom under the 16 MB cap
  • Statistical comparison against current merged SOTA with reported significance
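For the multi-seed comparison in the last bullet, a Welch t statistic is one reasonable way to compare a small set of seed results against a baseline. The sketch below shows the computation only; the PR summary does not state which test was used, and the input values are toy placeholders, not results from this PR or from any merged run.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (mean_a - mean_b) / standard_error


# Toy placeholder values, NOT measured val_bpb numbers.
candidate_seeds = [1.0, 2.0, 3.0]
baseline_seeds = [2.0, 3.0, 4.0]

t_stat = welch_t(candidate_seeds, baseline_seeds)
print(t_stat)
```

A negative statistic means the first sample has the lower mean, which is the favorable direction for a loss-like metric such as val_bpb. Turning it into a p-value additionally requires the Welch-Satterthwaite degrees of freedom and a t-distribution CDF.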