PR #2078

open

Support: independent PR2014 prefix-2400 reproduction, seed 42 val_bpb 1.05804

by hi-aduek

val_bpb

1.0580
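val_bpb is validation loss in bits per byte of raw text. Normalizing by bytes instead of tokens keeps scores comparable across tokenizers. The sketch below shows the conversion. It assumes the evaluator sums per-token negative log-likelihood in nats and divides by the UTF-8 byte count of the scored text. The function name is hypothetical.

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert summed token-level cross-entropy (natural log) into bits per byte."""
    # nats -> bits: divide by ln(2); then normalize by bytes rather than tokens
    return total_nll_nats / (math.log(2) * total_bytes)

# Hypothetical totals, for illustration only: 2 bits per byte over 1000 bytes
print(bits_per_byte(total_nll_nats=2.0 * math.log(2) * 1000, total_bytes=1000))
```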

Architecture

Transformer

Optimizer

AdamW

Artifact Size

15,989,499 bytes

Training Techniques

Quantization

int6

bits: 6

scope: model + code artifact
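The PR quantizes to int6 but does not describe the scheme. One plausible approach is symmetric per-row quantization with one float scale per row, sketched below. The function names and the per-row choice are assumptions. Bit-packing is not shown: four 6-bit values fit in three bytes.

```python
def quantize_int6(row):
    """Symmetric per-row quantization of floats to signed 6-bit integers in [-31, 31]."""
    qmax = 2 ** (6 - 1) - 1  # 31; the code -32 is left unused so the range is symmetric
    max_abs = max(abs(w) for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into the symmetric range
    q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return q, scale

def dequantize_int6(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.62, -0.30, 0.0, 0.1]
q, scale = quantize_int6(weights)
print(q)  # [31, -15, 0, 5]
print(dequantize_int6(q, scale))
```

Per-element rounding error is at most half a quantization step (`scale / 2`).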

Test-Time Training

LoRA TTT

parameters: {"rank":80,"learning_rate":0.0001,"local_lr_mult":0.75,"mask":"no_qv","num_phases":1,"prefix_docs":2400}
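LoRA TTT leaves the base weights frozen. At evaluation time it trains only a low-rank update, so each adapted layer computes `W x + (alpha / r) * B (A x)`. The sketch below shows that forward pass in plain Python. Rank 80 comes from the PR. The toy layer uses rank 2 because its matrices are tiny, and `alpha` defaulting to the rank is an assumption. The PR does not explain the `mask: "no_qv"` and `local_lr_mult` settings, so they are not modeled. All names are hypothetical.

```python
import random

def matvec(m, x):
    """Plain-Python matrix-vector product (each row of m dotted with x)."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update B @ A of rank r."""

    def __init__(self, weight, rank, alpha=None, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: test-time training does not touch W
        # Assumption: scaling = alpha / rank, with alpha defaulting to rank (scaling 1.0)
        self.scaling = (alpha if alpha is not None else rank) / rank
        # A starts small and random; B starts at zero, so the adapter is a no-op
        # until the first test-time update
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]

def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=2)
print(layer.forward([1.0, 1.0]))            # [3.0, 7.0]: B is zero, so output equals W @ x
print(lora_param_count(512, 512, rank=80))  # 81920 for a hypothetical 512x512 projection
```

During TTT, only `A` and `B` would receive gradient updates, here at the listed learning rate of 0.0001.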

Evaluation

stride-based eval

parameters: {"stride":1536}
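Stride-based evaluation slides a fixed-length window over the validation stream. Each window after the first scores only the tokens that no earlier window has scored, so those tokens see more left context. The sketch below assumes 3072-token windows (the eval_length listed under Sequence Length) advancing by stride 1536. This is the common sliding-window scheme, not necessarily the exact PR #2014 evaluator, and it ignores any interaction with TTT.

```python
def sliding_windows(num_tokens, context=3072, stride=1536):
    """Return (window_start, window_end, score_start) spans for strided evaluation.

    Each window feeds up to `context` tokens to the model, but only the tokens in
    [score_start, window_end) are scored. Every token is scored exactly once, and
    every window after the first gives its scored tokens at least
    `context - stride` tokens of left context.
    """
    assert 0 < stride <= context, "stride must be in (0, context] to avoid gaps"
    spans = []
    start, scored_until = 0, 0
    while scored_until < num_tokens:
        end = min(start + context, num_tokens)
        spans.append((start, end, scored_until))
        scored_until = end
        start += stride
    return spans

spans = sliding_windows(6000)
print(spans)  # [(0, 3072, 0), (1536, 4608, 3072), (3072, 6000, 4608)]
print(sum(end - score_start for _, end, score_start in spans))  # 6000: full coverage
```

The coverage sum is the same property the PR reports as val_tokens equal to target_tokens.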

Sequence Length

sequence_length

train_length: 3072

eval_length: 3072

Regularization

weight decay

parameters: {"value":0.5}

Optimizer

AdamW

weight_decay: 0.5

momentum: null

other_params: {"beta2":0.99}
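AdamW applies weight decay directly to the weights instead of folding it into the gradient. The scalar sketch below follows the PyTorch-style formulation, where the per-step shrink factor is `1 - lr * weight_decay`. Under that convention a weight_decay of 0.5 is gentler than it looks. The PR lists beta2 = 0.99 and weight_decay = 0.5. The beta1 = 0.9, eps, and learning-rate values are assumptions.

```python
import math

def adamw_step(p, g, m, v, t, lr, beta1=0.9, beta2=0.99, eps=1e-8, weight_decay=0.5):
    """One AdamW update for a scalar parameter p with gradient g at step t (1-based)."""
    m = beta1 * m + (1 - beta1) * g        # first-moment EMA
    v = beta2 * v + (1 - beta2) * g * g    # second-moment EMA (beta2 = 0.99 per the PR)
    m_hat = m / (1 - beta1 ** t)           # bias corrections
    v_hat = v / (1 - beta2 ** t)
    p = p * (1 - lr * weight_decay)        # decoupled decay, scaled by lr (PyTorch convention)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

# With a zero gradient only the decay acts: 1.0 * (1 - 0.1 * 0.5) = 0.95
p, m, v = adamw_step(p=1.0, g=0.0, m=0.0, v=0.0, t=1, lr=0.1)
print(p)
```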

Architecture

SmearGate

Uses SmearGate in the attention stack.

parameters: {"enabled":true,"window":12}

Gated Attention

Uses gated attention with a quantized gate.

parameters: {"quant_gate":1,"scale":0.5}
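One common form of gated attention multiplies the attention output, channel by channel, by a sigmoid gate computed from the layer input. The sketch below shows that pattern in plain Python with a hypothetical gate projection `gate_weight`. The PR does not say what `quant_gate` and `scale: 0.5` control, so neither is modeled.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(m, x):
    """Plain-Python matrix-vector product (each row of m dotted with x)."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def gated_attention_output(attn_out, hidden, gate_weight):
    """Multiply each attention-output channel by a sigmoid gate computed from the input."""
    gate_logits = matvec(gate_weight, hidden)  # hypothetical gate projection W_g @ x
    return [a * sigmoid(g) for a, g in zip(attn_out, gate_logits)]

# With an all-zero gate projection every gate is sigmoid(0) = 0.5
print(gated_attention_output([2.0, -4.0], hidden=[1.0, 3.0], gate_weight=[[0.0, 0.0], [0.0, 0.0]]))
```

Because each gate lies in (0, 1), the layer can learn to suppress individual attention channels per token.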

weight tying

Uses weight tying: the input embedding matrix is reused as the output projection.

parameters: null
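With weight tying, one matrix serves both as the token-embedding lookup and as the output layer, whose logits are dot products of the final hidden state with each embedding row. The toy sketch below uses plain Python lists. The matrix values and sizes are illustrative only.

```python
def embed(token_id, emb):
    """Input side: look up the token's row of the shared matrix."""
    return list(emb[token_id])

def tied_logits(hidden, emb):
    """Output side: logits[v] = dot(hidden, emb[v]), reusing the same matrix transposed."""
    return [sum(h * e for h, e in zip(hidden, row)) for row in emb]

emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy vocab of 3 tokens, d_model = 2
print(embed(2, emb))                 # [1.0, 1.0]
print(tied_logits([2.0, 3.0], emb))  # [2.0, 3.0, 5.0]
```

With vocab_size 8192, tying removes a separate 8192 × d_model output matrix from the artifact.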

Other

other

Uses CaseOps SP8192 training shards and reproduces PR #2014 with a reduced phased-TTT prefix budget (2400 docs).

parameters: {"caseops_enabled":1,"vocab_size":8192,"phased_ttt_prefix_docs":2400,"phased_ttt_num_phases":1}

Compression

custom

level: null

Novel Contributions

Independent seed-42 reproduction/support package for PR #2014
Uses a reduced phased-TTT prefix budget of 2400 docs to stay under the 600s eval cap
Reports full validation coverage with val_tokens equal to target_tokens
Provides a compliant support run for the PR #2014 frontier line rather than a new three-seed architecture claim