PR #2078
open
Support: independent PR2014 prefix-2400 reproduction, seed 42 val_bpb 1.05804
by hi-aduek
val_bpb
1.0580
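For context, bits per byte is conventionally the summed token-level negative log-likelihood, converted from nats to bits and divided by the byte count of the evaluated text. A minimal sketch, assuming the loss is accumulated in nats; the function name is illustrative and not taken from the PR:

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes rather than tokens keeps the metric comparable across tokenizers with different vocabulary sizes.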
Architecture
Transformer
Optimizer
AdamW
Artifact Size
15,989,499 bytes
Training Techniques
Quantization
int6
bits: 6
scope: model + code artifact
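The card does not say which int6 scheme is used. As a rough illustration only, the sketch below assumes symmetric per-tensor quantization to the signed 6-bit range [-32, 31]; the function names are hypothetical.

```python
def quantize_int6(values, bits=6):
    """Symmetric per-tensor quantization to signed `bits`-bit integers (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1                      # 31 for 6 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid dividing by zero
    # Round to the nearest level and clamp to the representable range.
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map integer levels back to approximate float values."""
    return [x * scale for x in q]
```

Under this scheme each weight is stored as one of 64 levels plus a shared scale, and the reconstruction error per value is at most half a quantization step.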
Test-Time Training
LoRA TTT
parameters: {"rank":80,"learning_rate":0.0001,"local_lr_mult":0.75,"mask":"no_qv","num_phases":1,"prefix_docs":2400}
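To give a sense of scale for rank 80: a LoRA adapter on a `d_out x d_in` weight trains two low-rank factors, so it adds `rank * (d_in + d_out)` parameters. The sketch below uses a hypothetical 512 x 512 projection, since the card does not state layer sizes.

```python
def lora_param_count(d_in: int, d_out: int, rank: int = 80) -> int:
    """Trainable parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Hypothetical 512 x 512 projection; the PR does not state layer sizes.
adapter = lora_param_count(512, 512)   # 80 * (512 + 512)
full = 512 * 512                       # parameters in the frozen base matrix
```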
Evaluation
stride-based eval
parameters: {"stride":1536}
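The card does not describe how evaluation windows are built. One common stride-based scheme, sketched here as an assumption, scores the whole first window and then advances by `stride`, scoring only the new tokens while reusing earlier tokens as context. With `eval_length` 3072 and `stride` 1536, every token after the first window is scored with at least 1536 tokens of context, and each token is scored exactly once.

```python
def stride_windows(num_tokens, eval_length=3072, stride=1536):
    """Return (ctx_start, score_start, end) spans; tokens in [score_start, end) are scored."""
    windows = []
    score_start = 0
    while score_start < num_tokens:
        # The first window scores a full eval_length; later windows score `stride` new tokens.
        step = eval_length if score_start == 0 else stride
        end = min(score_start + step, num_tokens)
        ctx_start = max(0, end - eval_length)   # context never exceeds eval_length
        windows.append((ctx_start, score_start, end))
        score_start = end
    return windows
```

Because the scored spans tile the sequence without gaps or overlap, the number of scored tokens equals the number of target tokens, which is the kind of full-coverage check the contributions list refers to.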
Sequence Length
sequence_length
train_length: 3072
eval_length: 3072
Regularization
weight decay
parameters: {"value":0.5}
Optimizer
AdamW
weight_decay: 0.5
momentum: null
other_params: {"beta2":0.99}
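For reference, AdamW applies weight decay directly to the parameter rather than folding it into the gradient. The single-parameter sketch below follows the decoupled form from the AdamW paper; `weight_decay` 0.5 and `beta2` 0.99 come from the card, while `lr`, `beta1`, and `eps` are assumed placeholder values.

```python
import math

def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.99,
               eps=1e-8, weight_decay=0.5):
    """One AdamW update for a scalar parameter; returns (p, m, v)."""
    m = beta1 * m + (1 - beta1) * g            # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g        # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    # Decoupled decay: weight_decay * p is added outside the adaptive rescaling.
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p)
    return p, m, v
```

With a zero gradient the parameter simply shrinks by a factor of `1 - lr * weight_decay` per step, which is what distinguishes decoupled decay from L2 regularization under Adam.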
Architecture
SmearGate
Uses SmearGate in the attention stack.
parameters: {"enabled":true,"window":12}
Gated Attention
Uses gated attention with quantized gate.
parameters: {"quant_gate":1,"scale":0.5}
weight tying
Uses tied embeddings / weight tying.
parameters: null
Other
other
Uses CaseOps SP8192 training shards and reproduces PR #2014 with a reduced phased-TTT prefix budget.
parameters: {"caseops_enabled":1,"vocab_size":8192,"phased_ttt_prefix_docs":2400,"phased_ttt_num_phases":1}
Compression
custom
level: null
Novel Contributions
- Independent seed-42 reproduction/support package for PR #2014
- Uses a reduced phased-TTT prefix budget of 2400 docs to stay under the 600s eval cap
- Reports full validation coverage with val_tokens equal to target_tokens
- Provides a compliant support run for the PR #2014 frontier line rather than a new three-seed architecture claim