PR #299

open

[Non-record] LoRA TTT + HParams (val_bpb=1.16973333)

by Mistobaan
val_bpb: 1.1697
Architecture: Transformer
Optimizer: AdamW
Artifact Size:
Training Techniques:
  • Test-Time Training: LoRA TTT (parameters: null)
Sequence Length (sequence_length): train_length: 2048, eval_length: null
Other: Updated baseline hyperparameters for the 10min/16mb track, including an 8x768 configuration, a 262144-token training batch, lower learning rates, a lower logit softcap, and beta1=0.70.
  parameters: {"num_layers":8,"model_dim":768,"train_batch_tokens":262144,"logit_softcap":10,"tied_embed_lr":0.03,"matrix_lr":0.02,"scalar_lr":0.02,"beta1":0.7}
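The parameters JSON above can be sketched as a plain Python config. The dataclass name and field layout below are illustrative assumptions, not the submission's actual API; the default values mirror the record's parameters.

```python
from dataclasses import dataclass

# Hypothetical config mirroring the record's baseline hyperparameters.
# The class and field names are illustrative, not the submission's own code.
@dataclass
class BaselineHParams:
    num_layers: int = 8            # 8x768 configuration
    model_dim: int = 768
    train_length: int = 2048       # sequence length
    train_batch_tokens: int = 262144
    logit_softcap: float = 10.0    # lowered logit softcap
    tied_embed_lr: float = 0.03    # lowered learning rates
    matrix_lr: float = 0.02
    scalar_lr: float = 0.02
    beta1: float = 0.70            # AdamW beta1

hp = BaselineHParams()
# Sanity check: 262144 tokens per step at 2048-token sequences
# means 128 sequences per training batch.
seqs_per_batch = hp.train_batch_tokens // hp.train_length
print(seqs_per_batch)  # -> 128
```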

Novel Contributions

  • Adds the 2026-03-20_better_baseline_params record for the 10min/16mb track
  • Keeps the same LoRA TTT evaluation path as 2026-03-17_LoRA_TTT
  • Updates baseline defaults to an 8x768 configuration with a 2048-token sequence length and a 262144-token training batch
  • Uses lower learning rates, lower logit softcap, and beta1=0.70
  • Includes submitted train_gpt.py, run logs, and aggregated submission.json
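For context on the LoRA TTT evaluation path, here is a minimal NumPy sketch of LoRA-style test-time training on a single linear layer. The shapes, rank, learning rate, and squared-error objective are illustrative assumptions and do not reflect the submission's actual TTT loss or update schedule; the point is only the mechanic: the pretrained weight stays frozen while the low-rank adapters receive gradient updates at test time.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 8, 4  # toy sizes, not the 8x768 model

# Frozen pretrained weight: never updated during test-time training.
W = rng.normal(size=(d_out, d_in)) * 0.1
# LoRA adapters: small random down-projection A, zero-init up-projection B,
# so the adapted layer starts out identical to the pretrained one.
A = rng.normal(size=(rank, d_in)) * 0.1
B = np.zeros((d_out, rank))

def forward(x):
    # Effective weight is W + B @ A; only A and B are trainable.
    return (W + B @ A) @ x

def loss(x, t):
    return 0.5 * np.sum((forward(x) - t) ** 2)

x = rng.normal(size=d_in)
x /= np.linalg.norm(x)
target = rng.normal(size=d_out) * 0.1

initial = loss(x, target)
lr = 0.05
for _ in range(200):
    err = forward(x) - target        # gradient of 0.5 * ||err||^2
    grad_B = np.outer(err, A @ x)    # dL/dB; W gets no gradient
    grad_A = np.outer(B.T @ err, x)  # dL/dA
    B -= lr * grad_B
    A -= lr * grad_A

final = loss(x, target)
print(initial, final)  # adapter-only updates reduce the test-time loss
```

Because B is zero-initialized, the adapted layer matches the frozen one before TTT begins, which is the standard LoRA initialization choice.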