val_bpb: 1.1697
Architecture: Transformer
Optimizer: AdamW
Artifact Size: —

Training Techniques: Test-Time Training

LoRA TTT
parameters: null

Sequence Length (sequence_length)
train_length: 2048
eval_length: null

Other (other)
Updated baseline hyperparameters for the 10min/16mb track, including an 8x768 configuration, a 262144-token training batch, lower learning rates, a lower logit softcap, and beta1=0.70.
parameters: {"num_layers": 8, "model_dim": 768, "train_batch_tokens": 262144, "logit_softcap": 10, "tied_embed_lr": 0.03, "matrix_lr": 0.02, "scalar_lr": 0.02, "beta1": 0.7}
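A logit softcap of 10 bounds the pre-softmax logits with a smooth tanh squash. A minimal sketch of how such a cap is typically applied (the function name and NumPy usage here are illustrative, not taken from the submitted train_gpt.py):

```python
import numpy as np

def softcap(logits: np.ndarray, cap: float = 10.0) -> np.ndarray:
    """Smoothly bound logits to (-cap, cap) via cap * tanh(logits / cap)."""
    return cap * np.tanh(logits / cap)

# Small logits pass through almost unchanged; extreme logits
# saturate near +/-cap instead of growing without bound.
moderate = softcap(np.array([0.5, -1.0]))
extreme = softcap(np.array([100.0, -100.0]))

# The 262144-token training batch at train_length 2048 corresponds
# to 262144 / 2048 = 128 sequences per batch.
sequences_per_batch = 262144 // 2048
```

Lowering the cap tightens the bound on every logit, which interacts with the lowered learning rates listed above.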
Novel Contributions
- Adds the 2026-03-20_better_baseline_params record for the 10min/16mb track
- Keeps the same LoRA TTT evaluation path as 2026-03-17_LoRA_TTT
- Updates baseline defaults to an 8x768 configuration with 2048 sequence length and 262144-token training batch
- Uses lower learning rates, lower logit softcap, and beta1=0.70
- Includes submitted train_gpt.py, run logs, and aggregated submission.json
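The LoRA TTT evaluation path adapts the model at test time by training only low-rank adapters while the base weights stay frozen. A minimal NumPy sketch of the low-rank update underlying LoRA (the rank, scale, and variable names are illustrative; the actual evaluation path is in the submitted train_gpt.py):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 768, 8                   # model dim; rank r is illustrative
W = rng.normal(size=(d, d))     # frozen base weight
A = rng.normal(size=(r, d))     # LoRA "down" matrix (Gaussian init)
B = np.zeros((d, r))            # LoRA "up" matrix (zero init)
alpha = 16.0                    # illustrative scaling factor

def forward(x: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    # Base path plus low-rank residual: W x + (alpha / r) * B (A x).
    return W @ x + (alpha / r) * B @ (A @ x)

x = rng.normal(size=d)
y = forward(x, A, B)
# With B zero-initialized, the adapter contributes nothing at the start
# of test-time training; gradient steps on the eval-time objective then
# update only A and B (2 * d * r parameters), never W.
```

Zero-initializing B makes the adapted model exactly match the base model before any test-time updates, so TTT can only move away from the baseline as the eval-time loss dictates.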