PR #346

open

Add local baseline reproduction record

by bjbjbjbjbjbj
val_bpb: 1.3529

Training Techniques

Sequence Length (sequence_length)
train_length: 1024
eval_length: null
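To illustrate what `train_length: 1024` controls, the sketch below splits a flat token stream into fixed-length next-token-prediction windows. It is a minimal sketch under stated assumptions. The helper name `make_windows` is hypothetical. It assumes that a trailing partial window is dropped and that `eval_length: null` means evaluation reuses the training length. This record confirms neither assumption.

```python
from typing import List, Tuple


def make_windows(tokens: List[int], seq_len: int) -> List[Tuple[List[int], List[int]]]:
    """Split a flat token stream into (input, target) pairs of length seq_len.

    Hypothetical helper: each target is the input shifted left by one token,
    and a trailing partial window is dropped (an assumption, not confirmed).
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    windows = []
    # Each window consumes seq_len + 1 tokens: seq_len inputs plus one extra target.
    n_windows = (len(tokens) - 1) // seq_len
    for i in range(n_windows):
        start = i * seq_len
        chunk = tokens[start : start + seq_len + 1]
        windows.append((chunk[:-1], chunk[1:]))
    return windows


train_length = 1024
eval_length = None
# Assumption: a null eval_length falls back to the training length.
effective_eval_length = eval_length if eval_length is not None else train_length

stream = list(range(5000))  # stand-in token ids, not real data
pairs = make_windows(stream, train_length)
print(len(pairs), effective_eval_length)
```

With 5,000 stand-in tokens, this produces four full windows of 1,024 tokens each. The final 903 tokens do not fill a whole window, so the sketch discards them.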
Other
Local single-GPU baseline reproduction run of the OpenAI Parameter Golf NaiveBaseline
parameters: {"train_shards":1,"grad_accum_steps":8}
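The `grad_accum_steps: 8` parameter suggests that the single GPU emulates a larger batch. It would do this by accumulating gradients over 8 micro-batches before each optimizer step. The sketch below illustrates that general technique on a toy least-squares model in pure Python. The model, data, and function names are hypothetical and do not come from the baseline's training script. The sketch checks that averaging the per-micro-batch gradients, each scaled by 1/grad_accum_steps, yields the same update as a single full-batch step when the micro-batches are equal-sized.

```python
from typing import List, Tuple


def grad_mse(w: float, batch: List[Tuple[float, float]]) -> float:
    """Gradient of mean squared error for the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def step_with_accumulation(w: float, data: List[Tuple[float, float]],
                           grad_accum_steps: int, lr: float) -> float:
    """One optimizer step built from grad_accum_steps equal-sized micro-batches."""
    if len(data) % grad_accum_steps != 0:
        raise ValueError("micro-batches must be equal-sized")
    micro = len(data) // grad_accum_steps
    accumulated = 0.0
    for k in range(grad_accum_steps):
        batch = data[k * micro : (k + 1) * micro]
        # Scale each micro-batch gradient so the running sum equals the full-batch mean.
        accumulated += grad_mse(w, batch) / grad_accum_steps
    return w - lr * accumulated


def step_full_batch(w: float, data: List[Tuple[float, float]], lr: float) -> float:
    """Reference: one step on the whole batch at once."""
    return w - lr * grad_mse(w, data)


data = [(float(i), 3.0 * i + 1.0) for i in range(32)]  # toy data, 8 micro-batches of 4
w_accum = step_with_accumulation(0.5, data, grad_accum_steps=8, lr=1e-3)
w_full = step_full_batch(0.5, data, lr=1e-3)
print(abs(w_accum - w_full) < 1e-9)
```

A PyTorch training loop usually expresses the same idea differently. Each micro-batch loss is divided by the number of accumulation steps, `backward()` is called on each scaled loss, and the optimizer steps once. This record does not say whether the baseline follows exactly that pattern.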

Novel Contributions

  • Local single-GPU reproduction of the OpenAI Parameter Golf NaiveBaseline
  • Reported a best observed validation bpb of 1.3529 at step 4200
  • Documented that validation bpb improved from 4.1077 to 1.3529 and plateaued after step 4200
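Validation bpb (bits per byte) is commonly derived from the mean token-level cross-entropy. The loss is converted from nats to bits and then normalized by the number of bytes that the tokens cover, which makes the score independent of the tokenizer. The sketch below assumes that definition, since this record does not state the exact formula, and the token and byte counts in its usage lines are made up. The sketch also computes the relative improvement between the first and best reported values.

```python
import math


def bits_per_byte(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean cross-entropy (nats/token) to bits per byte.

    Assumed definition: total bits = mean_loss_nats * n_tokens / ln(2),
    divided by the number of bytes those tokens decode to.
    """
    if n_tokens <= 0 or n_bytes <= 0:
        raise ValueError("token and byte counts must be positive")
    bits_per_token = mean_loss_nats / math.log(2)
    return bits_per_token * n_tokens / n_bytes


def relative_improvement(start: float, best: float) -> float:
    """Fractional reduction from start to best (0.25 means 25% lower)."""
    return (start - best) / start


# Hypothetical counts: 1,000 tokens covering 4,000 bytes at 2.0 nats/token.
print(round(bits_per_byte(2.0, 1_000, 4_000), 4))
# Reported values from this record: 4.1077 -> 1.3529.
print(round(relative_improvement(4.1077, 1.3529), 4))
```

For the reported values, the relative reduction from 4.1077 to 1.3529 works out to about 67%.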