PR #736 (open)

Submit 9L 2xMLP optimized parameter run with val_bpb 1.2168

by Git-AaryaView on GitHub

val_bpb: 1.2168
Architecture: Transformer
Optimizer:
Artifact Size: 15.8 MB

Training Techniques

Architecture: MLP3x
Uses a 9-layer model with a 2x MLP multiplier as part of the architecture tuning.
parameters: {"layers":9,"mlp_multiplier":2}
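The 2x MLP multiplier sets the feed-forward hidden width to twice the model width. A minimal sketch of the per-block MLP parameter count this implies, assuming the standard two-projection MLP (up then down, no bias); `d_model` is a hypothetical model width not stated in this PR:

```python
def mlp_param_count(d_model: int, mlp_multiplier: int = 2) -> int:
    """Weights in one transformer MLP block: d_model -> hidden -> d_model."""
    hidden = mlp_multiplier * d_model  # 2x multiplier, as in this run
    # Up-projection (d_model x hidden) plus down-projection (hidden x d_model).
    return d_model * hidden + hidden * d_model
```

For example, at a hypothetical d_model of 768 each of the 9 blocks' MLPs would hold 768 * 1536 * 2 = 2,359,296 weights.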
Sequence Length: sequence_length
parameters: {"train_length":2048,"eval_length":null}
LR Schedule: warmdown
parameters: {"warmdown_steps":3600}
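A warmdown schedule typically holds the learning rate constant and then decays it linearly to zero over the final steps of training. A minimal sketch, assuming linear decay to zero over the 3600 warmdown steps and using the 0.055 matrix learning rate listed in the contributions; `total_steps` is a hypothetical run length not stated in this PR:

```python
def lr_at_step(step: int, total_steps: int,
               base_lr: float = 0.055, warmdown_steps: int = 3600) -> float:
    """Constant LR, then linear warmdown to zero over the last warmdown_steps."""
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return base_lr
    # Linear decay: full LR at warmdown_start, zero at total_steps.
    return base_lr * (total_steps - step) / warmdown_steps
```

Extending warmdown_steps (here 3600, up from a shorter prior run) stretches the decay tail, which often buys a small final-loss improvement at no extra compute.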

Novel Contributions

  • Maintained a 9-layer, 2x MLP architecture
  • Increased training sequence length to 2048
  • Tuned matrix learning rate to 0.055
  • Extended warmdown iterations to 3600
  • Achieved a reported validation BPB of 1.2168 with a 15.8 MB artifact