PR #1131

open

Improve baseline with LeakyReLU² activation

by JianYan11
val_bpb: 1.2947
Architecture: Transformer

Architecture: LeakyReLU
Replaced ReLU² with LeakyReLU(0.5)² in the MLP forward pass to preserve negative gradient flow while keeping squared outputs.
parameters: {"negative_slope":0.5}
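
A minimal sketch of the described change, assuming a nanoGPT-style PyTorch MLP block; the class and layer names (MLP, c_fc, c_proj) and dimensions are illustrative and not taken from the PR, only the activation swap follows the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    """Illustrative transformer MLP block with ReLU² replaced by LeakyReLU(0.5)²."""

    def __init__(self, dim: int, hidden_dim: int, negative_slope: float = 0.5):
        super().__init__()
        self.c_fc = nn.Linear(dim, hidden_dim)
        self.c_proj = nn.Linear(hidden_dim, dim)
        self.negative_slope = negative_slope

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.c_fc(x)
        # Baseline: x = F.relu(x).square()
        # LeakyReLU(0.5) keeps a nonzero gradient for negative pre-activations;
        # squaring preserves the squared-output characteristic of the baseline.
        x = F.leaky_relu(x, negative_slope=self.negative_slope).square()
        return self.c_proj(x)
```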

Novel Contributions

  • Replaced ReLU² with LeakyReLU(0.5)² in the MLP forward pass
  • Preserved negative gradient flow while maintaining squared output characteristic
  • Reported improved validation bpb (1.2947) over the ReLU² baseline