val_bpb
1.2947
Architecture
Transformer
Optimizer
—
Artifact Size
—
Training Techniques
Architecture
LeakyReLU
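Replaced ReLU² with LeakyReLU(0.5)² in the MLP forward pass. A negative pre-activation x now yields (0.5x)² = 0.25x² instead of 0, so it still receives gradient. The activation output remains a square and is therefore still non-negative.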
parameters: {"negative_slope":0.5}
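The sketch below shows the change under stated assumptions. It assumes a two-layer, bias-free MLP (up-projection, activation, down-projection), which the card does not specify. All function names (`leaky_relu_squared`, `relu_squared`, `matvec`, `mlp_forward`) are hypothetical, and pure Python is used only to keep the example self-contained. In PyTorch the activation would plausibly be written `torch.nn.functional.leaky_relu(h, negative_slope=0.5).square()`, but the submission's actual code is not shown here.

```python
from typing import Callable, List

Matrix = List[List[float]]
Vector = List[float]


def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    """LeakyReLU(x)**2; negative_slope=0.5 matches the card's parameters."""
    y = x if x >= 0.0 else negative_slope * x
    return y * y


def relu_squared(x: float) -> float:
    """Baseline ReLU(x)**2: any negative input maps to exactly 0."""
    y = max(x, 0.0)
    return y * y


def matvec(w: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product (one output per row of w)."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def mlp_forward(
    x: Vector,
    w_in: Matrix,
    w_out: Matrix,
    activation: Callable[[float], float] = leaky_relu_squared,
) -> Vector:
    """Hypothetical bias-free MLP: w_out @ activation(w_in @ x)."""
    hidden = [activation(h) for h in matvec(w_in, x)]
    return matvec(w_out, hidden)


# Toy weights: the pre-activations are [1.0, -1.0, 0.0].
x = [1.0, -1.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[1.0, 1.0, 1.0]]

print(mlp_forward(x, w_in, w_out))                           # [1.25]
print(mlp_forward(x, w_in, w_out, activation=relu_squared))  # [1.0]
```

With these toy weights, the negative pre-activation contributes 0.25 to the output under LeakyReLU(0.5)² and nothing under ReLU². The only code difference between the two variants is the activation function passed to the MLP.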
Novel Contributions
- Replaced ReLU² with LeakyReLU(0.5)² in the MLP forward pass
- Preserved negative gradient flow while maintaining the squared output characteristic
- Reported a lower (better) validation bits per byte (val_bpb) than the ReLU² baseline; this run scored 1.2947
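The gradient-flow claim can be checked numerically. The illustrative sketch below compares the analytic derivatives of ReLU² and LeakyReLU(0.5)². It also verifies the LeakyReLU(0.5)² derivative against a central finite difference. The helper names are hypothetical, and the code is not taken from the submission.

```python
from typing import Callable


def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    """LeakyReLU(x)**2 with the card's negative_slope of 0.5."""
    y = x if x >= 0.0 else negative_slope * x
    return y * y


def leaky_relu_squared_grad(x: float, negative_slope: float = 0.5) -> float:
    # d/dx (a*x)**2 = 2*a**2*x, with a = 1 for x >= 0 and a = negative_slope otherwise.
    a = 1.0 if x >= 0.0 else negative_slope
    return 2.0 * a * a * x


def relu_squared_grad(x: float) -> float:
    # d/dx max(x, 0)**2 = 2x for x > 0, and exactly 0 for x <= 0 (no gradient flow).
    return 2.0 * x if x > 0.0 else 0.0


def central_difference(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Numerical derivative, used to sanity-check the analytic gradient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


print(leaky_relu_squared_grad(-2.0))  # -1.0
print(relu_squared_grad(-2.0))        # 0.0
print(abs(central_difference(leaky_relu_squared, -2.0) - leaky_relu_squared_grad(-2.0)) < 1e-4)  # True
```

For x < 0 the LeakyReLU(0.5)² gradient is 2 · 0.25 · x. That is a quarter of the magnitude at the mirrored positive input, rather than exactly zero. The output itself stays non-negative because it is still a square.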