PR #1390
open
Non-record: LeakyReLU + Sliding Window Eval + Zstd compression
by NICOH-YAY
val_bpb
1.2634
Architecture
Transformer
Optimizer
—
Artifact Size
15.77MB
Training Techniques
Architecture
LeakyReLU
Changed MLP activation from ReLU² to LeakyReLU(0.5)²
parameters: {"negative_slope":0.5}
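A minimal sketch of the activation change, written elementwise in plain Python for readability. It assumes the square is applied to the output of LeakyReLU, which is one natural reading of "LeakyReLU(0.5)²". The function names are hypothetical and not taken from the PR.

```python
def relu_squared(x: float) -> float:
    """Baseline activation: ReLU(x) squared. Negative inputs map to 0."""
    y = x if x > 0.0 else 0.0
    return y * y


def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    """Assumed form of LeakyReLU(0.5)^2: apply LeakyReLU, then square.

    Negative inputs are scaled by negative_slope before squaring, so they
    produce a nonzero output (and a nonzero gradient) instead of exactly 0.
    """
    y = x if x >= 0.0 else negative_slope * x
    return y * y


if __name__ == "__main__":
    for value in (-2.0, -0.5, 0.0, 1.5):
        print(value, relu_squared(value), leaky_relu_squared(value))
```

Under this reading the two activations agree for non-negative inputs and differ only on the negative side, where the squared output is `(0.5 * x) ** 2`.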
Evaluation
sliding window eval
parameters: {"stride":256}
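The sketch below illustrates one common way to lay out sliding-window evaluation: windows advance by `stride` tokens, and each window scores only the tokens that no earlier window has scored, so later tokens are predicted with more left context. The helper name and the `seq_len` value in the usage example are assumptions for illustration; the PR only states a stride of 256.

```python
from typing import Iterator, Tuple


def sliding_windows(
    n_tokens: int, seq_len: int, stride: int = 256
) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, end, score_from) spans covering n_tokens.

    Tokens in [start, end) are fed to the model as one window; only tokens
    in [score_from, end) count toward the loss, so each token is scored
    exactly once.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("stride must be in the range 1..seq_len")
    prev_end = 0
    for start in range(0, n_tokens, stride):
        end = min(start + seq_len, n_tokens)
        yield start, end, prev_end
        prev_end = end
        if end == n_tokens:
            break


if __name__ == "__main__":
    # seq_len=512 is a hypothetical context length, not a value from the PR.
    for span in sliding_windows(n_tokens=1000, seq_len=512, stride=256):
        print(span)
```

With this layout, every token after the first window is scored with at least `seq_len - stride` tokens of context, at the cost of roughly `seq_len / stride` times more forward-pass compute than non-overlapping evaluation.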
Compression
zstd
level: 22
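As a rough illustration of the compression step, the sketch below compresses a byte payload at level 22 using the third-party `zstandard` Python package. Both the package choice and the helper name are assumptions; the PR does not say which zstd binding or CLI was used to produce the artifact.

```python
try:
    import zstandard  # third-party binding, assumed installed (pip install zstandard)
except ImportError:  # keep the sketch importable without the package
    zstandard = None


def compress_artifact(raw: bytes, level: int = 22) -> bytes:
    """Compress raw bytes with zstd at the given level (22 is the highest)."""
    if zstandard is None:
        raise RuntimeError("the 'zstandard' package is required for this sketch")
    return zstandard.ZstdCompressor(level=level).compress(raw)


def decompress_artifact(blob: bytes) -> bytes:
    """Inverse of compress_artifact."""
    if zstandard is None:
        raise RuntimeError("the 'zstandard' package is required for this sketch")
    return zstandard.ZstdDecompressor().decompress(blob)


if __name__ == "__main__" and zstandard is not None:
    payload = b"example weights " * 4096  # stand-in for a serialized model
    blob = compress_artifact(payload)
    print(len(payload), "->", len(blob), "bytes")
```

Higher levels trade compression time for a smaller output, which matters when the artifact size is what is being measured.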
Novel Contributions
- LeakyReLU(0.5)² activation in the MLP
- Sliding window evaluation with stride 256
- Zstd compression at level 22