PR #1390

open

Non-record: LeakyReLU + Sliding Window Eval + Zstd compression

by NICOH-YAY
val_bpb: 1.2634
Architecture: Transformer
Optimizer: not specified
Artifact Size: 15.77 MB

Training Techniques

Architecture
LeakyReLU
Changed MLP activation from ReLU² to LeakyReLU(0.5)²
parameters: {"negative_slope":0.5}
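A minimal sketch of the activation change, assuming "LeakyReLU(0.5)²" means the square is applied to the output of a LeakyReLU with negative_slope 0.5. The function names are illustrative and not taken from the PR; scalar Python is used only to show the arithmetic.

```python
def relu_squared(x: float) -> float:
    """Baseline ReLU²: negative inputs are zeroed before squaring."""
    y = max(x, 0.0)
    return y * y


def leaky_relu(x: float, negative_slope: float = 0.5) -> float:
    """Standard LeakyReLU: identity for x >= 0, scaled by negative_slope otherwise."""
    return x if x >= 0.0 else negative_slope * x


def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    """Assumed reading of LeakyReLU(0.5)²: square the LeakyReLU output.

    For x >= 0 this equals x²; for x < 0 it equals (negative_slope * x)²,
    so negative inputs contribute a non-zero value instead of 0.
    """
    y = leaky_relu(x, negative_slope)
    return y * y


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(x, relu_squared(x), leaky_relu_squared(x))
```

Under this reading, the only difference from ReLU² is on the negative side, where the output is 0.25·x² rather than 0.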
Evaluation
sliding window eval
parameters: {"stride":256}
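One common way to implement sliding window evaluation is sketched below; it is not confirmed to be the PR's implementation. Each window advances by the stride, and only tokens not already scored by an earlier window count toward the loss, so most tokens are scored with a long left context. The context length of 1024 is an assumption (the PR lists only the stride), and `sliding_windows` and `bits_per_byte` are hypothetical helper names.

```python
import math
from typing import Iterator, Tuple


def sliding_windows(n_tokens: int, context_len: int, stride: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (begin, end, n_scored) for each evaluation window.

    The model would see tokens[begin:end]; only the last n_scored positions
    of that window are new and contribute to the loss.
    """
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + context_len, n_tokens)
        yield begin, end, end - prev_end
        prev_end = end
        if end == n_tokens:
            break


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


if __name__ == "__main__":
    # context_len=1024 is assumed for illustration; stride=256 is from the PR.
    for window in sliding_windows(n_tokens=2048, context_len=1024, stride=256):
        print(window)
```

A smaller stride gives each scored token more context at the cost of more forward passes: with the assumed values, every window after the first scores 256 new tokens while reading 1024.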
Compression
zstd
level: 22

Novel Contributions

  • LeakyReLU(0.5)² activation in the MLP
  • Sliding window evaluation with stride 256
  • Zstd compression at level 22