PR #1393

open

[Submission] SwiGLU MLP (under 16MB)

by Abhinav-Avasarala

val_bpb

1.4716

Architecture

Transformer

Optimizer

—

Artifact Size

~13.6MB

Training Techniques

Architecture

SwiGLU

Replaced the baseline ReLU² MLP with a SwiGLU-based MLP.

parameters: {"mlp_mult":1}
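For orientation, here is a minimal pure-Python sketch of the two MLP variants named above. It is not the PR's code: the function names, the bias-free layers, and the reading of `mlp_mult` as `hidden = mlp_mult * dim` are all assumptions made for illustration.

```python
import math
import random


def silu(x: float) -> float:
    """SiLU (swish) activation, x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matvec(w, x):
    """Multiply a weight matrix (nested lists, rows x cols) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU MLP: w_down @ (silu(w_gate @ x) * (w_up @ x)), no biases."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def relu2_mlp(x, w_in, w_out):
    """ReLU^2 MLP for comparison: w_out @ relu(w_in @ x)^2, no biases."""
    hidden = [max(0.0, h) ** 2 for h in matvec(w_in, x)]
    return matvec(w_out, hidden)


def init(rows, cols, rng):
    """Small random weights; the 0.02 std is an arbitrary illustrative choice."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: dim is made up, and hidden = mlp_mult * dim is assumed.
dim, mlp_mult = 4, 1
hidden_dim = mlp_mult * dim
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
y = swiglu_mlp(
    x,
    init(hidden_dim, dim, rng),  # gate projection
    init(hidden_dim, dim, rng),  # up projection
    init(dim, hidden_dim, rng),  # down projection
)
```

SwiGLU uses three weight matrices where the ReLU² MLP uses two, so at equal hidden width it carries about 1.5x the MLP parameters. A small multiplier is one way to keep the parameter count in check.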

Quantization

int8

bits: 8

scope: all

Compression

zlib

level: null
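The int8 and zlib entries above can be sketched as one round trip. This is an illustration only, not the submission's code: it assumes symmetric per-tensor scaling, and since the compression level is not recorded it falls back to zlib's default. All function names are hypothetical.

```python
import array
import random
import zlib


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w ~= scale * q with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array.array(
        "b", (max(-127, min(127, round(w / scale))) for w in weights)
    )
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to floats."""
    return [scale * v for v in q]


def pack(q, level=-1):
    """Compress the raw int8 bytes; level=-1 selects zlib's default level."""
    return zlib.compress(q.tobytes(), level)


def unpack(blob):
    """Decompress back into a signed 8-bit array."""
    q = array.array("b")
    q.frombytes(zlib.decompress(blob))
    return q


# Synthetic weights stand in for a real checkpoint.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(10_000)]
q, scale = quantize_int8(weights)
blob = pack(q)
restored = dequantize_int8(unpack(blob), scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(len(q.tobytes()), len(blob), max_err)
```

The zlib step is lossless, so all of the error comes from rounding to int8, which is at most half a quantization step (`scale / 2`) per weight.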

Regularization

gradient clipping

parameters: null
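The listing does not say which clipping variant or threshold was used. As one common possibility, clipping by global L2 norm is sketched below; the function name and the `max_norm` value are hypothetical.

```python
import math


def clip_grad_norm(grads, max_norm):
    """Scale gradients in place so their global L2 norm is at most max_norm.

    `grads` is a list of flat gradient lists, one per parameter tensor.
    Returns the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if total_norm > max_norm:
        factor = max_norm / total_norm
        for tensor in grads:
            for i in range(len(tensor)):
                tensor[i] *= factor
    return total_norm


# Two toy "tensors" with global norm sqrt(3^2 + 4^2) = 5.
grads = [[3.0, 0.0], [0.0, 4.0]]
norm_before = clip_grad_norm(grads, max_norm=1.0)
```

Scaling every gradient by the same factor preserves the update direction and only limits its length. Clipping each value independently would not preserve the direction.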

Sequence Length

sequence_length

train_length: null

eval_length: null