PR #1078
openLog MPO tensor train baseline at r=16 (1.3193 BPB)
by chinmaypatwardhan-ops
val_bpb: 1.3193
Architecture: Transformer
Optimizer: —
Artifact Size: 11,918,924 bytes
Training Techniques
Architecture: MLP
Replaced dense linear feedforward matrices in the MLP block with a low-rank Matrix Product Operator (MPO) tensor-train decomposition.
parameters: {"rank":16}
Novel Contributions
- Replaced standard dense MLP feedforward matrices with a low-rank MPO tensor-train decomposition
- Initialized the MPO cores at rank 16
- Reduced parameter footprint from 124M to 12.6M parameters
- Achieved 1.3193 val_bpb under high compression constraints
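The core idea above, factoring each dense MLP weight matrix into small MPO (tensor-train) cores, can be sketched as follows. This is an illustrative NumPy sketch, not the PR's actual implementation: the dimensions (a 768×3072 up-projection, factored as 768 = 32×24 and 3072 = 64×48) and the two-core layout are assumptions chosen for clarity; only the rank of 16 comes from the PR.

```python
import numpy as np

# Hypothetical shapes: a GPT-2-small-style MLP up-projection (768 -> 3072).
# Each side is factored into two modes: 768 = 32*24, 3072 = 64*48.
d_in, d_out, rank = 768, 3072, 16
(i1, i2), (j1, j2) = (32, 24), (64, 48)

rng = np.random.default_rng(0)
# Two MPO cores stand in for the dense matrix; the rank-16 bond
# dimension is the compression knob reported in the PR.
core_a = rng.standard_normal((i1, j1, rank)) / np.sqrt(rank)
core_b = rng.standard_normal((rank, i2, j2))

# Contract the cores back to a full matrix:
# W[(a,b),(c,d)] = sum_r core_a[a,c,r] * core_b[r,b,d]
W = np.einsum('acr,rbd->abcd', core_a, core_b).reshape(d_in, d_out)

dense_params = d_in * d_out              # 2,359,296
mpo_params = core_a.size + core_b.size   # 32*64*16 + 16*24*48 = 51,200
print(W.shape, dense_params, mpo_params)
```

At rank 16 this single layer stores roughly 46× fewer parameters than its dense counterpart, which is the mechanism behind the 124M-to-12.6M reduction the PR reports across all MLP blocks (the exact factorization per layer may differ).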