PR #1078

open

Log MPO tensor train baseline at r=16 (1.3193 BPB)

by chinmaypatwardhan-ops
val_bpb
1.3193
Architecture
Transformer
Artifact Size
11,918,924 bytes

Training Techniques

Architecture
MLP
Replaced dense linear feedforward matrices in the MLP block with a low-rank Matrix Product Operator (MPO) tensor-train decomposition.
parameters: {"rank":16}
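The idea can be sketched as follows. This is a minimal NumPy illustration, not the PR's implementation: the layer dimensions (768 → 3072, as in a GPT-2-small MLP) and the dimension factorizations are assumptions chosen for the example. A dense `d_in × d_out` weight is replaced by two MPO cores joined by a rank-16 bond, and the forward pass contracts the input against each core in turn.

```python
import numpy as np

# Assumed GPT-2-small MLP dimensions (not stated in the PR)
d_in, d_out, rank = 768, 3072, 16
# Factor each dimension into two modes: 768 = 24*32, 3072 = 48*64
i1, i2 = 24, 32
j1, j2 = 48, 64

rng = np.random.default_rng(0)
# Two MPO cores replace the dense (d_in, d_out) weight matrix
core1 = rng.standard_normal((i1, j1, rank)) * 0.02   # (i1, j1, r)
core2 = rng.standard_normal((rank, i2, j2)) * 0.02   # (r, i2, j2)

def mpo_linear(x):
    """Apply the MPO-factorized linear map to x of shape (batch, d_in)."""
    xt = x.reshape(-1, i1, i2)                   # (b, i1, i2)
    # contract mode i1 with core1, then mode i2 with core2
    t = np.einsum("bik,ijr->bkjr", xt, core1)    # (b, i2, j1, r)
    y = np.einsum("bkjr,rkl->bjl", t, core2)     # (b, j1, j2)
    return y.reshape(-1, d_out)

dense_params = d_in * d_out                  # 2,359,296
mpo_params = core1.size + core2.size         # 51,200 at rank 16
```

At rank 16 this single layer drops from about 2.36M parameters to 51.2K, which is the kind of per-layer saving that drives the overall footprint reduction the PR reports.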

Novel Contributions

  • Replaced standard dense MLP feedforward matrices with a low-rank MPO tensor-train decomposition
  • Initialized the MPO cores at rank 16
  • Reduced the parameter footprint from 124M to 12.6M
  • Achieved 1.3193 val_bpb despite the roughly 10x parameter compression