PR #1078
openLog MPO tensor train baseline at r=16 (1.3193 BPB)
by chinmaypatwardhan-ops
val_bpb: 1.3193
Architecture: Transformer
Optimizer: —
Artifact Size: 11,918,924 bytes
Training Techniques
Architecture: MLP
Replaced dense linear feedforward matrices in the MLP block with a low-rank Matrix Product Operator (MPO) tensor-train decomposition.
parameters: {"rank":16}
Novel Contributions
- Replaced standard dense MLP feedforward matrices with a low-rank MPO tensor-train decomposition
- Initialized the MPO cores at rank 16
- Reduced parameter footprint from 124M to 12.6M parameters
- Achieved 1.3193 val_bpb under high compression constraints
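The core idea above, factoring each dense MLP weight matrix into small MPO (tensor-train) cores, can be sketched as follows. This is an illustrative NumPy sketch, not the PR's actual implementation: the dimensions (a 768×3072 up-projection, factored as 768 = 32×24 and 3072 = 64×48) and the two-core layout are assumptions chosen for clarity; only the rank of 16 comes from the PR.

```python
import numpy as np

# Hypothetical shapes: a GPT-2-small-style MLP up-projection (768 -> 3072).
# Each side is factored into two modes: 768 = 32*24, 3072 = 64*48.
d_in, d_out, rank = 768, 3072, 16
(i1, i2), (j1, j2) = (32, 24), (64, 48)

rng = np.random.default_rng(0)
# Two MPO cores stand in for the dense matrix; the rank-16 bond
# dimension is the compression knob reported in the PR.
core_a = rng.standard_normal((i1, j1, rank)) / np.sqrt(rank)
core_b = rng.standard_normal((rank, i2, j2))

# Contract the cores back to a full matrix:
# W[(a,b),(c,d)] = sum_r core_a[a,c,r] * core_b[r,b,d]
W = np.einsum('acr,rbd->abcd', core_a, core_b).reshape(d_in, d_out)

dense_params = d_in * d_out              # 2,359,296
mpo_params = core_a.size + core_b.size   # 32*64*16 + 16*24*48 = 51,200
print(W.shape, dense_params, mpo_params)
```

At rank 16 this single layer stores roughly 46× fewer parameters than its dense counterpart, which is the mechanism behind the 124M-to-12.6M reduction the PR reports across all MLP blocks (the exact factorization per layer may differ).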