PR #812
open · [non-record track] BankLinear: cross-layer shared weight bank with learned + random mixtures
by andrewmouldon
val_bpb
1.2236
Architecture
Transformer
Optimizer
—
Artifact Size
15734631 bytes
Training Techniques
Architecture
BankLinear
Replaces per-layer linear weights with mixtures over a shared bank of learned basis matrices and fixed random projections, applied to QKV projections.
parameters: {"layers":9,"learned_basis_matrices":3,"fixed_random_projections":512}
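A minimal sketch of how BankLinear might synthesize per-layer weights (the class name and bank sizes are from the PR; the shapes, init scales, and the einsum mixture are assumptions):

```python
import numpy as np

class BankLinear:
    """Cross-layer shared weight bank (sketch, not the PR's implementation).

    Each layer's weight is a mixture over a shared bank: a few learned basis
    matrices plus many fixed random projections, combined by per-layer
    mixing coefficients.
    """

    def __init__(self, d_in, d_out, n_layers=9, n_learned=3, n_random=512, seed=0):
        rng = np.random.default_rng(seed)
        # Small learned part of the bank (trainable in a real implementation).
        self.learned = rng.standard_normal((n_learned, d_in, d_out)) * 0.02
        # Large fixed random part (frozen; costs no trainable parameters).
        self.random = rng.standard_normal((n_random, d_in, d_out)) / np.sqrt(d_in)
        # Per-layer mixing coefficients over all bank entries (trainable).
        self.coeffs = rng.standard_normal((n_layers, n_learned + n_random)) * 0.01

    def weight(self, layer):
        # Unique per-layer weight synthesized from the shared bank.
        bank = np.concatenate([self.learned, self.random], axis=0)
        return np.einsum("k,kio->io", self.coeffs[layer], bank)

    def forward(self, x, layer):
        return x @ self.weight(layer)
```

In this form only the learned bases and the coefficient matrix are trainable, so every layer gets a distinct QKV weight while sharing one parameter pool.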
larger MLP
Saved parameters from BankLinear are reinvested into a larger MLP expansion.
parameters: {"mlp_multiplier":2.65}
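For intuition on where the reinvested budget comes from, a back-of-envelope trainable-parameter count (d_model = 768 and the dense fused-QKV baseline are assumptions, not from the PR; layer and bank counts are from the PR):

```python
# Hypothetical accounting: dense per-layer QKV weights vs. a shared bank.
d = 768                                     # assumed model width
n_layers, n_learned, n_random = 9, 3, 512   # from the PR's parameters

qkv_per_layer = d * 3 * d                   # one fused QKV weight matrix
baseline = n_layers * qkv_per_layer         # independent weights per layer

bank = n_learned * d * 3 * d                # shared learned bases (trainable)
coeffs = n_layers * (n_learned + n_random)  # per-layer mixing coefficients
# The 512 fixed random projections add no trainable parameters.
banklinear = bank + coeffs

saved = baseline - banklinear               # budget freed for the wider MLP
```

Under these assumptions roughly two thirds of the QKV budget is freed, which is what funds the 2.65x MLP expansion.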
Initialization
depth-aware mixing coefficient initialization
Mixing coefficients are initialized with a depth-aware profile so early, middle, and late layers are biased toward different learned bases with smooth transitions.
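One plausible form of such a profile (only the early/middle/late bias with smooth transitions is from the PR; the Gaussian bump shape, its width, and starting the random-projection coefficients at zero are assumptions):

```python
import numpy as np

def depth_aware_init(n_layers=9, n_learned=3, n_random=512, width=2.0):
    """Depth-aware mixing-coefficient init (sketch under assumed form)."""
    centers = np.linspace(0.0, 1.0, n_learned)   # early / middle / late bases
    depths = np.linspace(0.0, 1.0, n_layers)     # normalized layer depth
    # Smooth bump: layer at depth d leans toward the basis whose center is
    # nearest, with gradual transitions between neighboring bases.
    learned = np.exp(-((depths[:, None] - centers[None, :]) ** 2) * width ** 2)
    learned /= learned.sum(axis=1, keepdims=True)
    # Random-projection coefficients start at zero so training grows them.
    random = np.zeros((n_layers, n_random))
    return np.concatenate([learned, random], axis=1)
```

Each row sums to one over the learned bases, so every layer starts as a near-convex blend dominated by its depth-matched basis.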
Novel Contributions
- Introduces BankLinear, a compositional weight synthesis method using shared learned and random basis matrices across layers.
- Uses per-layer mixing coefficients to construct unique weights while reusing a shared parameter bank.
- Combines a small learned basis with a larger fixed random basis to increase expressivity under a parameter budget.
- Applies BankLinear to QKV projections and reinvests saved parameters into a larger MLP.
- Proposes depth-aware initialization of mixing coefficients for stable training.