PR #1315

open

[non-record track] BankLinear: cross-layer shared weight bank

by andrewmouldon
val_bpb: 1.2207
Architecture: Transformer
Optimizer
Artifact Size: 15.8MB

Training Techniques

Architecture
weight tying
Cross-layer shared weight bank where each layer synthesizes its weights as a mixture over shared basis matrices instead of storing independent per-layer weights.
parameters: {"layers":9,"learned_basis_matrices":3,"fixed_random_projections":512}
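The synthesis step described above can be sketched as follows. This is an illustrative NumPy mock-up under assumed shapes, not the PR's actual implementation: each layer stores only mixing logits, and its weight is a softmax-weighted sum of shared basis matrices.

```python
import numpy as np

class BankLinear:
    """Sketch of a cross-layer shared weight bank (assumed API).

    Each layer l synthesizes its weight as
    W_l = sum_k softmax(alpha_l)[k] * B_k,
    where the B_k are shared across all layers.
    """

    def __init__(self, n_layers, n_bases, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        # Shared bank: n_bases basis matrices, used by every layer.
        self.bank = rng.standard_normal((n_bases, d_in, d_out)) / np.sqrt(d_in)
        # Per-layer mixing logits: the only per-layer parameter cost.
        self.alpha = rng.standard_normal((n_layers, n_bases))

    def weight(self, layer):
        # Softmax over bases, then mix the bank into one weight matrix.
        mix = np.exp(self.alpha[layer])
        mix /= mix.sum()
        return np.einsum("k,kio->io", mix, self.bank)

    def forward(self, layer, x):
        return x @ self.weight(layer)
```

With the PR's reported configuration (9 layers, 3 learned bases), every layer draws from the same 3 basis matrices while keeping per-layer specialization through its own mixing logits.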
attention projections
BankLinear is applied to Q, K, and V projections to share parameters across depth while preserving per-layer specialization.
parameters: null
MLP expansion
Parameters saved by sharing the projections across layers are reinvested in a larger MLP.
parameters: {"scale_vs_baseline":2.65}
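The reinvestment budget comes from simple parameter accounting: a shared bank stores one set of basis matrices per projection instead of one full matrix per layer. The sketch below is a hypothetical count with assumed square projections; the function name and dimensions are illustrative, not from the PR.

```python
def qkv_param_count(n_layers, d_model, n_bases, shared=True):
    # Hypothetical accounting: the baseline stores 3 independent
    # d_model x d_model projections per layer; the bank stores
    # n_bases shared matrices per projection plus per-layer
    # mixing logits (n_bases scalars per layer per projection).
    per_proj = d_model * d_model
    if not shared:
        return n_layers * 3 * per_proj
    return 3 * (n_bases * per_proj + n_layers * n_bases)
```

For 9 layers and 3 bases the banked count is roughly a third of the baseline, which is the kind of saving that funds an MLP around 2.65x the baseline size.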
Initialization
depth-aware init
Mixing coefficients are initialized with a depth-aware profile so early, middle, and late layers are biased toward different bases with smooth transitions.
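One way to realize such a depth-aware profile (a plausible sketch, not the PR's exact formula) is to place the bases at evenly spaced relative depths and give each layer logits that peak at the nearest basis, decaying smoothly with distance:

```python
import numpy as np

def depth_aware_logits(n_layers, n_bases, sharpness=2.0):
    # Each layer's relative depth in [0, 1].
    depth = np.linspace(0.0, 1.0, n_layers)[:, None]
    # Basis "centers" spread evenly over depth.
    centers = np.linspace(0.0, 1.0, n_bases)[None, :]
    # Quadratic falloff: early layers favor the first basis,
    # late layers the last, with smooth transitions between.
    return -sharpness * (depth - centers) ** 2
```

After softmax, early, middle, and late layers start biased toward different bases while neighboring layers get similar mixtures.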

Novel Contributions

  • Cross-layer shared weight bank for linear layers
  • Per-layer compositional weight synthesis from shared basis matrices
  • Channel-wise modulation in addition to global mixing
  • Removal of fixed random projections from the earlier version
  • Depth-aware initialization of mixing coefficients
  • Reinvestment of saved parameters into a larger MLP
  • Application of BankLinear to QKV projections
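The channel-wise modulation listed above can be combined with global mixing roughly as follows. This is a sketch of one plausible formulation (a per-layer gain on each output channel applied after the bank mixture); the function and shapes are assumptions, not the PR's code.

```python
import numpy as np

def synthesize_weight(bank, mix_logits, channel_gain):
    # Global mixing: softmax over the shared basis matrices.
    mix = np.exp(mix_logits)
    mix /= mix.sum()
    w = np.einsum("k,kio->io", mix, bank)
    # Channel-wise modulation: per-output-channel gain gives the
    # layer finer-grained control than the global mixture alone.
    return w * channel_gain[None, :]
```

The global coefficients pick where the layer sits in the shared bank; the channel gains let it rescale individual output channels of the synthesized weight.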