PR #812

open

[non-record track] BankLinear: cross-layer shared weight bank with learned + random mixtures

by andrewmouldon
val_bpb: 1.2236
Architecture: Transformer
Optimizer: (none listed)
Artifact Size: 15734631 bytes

Training Techniques

Architecture
BankLinear
Replaces per-layer linear weights with mixtures over a shared bank of learned basis matrices and fixed random projections, applied to QKV projections.
parameters: {"layers":9,"learned_basis_matrices":3,"fixed_random_projections":512}
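The summary does not give the exact formulation, so the following is only a minimal sketch of the idea. It assumes each layer's weight is a linear combination W_l = sum_k alpha[l, k] * B_k + sum_j beta[l, j] * R_j, where the B_k are learned bases and the R_j are fixed random matrices regenerated from a seed. The class name `BankLinearSketch`, the scaling factors, the toy dimensions, and the use of NumPy in place of a training framework are all assumptions made for illustration, not details taken from the PR.

```python
import numpy as np

class BankLinearSketch:
    """Builds per-layer weights as mixtures over a shared bank (illustrative only)."""

    def __init__(self, n_layers, d_in, d_out, n_learned=3, n_random=512, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_in)
        # Learned bases: these would be trained parameters shared by all layers.
        self.learned = rng.standard_normal((n_learned, d_out, d_in)) * scale
        # Fixed random bases: never trained, reproducible from the seed alone.
        fixed_rng = np.random.default_rng(seed + 1)
        self.random = fixed_rng.standard_normal((n_random, d_out, d_in)) * scale
        # Per-layer mixing coefficients: the only layer-specific values here.
        self.alpha = rng.standard_normal((n_layers, n_learned))
        self.beta = rng.standard_normal((n_layers, n_random)) / np.sqrt(n_random)

    def weight(self, layer):
        # Mix the shared bank into one (d_out, d_in) matrix for this layer.
        w_learned = np.einsum("k,koi->oi", self.alpha[layer], self.learned)
        w_random = np.einsum("j,joi->oi", self.beta[layer], self.random)
        return w_learned + w_random

    def forward(self, x, layer):
        # Apply the synthesized weight like an ordinary bias-free linear layer.
        return x @ self.weight(layer).T

# Toy sizes; the layer count and basis counts mirror the listed parameters.
bank = BankLinearSketch(n_layers=9, d_in=16, d_out=16)
x = np.ones((2, 16))
y0 = bank.forward(x, layer=0)
y8 = bank.forward(x, layer=8)
```

In this sketch the two layers share every basis matrix yet produce different outputs, because only their coefficient rows differ. Whether the PR uses one bank for Q, K, and V together or a separate bank per projection is not stated in the summary.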
larger MLP
Saved parameters from BankLinear are reinvested into a larger MLP expansion.
parameters: {"mlp_multiplier":2.65}
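The reinvestment can be read as simple parameter bookkeeping. The sketch below assumes a plain two-matrix MLP (an up projection and a down projection, no gating, no biases), so one layer holds 2 * m * d_model^2 weights for multiplier m. The function name `reinvested_mlp_multiplier` and the toy numbers are hypothetical; the summary does not state the model width, the baseline multiplier, or the number of parameters saved, so this does not attempt to reproduce the 2.65 figure.

```python
def reinvested_mlp_multiplier(d_model, n_layers, base_multiplier, saved_params):
    """Return the MLP multiplier after spending saved_params on wider MLPs.

    Assumes each MLP is two matrices, d_model x (m * d_model) and back,
    so one layer holds 2 * m * d_model**2 weights (biases ignored).
    """
    # Parameters needed to raise the multiplier by 1.0 across every layer.
    params_per_unit_multiplier = n_layers * 2 * d_model ** 2
    return base_multiplier + saved_params / params_per_unit_multiplier

# Toy numbers chosen for easy arithmetic, not taken from the PR.
example = reinvested_mlp_multiplier(
    d_model=4, n_layers=2, base_multiplier=2.0, saved_params=32
)
print(example)  # 2.5
```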
Initialization
depth-aware mixing coefficient initialization
Mixing coefficients are initialized with a depth-aware profile so early, middle, and late layers are biased toward different learned bases with smooth transitions.
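One way such a profile could look is sketched below. It assumes each learned basis gets a "center" spread evenly over relative depth, and each layer's coefficients are a softmax over Gaussian-shaped scores around those centers. The function name `depth_aware_init`, the `width` parameter, and the Gaussian/softmax form are assumptions for illustration; the PR's actual profile is not described in the summary, and the coefficients for the random bases are left out here.

```python
import math

def depth_aware_init(n_layers=9, n_learned=3, width=0.25):
    """Return an n_layers x n_learned table of initial mixing coefficients."""
    rows = []
    for layer in range(n_layers):
        # Relative depth in [0, 1]: 0 is the first layer, 1 is the last.
        depth = layer / (n_layers - 1) if n_layers > 1 else 0.0
        logits = []
        for k in range(n_learned):
            # Basis k is centered at an evenly spaced point along the depth axis.
            center = k / (n_learned - 1) if n_learned > 1 else 0.0
            logits.append(-((depth - center) ** 2) / (2 * width ** 2))
        # Numerically stable softmax so each row sums to 1.
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

coeffs = depth_aware_init()
for layer, row in enumerate(coeffs):
    print(layer, [round(c, 3) for c in row])
```

With these defaults the first layer leans on basis 0, the middle layer on basis 1, and the last layer on basis 2, with neighboring layers blending gradually between them. A larger `width` gives smoother transitions.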

Novel Contributions

  • Introduces BankLinear, a compositional weight synthesis method using shared learned and random basis matrices across layers.
  • Uses per-layer mixing coefficients to construct unique weights while reusing a shared parameter bank.
  • Combines a small learned basis with a larger fixed random basis to increase expressivity under a parameter budget.
  • Applies BankLinear to QKV projections and reinvests saved parameters into a larger MLP.
  • Proposes depth-aware initialization of mixing coefficients for stable training.