PR #1315

open

[non-record track] BankLinear: cross-layer shared weight bank

by andrewmouldon
val_bpb: 1.2207
Architecture: Transformer
Optimizer
Artifact Size: 15.8MB

Training Techniques

Architecture
weight tying
Cross-layer shared weight bank where each layer synthesizes its weights as a mixture over shared basis matrices instead of storing independent per-layer weights.
parameters: {"layers":9,"learned_basis_matrices":3,"fixed_random_projections":512}
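The synthesis step described above can be sketched as follows. This is an illustrative NumPy mock-up under assumed shapes, not the PR's actual implementation: each layer stores only mixing logits, and its weight is a softmax-weighted sum of shared basis matrices.

```python
import numpy as np

class BankLinear:
    """Sketch of a cross-layer shared weight bank (assumed API).

    Each layer l synthesizes its weight as
    W_l = sum_k softmax(alpha_l)[k] * B_k,
    where the B_k are shared across all layers.
    """

    def __init__(self, n_layers, n_bases, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        # Shared bank: n_bases basis matrices, used by every layer.
        self.bank = rng.standard_normal((n_bases, d_in, d_out)) / np.sqrt(d_in)
        # Per-layer mixing logits: the only per-layer parameter cost.
        self.alpha = rng.standard_normal((n_layers, n_bases))

    def weight(self, layer):
        # Softmax over bases, then mix the bank into one weight matrix.
        mix = np.exp(self.alpha[layer])
        mix /= mix.sum()
        return np.einsum("k,kio->io", mix, self.bank)

    def forward(self, layer, x):
        return x @ self.weight(layer)
```

With the PR's reported configuration (9 layers, 3 learned bases), every layer draws from the same 3 basis matrices while keeping per-layer specialization through its own mixing logits.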
attention projections
BankLinear is applied to Q, K, and V projections to share parameters across depth while preserving per-layer specialization.
parameters: null
MLP expansion
Parameters saved by sharing the projections across layers are reinvested in a larger MLP.
parameters: {"scale_vs_baseline":2.65}
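The reinvestment budget comes from simple parameter accounting: a shared bank stores one set of basis matrices per projection instead of one full matrix per layer. The sketch below is a hypothetical count with assumed square projections; the function name and dimensions are illustrative, not from the PR.

```python
def qkv_param_count(n_layers, d_model, n_bases, shared=True):
    # Hypothetical accounting: the baseline stores 3 independent
    # d_model x d_model projections per layer; the bank stores
    # n_bases shared matrices per projection plus per-layer
    # mixing logits (n_bases scalars per layer per projection).
    per_proj = d_model * d_model
    if not shared:
        return n_layers * 3 * per_proj
    return 3 * (n_bases * per_proj + n_layers * n_bases)
```

For 9 layers and 3 bases the banked count is roughly a third of the baseline, which is the kind of saving that funds an MLP around 2.65x the baseline size.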
Initialization
depth-aware init
Mixing coefficients are initialized with a depth-aware profile so early, middle, and late layers are biased toward different bases with smooth transitions.
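One way to realize such a depth-aware profile (a plausible sketch, not the PR's exact formula) is to place the bases at evenly spaced relative depths and give each layer logits that peak at the nearest basis, decaying smoothly with distance:

```python
import numpy as np

def depth_aware_logits(n_layers, n_bases, sharpness=2.0):
    # Each layer's relative depth in [0, 1].
    depth = np.linspace(0.0, 1.0, n_layers)[:, None]
    # Basis "centers" spread evenly over depth.
    centers = np.linspace(0.0, 1.0, n_bases)[None, :]
    # Quadratic falloff: early layers favor the first basis,
    # late layers the last, with smooth transitions between.
    return -sharpness * (depth - centers) ** 2
```

After softmax, early, middle, and late layers start biased toward different bases while neighboring layers get similar mixtures.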

Novel Contributions

  • Cross-layer shared weight bank for linear layers
  • Per-layer compositional weight synthesis from shared basis matrices
  • Channel-wise modulation in addition to global mixing
  • Removal of fixed random projections from the earlier version
  • Depth-aware initialization of mixing coefficients
  • Reinvestment of saved parameters into a larger MLP
  • Application of BankLinear to QKV projections
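The channel-wise modulation listed above can be combined with global mixing roughly as follows. This is a sketch of one plausible formulation (a per-layer gain on each output channel applied after the bank mixture); the function and shapes are assumptions, not the PR's code.

```python
import numpy as np

def synthesize_weight(bank, mix_logits, channel_gain):
    # Global mixing: softmax over the shared basis matrices.
    mix = np.exp(mix_logits)
    mix /= mix.sum()
    w = np.einsum("k,kio->io", mix, bank)
    # Channel-wise modulation: per-output-channel gain gives the
    # layer finer-grained control than the global mixture alone.
    return w * channel_gain[None, :]
```

The global coefficients pick where the layer sits in the shared bank; the channel gains let it rescale individual output channels of the synthesized weight.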