PR #812
open · [non-record track] BankLinear: cross-layer shared weight bank with learned + random mixtures
by andrewmouldon
val_bpb
1.2236
Architecture
Transformer
Optimizer
—
Artifact Size
15734631 bytes
Training Techniques
Architecture
BankLinear
Replaces per-layer linear weights with mixtures over a shared bank of learned basis matrices and fixed random projections, applied to QKV projections.
parameters: {"layers":9,"learned_basis_matrices":3,"fixed_random_projections":512}
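A minimal sketch of how BankLinear might synthesize per-layer weights (the class name and bank sizes are from the PR; the shapes, init scales, and the einsum mixture are assumptions):

```python
import numpy as np

class BankLinear:
    """Cross-layer shared weight bank (sketch, not the PR's implementation).

    Each layer's weight is a mixture over a shared bank: a few learned basis
    matrices plus many fixed random projections, combined by per-layer
    mixing coefficients.
    """

    def __init__(self, d_in, d_out, n_layers=9, n_learned=3, n_random=512, seed=0):
        rng = np.random.default_rng(seed)
        # Small learned part of the bank (trainable in a real implementation).
        self.learned = rng.standard_normal((n_learned, d_in, d_out)) * 0.02
        # Large fixed random part (frozen; costs no trainable parameters).
        self.random = rng.standard_normal((n_random, d_in, d_out)) / np.sqrt(d_in)
        # Per-layer mixing coefficients over all bank entries (trainable).
        self.coeffs = rng.standard_normal((n_layers, n_learned + n_random)) * 0.01

    def weight(self, layer):
        # Unique per-layer weight synthesized from the shared bank.
        bank = np.concatenate([self.learned, self.random], axis=0)
        return np.einsum("k,kio->io", self.coeffs[layer], bank)

    def forward(self, x, layer):
        return x @ self.weight(layer)
```

In this form only the learned bases and the coefficient matrix are trainable, so every layer gets a distinct QKV weight while sharing one parameter pool.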
larger MLP
Saved parameters from BankLinear are reinvested into a larger MLP expansion.
parameters: {"mlp_multiplier":2.65}
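For intuition on where the reinvested budget comes from, a back-of-envelope trainable-parameter count (d_model = 768 and the dense fused-QKV baseline are assumptions, not from the PR; layer and bank counts are from the PR):

```python
# Hypothetical accounting: dense per-layer QKV weights vs. a shared bank.
d = 768                                     # assumed model width
n_layers, n_learned, n_random = 9, 3, 512   # from the PR's parameters

qkv_per_layer = d * 3 * d                   # one fused QKV weight matrix
baseline = n_layers * qkv_per_layer         # independent weights per layer

bank = n_learned * d * 3 * d                # shared learned bases (trainable)
coeffs = n_layers * (n_learned + n_random)  # per-layer mixing coefficients
# The 512 fixed random projections add no trainable parameters.
banklinear = bank + coeffs

saved = baseline - banklinear               # budget freed for the wider MLP
```

Under these assumptions roughly two thirds of the QKV budget is freed, which is what funds the 2.65x MLP expansion.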
Initialization
depth-aware mixing coefficient initialization
Mixing coefficients are initialized with a depth-aware profile so early, middle, and late layers are biased toward different learned bases with smooth transitions.
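One plausible form of such a profile (only the early/middle/late bias with smooth transitions is from the PR; the Gaussian bump shape, its width, and starting the random-projection coefficients at zero are assumptions):

```python
import numpy as np

def depth_aware_init(n_layers=9, n_learned=3, n_random=512, width=2.0):
    """Depth-aware mixing-coefficient init (sketch under assumed form)."""
    centers = np.linspace(0.0, 1.0, n_learned)   # early / middle / late bases
    depths = np.linspace(0.0, 1.0, n_layers)     # normalized layer depth
    # Smooth bump: layer at depth d leans toward the basis whose center is
    # nearest, with gradual transitions between neighboring bases.
    learned = np.exp(-((depths[:, None] - centers[None, :]) ** 2) * width ** 2)
    learned /= learned.sum(axis=1, keepdims=True)
    # Random-projection coefficients start at zero so training grows them.
    random = np.zeros((n_layers, n_random))
    return np.concatenate([learned, random], axis=1)
```

Each row sums to one over the learned bases, so every layer starts as a near-convex blend dominated by its depth-matched basis.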
Novel Contributions
- Introduces BankLinear, a compositional weight synthesis method using shared learned and random basis matrices across layers.
- Uses per-layer mixing coefficients to construct unique weights while reusing a shared parameter bank.
- Combines a small learned basis with a larger fixed random basis to increase expressivity under a parameter budget.
- Applies BankLinear to QKV projections and reinvests saved parameters into a larger MLP.
- Proposes depth-aware initialization of mixing coefficients for stable training.