PR #1315
BankLinear: cross-layer shared weight bank (open, non-record track)
by andrewmouldon
val_bpb: 1.2207
Architecture: Transformer
Optimizer: —
Artifact Size: 15.8 MB
Training Techniques
Architecture: weight tying
Cross-layer shared weight bank where each layer synthesizes its weights as a mixture over shared basis matrices instead of storing independent per-layer weights.
parameters: {"layers":9,"learned_basis_matrices":3,"fixed_random_projections":512}
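A minimal sketch of the weight-synthesis idea, in NumPy. Only the layer count (9) and the number of learned bases (3) come from the PR's parameters; the shapes, the softmax mixing, and every variable name here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes; only layers=9 and 3 learned bases come from the PR.
d_model, n_layers, n_bases = 64, 9, 3

bank = rng.standard_normal((n_bases, d_model, d_model)) * 0.02  # shared basis matrices
mix_logits = rng.standard_normal((n_layers, n_bases))           # per-layer mixing coefficients

def layer_weight(layer: int) -> np.ndarray:
    """Synthesize this layer's weight as a softmax mixture over the bank."""
    alpha = np.exp(mix_logits[layer])
    alpha /= alpha.sum()
    return np.einsum("b,bij->ij", alpha, bank)

x = rng.standard_normal((1, d_model))
y = x @ layer_weight(0)  # forward pass through layer 0's synthesized weight
```

The storage cost is the bank plus a small table of mixing coefficients, rather than one full weight matrix per layer.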
attention projections
BankLinear is applied to Q, K, and V projections to share parameters across depth while preserving per-layer specialization.
parameters: null
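One way the shared bank could serve Q, K, and V while keeping per-layer specialization is separate mixing logits per layer and per projection; this layout is an assumption, not taken from the PR.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers, n_bases = 64, 9, 3

# One bank shared across all layers; the per-layer, per-projection
# mixing-logit layout below is an assumption for illustration.
bank = rng.standard_normal((n_bases, d, d)) * 0.02
mix = rng.standard_normal((n_layers, 3, n_bases))  # 3 = Q, K, V

def qkv_weights(layer: int):
    a = np.exp(mix[layer])
    a /= a.sum(axis=-1, keepdims=True)         # softmax over bases
    return np.einsum("pb,bij->pij", a, bank)   # stack of (Wq, Wk, Wv)

Wq, Wk, Wv = qkv_weights(0)
```

Each layer's Q, K, and V draw on the same stored matrices but mix them differently, so the projections stay distinct.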
MLP expansion
The parameters saved by sharing the projections are reinvested in a larger MLP.
parameters: {"scale_vs_baseline":2.65}
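A back-of-envelope sketch of the accounting. Only the layer count (9), the 3 bases, and the 2.65x MLP scale come from the PR; the model width and the 4x MLP baseline are assumptions.

```python
# Only layers=9, 3 bases, and the 2.65x scale come from the PR;
# d_model=768 and the 4x MLP baseline are assumptions.
d_model, n_layers, n_bases = 768, 9, 3

qkv_independent = n_layers * 3 * d_model * d_model  # baseline: per-layer Q, K, V weights
qkv_banked = 3 * n_bases * d_model * d_model        # banked: 3 bases per projection
saved = qkv_independent - qkv_banked                # parameters freed by sharing

mlp_baseline = n_layers * 2 * d_model * (4 * d_model)  # two matrices, 4x expansion
mlp_scaled = int(2.65 * mlp_baseline)                  # reported scale_vs_baseline
```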
Initialization: depth-aware init
Mixing coefficients are initialized with a depth-aware profile so early, middle, and late layers are biased toward different bases with smooth transitions.
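One plausible form of such a profile: each basis "owns" a region of relative depth, and a layer's logits fall off smoothly with its distance from each basis's center. The Gaussian shape and width here are assumptions, not taken from the PR.

```python
import numpy as np

# Hypothetical depth-aware profile; the Gaussian shape and width
# are assumptions, not taken from the PR.
n_layers, n_bases = 9, 3

def depth_aware_logits(n_layers: int, n_bases: int, width: float = 0.15) -> np.ndarray:
    depth = np.linspace(0.0, 1.0, n_layers)    # relative depth of each layer
    centers = np.linspace(0.0, 1.0, n_bases)   # depth region each basis covers
    return -((depth[:, None] - centers[None, :]) ** 2) / (2 * width**2)

logits = depth_aware_logits(n_layers, n_bases)
alpha = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```

Under this init, early layers are biased toward the first basis, middle layers toward the middle basis, and late layers toward the last, with smooth crossovers in between.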
Novel Contributions
- Cross-layer shared weight bank for linear layers
- Per-layer compositional weight synthesis from shared basis matrices
- Channel-wise modulation in addition to global mixing
- Removal of fixed random projections from the earlier version
- Depth-aware initialization of mixing coefficients
- Reinvestment of saved parameters into a larger MLP
- Application of BankLinear to QKV projections
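The channel-wise modulation listed above can be sketched as a per-output-channel gate applied on top of the global mixture; this diagonal-gate formulation and all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_bases = 64, 3

bank = rng.standard_normal((n_bases, d, d)) * 0.02
alpha = np.array([0.5, 0.3, 0.2])  # global mixing weights (illustrative values)
gate = np.ones(d)                  # per-output-channel gate, identity at init

# Global mixture over the bank, then a learned per-channel rescale of the
# output dimension, giving finer control than the global coefficients alone.
W = np.einsum("b,bij->ij", alpha, bank) * gate[None, :]
y = rng.standard_normal((1, d)) @ W
```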