PR #1684 (open)

Non-record: WIP Random-Basis MLPs + LoRA

by camden-git
val_bpb: 1.2554
Architecture: Transformer
Optimizer: Muon
Artifact Size: ~8-10MB

Training Techniques

Architecture
ReLU²
Uses a ReLU-squared activated random-basis MLP as a nonlinear feature map.
parameters: null
LoRA
Adds low-rank adaptation layers on top of the random-basis MLP features.
parameters: {"rank":16}
SmearGate
Learned diagonal gate applied per hidden unit to amplify or suppress its feature.
parameters: {"dimensions":8192}
Random-basis MLP
Replaces learned MLP weights with deterministic random features generated from a seed.
parameters: {"mlp_mult":16,"layers":11,"dim":512}
Optimizer
Muon
weight_decay: null
momentum: null
other_params: null
Weight Averaging
EMA
parameters: null
Quantization
GPTQ
bits: null
scope: MLP B and gates
Compression
brotli
level: null
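Putting the technique entries above together, one block combines a seed-generated random basis, ReLU² activation, the diagonal gate, and a LoRA correction. A minimal NumPy sketch follows; the function name, initializations, and scalings are illustrative assumptions, not the PR's actual code:

```python
import numpy as np

def random_basis_mlp(x, seed=0, dim=512, mlp_mult=16, rank=16):
    """Sketch of one random-basis MLP block: the large weight matrices are
    regenerated from `seed` (never stored), then modulated by a learned
    diagonal gate (SmearGate) and corrected by a LoRA branch."""
    hidden = dim * mlp_mult                       # 512 * 16 = 8192
    rng = np.random.default_rng(seed)
    # Frozen random basis: deterministic given the seed, excluded from the artifact.
    W_in = rng.standard_normal((dim, hidden)) / np.sqrt(dim)
    W_out = rng.standard_normal((hidden, dim)) / np.sqrt(hidden)
    # Learned parameters that DO get stored (shown at plausible init values).
    gate = np.ones(hidden)                        # SmearGate: one scale per hidden unit
    lora_A = rng.standard_normal((dim, rank)) / np.sqrt(dim)
    lora_B = np.zeros((rank, dim))                # zero-init so LoRA starts as a no-op
    h = np.maximum(x @ W_in, 0.0) ** 2            # ReLU² activation
    h = h * gate                                  # per-hidden diagonal gating
    return h @ W_out + (x @ lora_A) @ lora_B      # random-feature path + LoRA path

y = random_basis_mlp(np.ones((1, 512)) * 0.01)
print(y.shape)                                    # (1, 512)
```

Because the basis is a pure function of the seed, the same output is reproduced on every run without shipping W_in or W_out.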

Novel Contributions

  • Stores random-basis MLP weights implicitly via a seed instead of explicit parameters.
  • Combines random feature maps with LoRA to recover task-specific capacity.
  • Uses a per-hidden diagonal gate to modulate random features.
  • Expands hidden width to 4x the baseline because the random basis is seed-generated and not stored in the artifact.
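The width expansion is affordable because only the gate and LoRA parameters reach the artifact. A back-of-envelope accounting under the listed hyperparameters (the per-block shapes are assumptions, and the real artifact also holds embeddings and other weights):

```python
# Rough count of implicit (seed-generated) vs. stored parameters using the
# listed hyperparameters; actual shapes in the PR may differ.
dim, mlp_mult, layers, rank = 512, 16, 11, 16
hidden = dim * mlp_mult                                  # 8192

implicit = layers * 2 * dim * hidden                     # W_in + W_out per block
stored = layers * (hidden + dim * rank + rank * dim)     # gate + LoRA A/B per block

print(implicit, stored, round(implicit / stored))        # ~92.3M implicit vs ~0.27M stored
```

Under these assumptions the seed trick keeps roughly 340x more MLP parameters out of the artifact than it stores.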