PR #2030 (open)
Random Projection + Low-Rank Adapter MLPs (Non-Record / Requested PR)
by Ammar12Falah
val_bpb: 1.4135
Architecture: Transformer
Optimizer: Muon
Artifact Size: 8.93 MB
Training Techniques
Architecture: RandomLoRALinear
Replaces MLP linear layers with a frozen random Gaussian projection plus a learned low-rank adapter, using W = R(seed) + alpha * BA.
parameters: {"rank":64,"alpha":1}
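A minimal PyTorch sketch of what such a layer could look like, assuming a 64-bit integer seed, a (rank, in) / (out, rank) shape convention for A and B, and a 1/sqrt(fan_in) scale on the random projection; the PR's actual module may differ in these details.

```python
import torch
import torch.nn as nn

class RandomLoRALinear(nn.Module):
    """Frozen random Gaussian projection plus a learned low-rank adapter:
    effective weight W = R(seed) + alpha * (B @ A). Only A and B are trained;
    R is regenerated from the seed and never stored in the exported artifact."""

    def __init__(self, in_features, out_features, rank=64, alpha=1.0, seed=0):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.rank, self.alpha, self.seed = rank, alpha, seed
        # Learned adapter, LoRA-style: A is (rank, in), B is (out, rank).
        self.A = nn.Parameter(torch.empty(rank, in_features))
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero init (see Initialization below)
        nn.init.kaiming_uniform_(self.A, a=5 ** 0.5)             # Kaiming init (see Initialization below)
        # Non-persistent buffer: regenerated from the seed at load time,
        # so only the 8-byte seed needs to ship in the artifact.
        self.register_buffer("R", self._make_projection(), persistent=False)

    def _make_projection(self):
        # Assumed 1/sqrt(fan_in) scaling; the PR's exact scale is not stated here.
        g = torch.Generator().manual_seed(self.seed)
        return torch.randn(self.out_features, self.in_features, generator=g) / self.in_features ** 0.5

    def forward(self, x):
        w = self.R + self.alpha * (self.B @ self.A)
        return x @ w.t()
```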
Initialization: LoRA init
Adapter A is Kaiming-initialized and B is zero-initialized, so the model starts as a pure random projection with zero adapter perturbation.
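Because B is zero at initialization, the effective weight reduces to the seeded projection alone; a quick check using the hypothetical sketch above (dimensions are illustrative):

```python
layer = RandomLoRALinear(768, 3072, rank=64, alpha=1.0, seed=42)
with torch.no_grad():
    w0 = layer.R + layer.alpha * (layer.B @ layer.A)
    assert torch.allclose(w0, layer.R)  # zero-initialized B contributes nothing yet
```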
Optimizer: Muon
weight_decay: null
momentum: null
other_params: {"adam_for_embeddings_and_scalars":true}
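One way the Muon/Adam split could be wired; `Muon` stands in for whichever Muon implementation the baseline ships (its constructor signature and the learning rates here are assumptions), and the ndim-plus-name grouping rule is an illustrative guess rather than the PR's exact logic.

```python
import torch

def build_optimizers(model, muon_cls, muon_lr=0.02, adam_lr=3e-4):
    """Route 2-D matrix weights to Muon; embeddings and scalar/vector
    parameters (norm gains, biases, logit scales) to Adam."""
    muon_params, adam_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if p.ndim >= 2 and "embed" not in name:
            muon_params.append(p)   # adapter A/B and other matrices
        else:
            adam_params.append(p)   # embeddings and scalars
    muon_opt = muon_cls(muon_params, lr=muon_lr)            # assumed signature
    adam_opt = torch.optim.AdamW(adam_params, lr=adam_lr)
    return muon_opt, adam_opt
```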
Quantization: int8
bits: 8
scope: post-training artifact
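One plausible reading of this entry is per-tensor symmetric quantization of the learned tensors, keeping an fp32 scale for dequantization; the sketch below is illustrative rather than the PR's exact quantizer.

```python
import torch

def quantize_int8(state_dict):
    """Per-tensor symmetric int8 quantization; keeps an fp32 scale per tensor."""
    packed = {}
    for name, t in state_dict.items():
        t = t.float()
        scale = t.abs().max().clamp(min=1e-8) / 127.0
        q = torch.round(t / scale).clamp(-127, 127).to(torch.int8)
        packed[name] = {"q": q, "scale": scale.item()}
    return packed

def dequantize_int8(packed):
    return {name: e["q"].float() * e["scale"] for name, e in packed.items()}
```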
Compression: zlib
level: null
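The export could then be a torch.save followed by a zlib pass over the quantized payload, as in this sketch (compression level left at zlib's default, matching the null value above; the file name is illustrative):

```python
import io
import zlib
import torch

def export_artifact(packed, path="artifact.bin"):
    # Serialize the int8 payload, then zlib-compress the raw bytes.
    buf = io.BytesIO()
    torch.save(packed, buf)
    blob = zlib.compress(buf.getvalue())  # default compression level
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)  # compressed artifact size in bytes

def load_artifact(path="artifact.bin"):
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    return torch.load(io.BytesIO(raw))
```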
Sequence Length
train_length: 1024
eval_length: 1024
Novel Contributions
- Applies random projection plus low-rank adapters to MLP linear layers in the Parameter Golf GPT baseline.
- Uses a frozen Gaussian projection regenerated from an 8-byte seed and a learned LoRA-style adapter.
- Provides a rank ablation comparing rank=16 and rank=64 under the same wallclock cap.
- Demonstrates compatibility with the existing torch.compile, Muon/Adam split, and int8+zlib export pipeline.