PR #2030 (open)
Random Projection + Low-Rank Adapter MLPs (Non-Record / Requested PR)
by Ammar12Falah
val_bpb: 1.4135
Architecture: Transformer
Optimizer: Muon
Artifact Size: 8.93 MB
Training Techniques
Architecture: RandomLoRALinear
Replaces MLP linear layers with a frozen random Gaussian projection plus a learned low-rank adapter, using W = R(seed) + alpha * BA.
parameters: {"rank":64,"alpha":1}
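A minimal PyTorch sketch of what such a layer could look like, assuming a 64-bit integer seed, a (rank, in) / (out, rank) shape convention for A and B, and a 1/sqrt(fan_in) scale on the random projection; the PR's actual module may differ in these details.

```python
import torch
import torch.nn as nn

class RandomLoRALinear(nn.Module):
    """Frozen random Gaussian projection plus a learned low-rank adapter:
    effective weight W = R(seed) + alpha * (B @ A). Only A and B are trained;
    R is regenerated from the seed and never stored in the exported artifact."""

    def __init__(self, in_features, out_features, rank=64, alpha=1.0, seed=0):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.rank, self.alpha, self.seed = rank, alpha, seed
        # Learned adapter, LoRA-style: A is (rank, in), B is (out, rank).
        self.A = nn.Parameter(torch.empty(rank, in_features))
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero init (see Initialization below)
        nn.init.kaiming_uniform_(self.A, a=5 ** 0.5)             # Kaiming init (see Initialization below)
        # Non-persistent buffer: regenerated from the seed at load time,
        # so only the 8-byte seed needs to ship in the artifact.
        self.register_buffer("R", self._make_projection(), persistent=False)

    def _make_projection(self):
        # Assumed 1/sqrt(fan_in) scaling; the PR's exact scale is not stated here.
        g = torch.Generator().manual_seed(self.seed)
        return torch.randn(self.out_features, self.in_features, generator=g) / self.in_features ** 0.5

    def forward(self, x):
        w = self.R + self.alpha * (self.B @ self.A)
        return x @ w.t()
```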
Initialization: LoRA init
Adapter A is Kaiming-initialized and B is zero-initialized, so the model starts as a pure random projection with zero adapter perturbation.
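Because B is zero at initialization, the effective weight reduces to the seeded projection alone; a quick check using the hypothetical sketch above (dimensions are illustrative):

```python
layer = RandomLoRALinear(768, 3072, rank=64, alpha=1.0, seed=42)
with torch.no_grad():
    w0 = layer.R + layer.alpha * (layer.B @ layer.A)
    assert torch.allclose(w0, layer.R)  # zero-initialized B contributes nothing yet
```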
Optimizer: Muon
weight_decay: null
momentum: null
other_params: {"adam_for_embeddings_and_scalars":true}
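One way the Muon/Adam split could be wired; `Muon` stands in for whichever Muon implementation the baseline ships (its constructor signature and the learning rates here are assumptions), and the ndim-plus-name grouping rule is an illustrative guess rather than the PR's exact logic.

```python
import torch

def build_optimizers(model, muon_cls, muon_lr=0.02, adam_lr=3e-4):
    """Route 2-D matrix weights to Muon; embeddings and scalar/vector
    parameters (norm gains, biases, logit scales) to Adam."""
    muon_params, adam_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if p.ndim >= 2 and "embed" not in name:
            muon_params.append(p)   # adapter A/B and other matrices
        else:
            adam_params.append(p)   # embeddings and scalars
    muon_opt = muon_cls(muon_params, lr=muon_lr)            # assumed signature
    adam_opt = torch.optim.AdamW(adam_params, lr=adam_lr)
    return muon_opt, adam_opt
```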
Quantization: int8
bits: 8
scope: post-training artifact
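One plausible reading of this entry is per-tensor symmetric quantization of the learned tensors, keeping an fp32 scale for dequantization; the sketch below is illustrative rather than the PR's exact quantizer.

```python
import torch

def quantize_int8(state_dict):
    """Per-tensor symmetric int8 quantization; keeps an fp32 scale per tensor."""
    packed = {}
    for name, t in state_dict.items():
        t = t.float()
        scale = t.abs().max().clamp(min=1e-8) / 127.0
        q = torch.round(t / scale).clamp(-127, 127).to(torch.int8)
        packed[name] = {"q": q, "scale": scale.item()}
    return packed

def dequantize_int8(packed):
    return {name: e["q"].float() * e["scale"] for name, e in packed.items()}
```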
Compression: zlib
level: null
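The export could then be a torch.save followed by a zlib pass over the quantized payload, as in this sketch (compression level left at zlib's default, matching the null value above; the file name is illustrative):

```python
import io
import zlib
import torch

def export_artifact(packed, path="artifact.bin"):
    # Serialize the int8 payload, then zlib-compress the raw bytes.
    buf = io.BytesIO()
    torch.save(packed, buf)
    blob = zlib.compress(buf.getvalue())  # default compression level
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)  # compressed artifact size in bytes

def load_artifact(path="artifact.bin"):
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    return torch.load(io.BytesIO(raw))
```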
Sequence Length
train_length: 1024
eval_length: 1024
Novel Contributions
- Applies random projection plus low-rank adapters to MLP linear layers in the Parameter Golf GPT baseline.
- Uses a frozen Gaussian projection regenerated from an 8-byte seed and a learned LoRA-style adapter.
- Provides a rank ablation comparing rank=16 and rank=64 under the same wallclock cap.
- Demonstrates compatibility with the existing torch.compile, Muon/Adam split, and int8+zlib export pipeline.