PR #974 (open)
Non-record: Random Linear Map Adapter Projections — 1.21MB artifact (val_bpb=1.6542)
by anthony-maio
val_bpb: 1.6542
Architecture: Transformer
Optimizer: —
Artifact Size: 1.21MB
Training Techniques
Architecture
Partial RoPE
Uses partial rotary positional embeddings as part of the base stack.
parameters: null
BigramHash
Uses bigram hash embeddings/features in the base model.
parameters: null
SmearGate
Uses SmearGate in the base model.
parameters: null
XSA
Uses XSA attention modification in the base stack.
parameters: null
VE128
Uses VE128 in the base model.
parameters: null
LeakyReLU
Uses LeakyReLU(0.5)^2 MLP activation.
parameters: {"squared":true,"negative_slope":0.5}
Weight Averaging
EMA
parameters: null
SWA
parameters: null
Quantization
GPTQ-lite
bits: 6
scope: block weights
late QAT
bits: null
scope: block weights
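The PR doesn't spell out the "GPTQ-lite" procedure, only that block weights are stored at 6 bits. As a point of reference, a minimal (non-GPTQ) symmetric round-to-nearest 6-bit baseline for a weight tensor might look like the sketch below; the per-tensor scaling and level count are assumptions, not the PR's actual scheme.

```python
import numpy as np

def quantize_6bit(w):
    """Symmetric round-to-nearest quantization to 6 bits (levels -31..31)."""
    scale = np.abs(w).max() / 31.0
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale

def dequantize_6bit(q, scale):
    """Map integer codes back to float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
q, scale = quantize_6bit(w)
w_hat = dequantize_6bit(q, scale)
```

Round-to-nearest bounds the per-weight error by half a quantization step (`scale / 2`); GPTQ-style methods improve on this by compensating errors column-by-column, which this sketch does not attempt.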
Compression
lzma
level: null
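The serialized payload (adapter cores and scales) is lzma-compressed; the PR does not record the compression level. A minimal round-trip with Python's stdlib `lzma` module, using a placeholder payload and an assumed preset, would be:

```python
import lzma
import numpy as np

# Hypothetical payload standing in for the serialized adapters + scales.
rng = np.random.default_rng(0)
payload = rng.standard_normal((32, 64)).astype(np.float16).tobytes()

blob = lzma.compress(payload, preset=6)  # preset is an assumption; PR level is unspecified
restored = lzma.decompress(blob)
```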
Evaluation
sliding window eval
parameters: null
Sequence Length
sequence_length
train_length: 512
eval_length: 1024
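Evaluating at 1024 tokens with a model trained at 512 is done with a sliding window. The exact stride isn't given in the PR; a common scheme, sketched below with an assumed stride of 512, scores each token exactly once by only counting the tokens past the overlap in every window after the first.

```python
def sliding_windows(n_tokens, window=1024, stride=512):
    """Return (start, end, score_from) spans so every token is scored once.

    Each window sees `window` tokens of context, but only the tokens in
    [score_from, end) contribute to the loss, avoiding double counting.
    """
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        score_from = start if start == 0 else start + (window - stride)
        spans.append((start, end, score_from))
        if end == n_tokens:
            break
        start += stride
    return spans

spans = sliding_windows(2048)
```

For 2048 tokens this yields three windows scoring 1024, 512, and 512 tokens respectively.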
Other
other
Replaces selected dense projection families with seeded frozen random linear maps plus learned low-rank adapter cores and diagonal scales.
parameters: {"replaced_components":["blocks.*.attn.proj","blocks.*.mlp.proj"],"adapter_rank":32}
other
Uses seed-regenerated frozen random projections so the base matrices cost no serialized artifact bytes.
parameters: {"seed_bytes":4}
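Putting the two ideas together: each replaced projection keeps only a seed (4 bytes), a rank-32 adapter pair, and a diagonal scale on disk; the dense base matrix is regenerated from the seed at load time. A minimal numpy sketch of that forward path follows — the Gaussian initialization, 1/sqrt(d_in) scaling, and specific seed value are assumptions for illustration, not the PR's exact recipe.

```python
import numpy as np

def random_base(seed, d_in, d_out):
    # Regenerate the frozen random projection from its seed at load time;
    # the matrix itself contributes zero serialized bytes.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((d_in, d_out)).astype(np.float32) / np.sqrt(d_in)

def adapter_projection(x, seed, A, B, s):
    # y = (x @ W_frozen) * s + (x @ A) @ B
    # W_frozen: seed-regenerated random base (never trained or stored)
    # A, B:     learned low-rank adapter core (rank 32 per the PR)
    # s:        learned per-output diagonal scale
    W = random_base(seed, x.shape[-1], s.shape[0])
    return (x @ W) * s + (x @ A) @ B

d_model, rank = 64, 32          # adapter_rank=32 from the PR; d_model is illustrative
seed = 974                      # hypothetical seed; fits in seed_bytes=4 as a uint32
rng = np.random.default_rng(0)
x = rng.standard_normal((4, d_model)).astype(np.float32)
A = rng.standard_normal((d_model, rank)).astype(np.float32) * 0.01
B = np.zeros((rank, d_model), dtype=np.float32)  # zero init: starts as the pure random projection
s = np.ones(d_model, dtype=np.float32)
y = adapter_projection(x, seed, A, B, s)
```

The serialized cost per projection is 2·d·r + d floats plus the 4-byte seed, versus d² floats for a dense matrix, so the saving grows as the model width exceeds the adapter rank.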
Novel Contributions
- Seed-regenerated frozen random linear maps replace selected dense projections.
- Small learned LoRA-style adapters are trained on top of random bases.
- Only adapters and scales are serialized, reducing artifact size to 1.21MB.
- Demonstrates stable training with random projection adapters on a strong AR baseline.
- Implements the requested 'learning adapters on random linear maps' direction.