PR #974 (open)
Non-record: Random Linear Map Adapter Projections — 1.21MB artifact (val_bpb=1.6542)
by anthony-maio
val_bpb: 1.6542
Architecture: Transformer
Optimizer: —
Artifact Size: 1.21MB
Training Techniques
Architecture
Partial RoPE
Uses partial rotary positional embeddings as part of the base stack.
parameters: null
BigramHash
Uses bigram hash embeddings/features in the base model.
parameters: null
SmearGate
Uses SmearGate in the base model.
parameters: null
XSA
Uses XSA attention modification in the base stack.
parameters: null
VE128
Uses VE128 in the base model.
parameters: null
LeakyReLU
Uses LeakyReLU(0.5)^2 MLP activation.
parameters: {"squared":true,"negative_slope":0.5}
Weight Averaging
EMA
parameters: null
SWA
parameters: null
Quantization
GPTQ-lite
bits: 6
scope: block weights
late QAT
bits: null
scope: block weights
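The PR doesn't spell out the "GPTQ-lite" procedure, only that block weights are stored at 6 bits. As a point of reference, a minimal (non-GPTQ) symmetric round-to-nearest 6-bit baseline for a weight tensor might look like the sketch below; the per-tensor scaling and level count are assumptions, not the PR's actual scheme.

```python
import numpy as np

def quantize_6bit(w):
    """Symmetric round-to-nearest quantization to 6 bits (levels -31..31)."""
    scale = np.abs(w).max() / 31.0
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale

def dequantize_6bit(q, scale):
    """Map integer codes back to float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
q, scale = quantize_6bit(w)
w_hat = dequantize_6bit(q, scale)
```

Round-to-nearest bounds the per-weight error by half a quantization step (`scale / 2`); GPTQ-style methods improve on this by compensating errors column-by-column, which this sketch does not attempt.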
Compression
lzma
level: null
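The serialized payload (adapter cores and scales) is lzma-compressed; the PR does not record the compression level. A minimal round-trip with Python's stdlib `lzma` module, using a placeholder payload and an assumed preset, would be:

```python
import lzma
import numpy as np

# Hypothetical payload standing in for the serialized adapters + scales.
rng = np.random.default_rng(0)
payload = rng.standard_normal((32, 64)).astype(np.float16).tobytes()

blob = lzma.compress(payload, preset=6)  # preset is an assumption; PR level is unspecified
restored = lzma.decompress(blob)
```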
Evaluation
sliding window eval
parameters: null
Sequence Length
sequence_length
train_length: 512
eval_length: 1024
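Evaluating at 1024 tokens with a model trained at 512 is done with a sliding window. The exact stride isn't given in the PR; a common scheme, sketched below with an assumed stride of 512, scores each token exactly once by only counting the tokens past the overlap in every window after the first.

```python
def sliding_windows(n_tokens, window=1024, stride=512):
    """Return (start, end, score_from) spans so every token is scored once.

    Each window sees `window` tokens of context, but only the tokens in
    [score_from, end) contribute to the loss, avoiding double counting.
    """
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        score_from = start if start == 0 else start + (window - stride)
        spans.append((start, end, score_from))
        if end == n_tokens:
            break
        start += stride
    return spans

spans = sliding_windows(2048)
```

For 2048 tokens this yields three windows scoring 1024, 512, and 512 tokens respectively.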
Other
other
Replaces selected dense projection families with seeded frozen random linear maps plus learned low-rank adapter cores and diagonal scales.
parameters: {"replaced_components":["blocks.*.attn.proj","blocks.*.mlp.proj"],"adapter_rank":32}
other
Uses seed-regenerated frozen random projections so the base matrices cost no serialized artifact bytes.
parameters: {"seed_bytes":4}
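Putting the two ideas together: each replaced projection keeps only a seed (4 bytes), a rank-32 adapter pair, and a diagonal scale on disk; the dense base matrix is regenerated from the seed at load time. A minimal numpy sketch of that forward path follows — the Gaussian initialization, 1/sqrt(d_in) scaling, and specific seed value are assumptions for illustration, not the PR's exact recipe.

```python
import numpy as np

def random_base(seed, d_in, d_out):
    # Regenerate the frozen random projection from its seed at load time;
    # the matrix itself contributes zero serialized bytes.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((d_in, d_out)).astype(np.float32) / np.sqrt(d_in)

def adapter_projection(x, seed, A, B, s):
    # y = (x @ W_frozen) * s + (x @ A) @ B
    # W_frozen: seed-regenerated random base (never trained or stored)
    # A, B:     learned low-rank adapter core (rank 32 per the PR)
    # s:        learned per-output diagonal scale
    W = random_base(seed, x.shape[-1], s.shape[0])
    return (x @ W) * s + (x @ A) @ B

d_model, rank = 64, 32          # adapter_rank=32 from the PR; d_model is illustrative
seed = 974                      # hypothetical seed; fits in seed_bytes=4 as a uint32
rng = np.random.default_rng(0)
x = rng.standard_normal((4, d_model)).astype(np.float32)
A = rng.standard_normal((d_model, rank)).astype(np.float32) * 0.01
B = np.zeros((rank, d_model), dtype=np.float32)  # zero init: starts as the pure random projection
s = np.ones(d_model, dtype=np.float32)
y = adapter_projection(x, seed, A, B, s)
```

The serialized cost per projection is 2·d·r + d floats plus the 4-byte seed, versus d² floats for a dense matrix, so the saving grows as the model width exceeds the adapter rank.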
Novel Contributions
- Seed-regenerated frozen random linear maps replace selected dense projections.
- Small learned LoRA-style adapters are trained on top of random bases.
- Only adapters and scales are serialized, reducing artifact size to 1.21MB.
- Demonstrates stable training with random projection adapters on a strong AR baseline.
- Implements the requested 'learning adapters on random linear maps' direction.