PR #1164

open

Non-record: random-map adapter train/eval ablations on 8xH100

val_bpb

1.1917

Architecture

Transformer

Optimizer

—

Artifact Size

15,883,798 bytes

Training Techniques

Other

Train-time random-map adapters using random_diag on q, v, and lm_head

parameters: {"adapter_kind":"random_diag","sites":["q","v","lm_head"],"rank":8}
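The card does not define random_diag. One plausible reading, offered here only as an assumption, is a low-rank update of the form B · diag(s) · A, where A and B are frozen random matrices regenerated from a seed and only the rank-sized vector s is trained. The dependency-free sketch below illustrates that reading; the class name, initialization scales, and seed handling are hypothetical and are not taken from the PR.

```python
import random


def random_matrix(rows, cols, rng, scale):
    """Frozen Gaussian matrix stored as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RandomDiagAdapter:
    """Hypothetical adapter: delta(x) = B @ (s * (A @ x)).

    A and B are frozen random maps; s (length = rank) is the only
    trainable state. This is an assumed reading of "random_diag".
    """

    def __init__(self, d_in, d_out, rank=8, seed=0):
        rng = random.Random(seed)  # same seed reproduces the same A and B
        self.A = random_matrix(rank, d_in, rng, d_in ** -0.5)
        self.B = random_matrix(d_out, rank, rng, rank ** -0.5)
        self.s = [0.0] * rank  # zero init: the adapter starts as a no-op

    def delta(self, x):
        z = matvec(self.A, x)                        # project down to rank dims
        z = [si * zi for si, zi in zip(self.s, z)]   # trainable diagonal scaling
        return matvec(self.B, z)                     # project back up

    def num_trainable(self):
        return len(self.s)


adapter = RandomDiagAdapter(d_in=16, d_out=16, rank=8, seed=0)
update = adapter.delta([1.0] * 16)  # all zeros while s is zero
```

Under this assumption each adapted site adds only rank trainable values, and the random maps could be rebuilt from the seed rather than stored. Whether the PR does this is not stated in the card.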

Test-Time Training

LoRA TTT

parameters: {"rank":8}

random-map TTT

parameters: {"adapter_kind":"random_diag","rank":8}

Sequence Length

train_length: null

eval_length: 1024
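
Assuming val_bpb denotes validation bits per byte, as the name suggests, the figure is conventionally obtained by summing token-level negative log-likelihood in nats, converting to bits, and dividing by the number of bytes in the evaluated text. The helper below is a minimal sketch of that conversion; the function name and arguments are illustrative and do not come from the PR's harness.

```python
import math


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert summed NLL (nats) over a corpus into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes rather than tokens keeps the metric comparable across tokenizers.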

Non-record ablation study of train-time and evaluation-time adapters built on random linear maps
Comparison of random-map adapters against a LoRA TTT control on 8xH100
Demonstration that document-aware test-time training improves performance in this setup
Evaluation of random_diag adapters on q, v, and lm_head
Reproducible ablation harness with logs, summary TSV, and plotting utility
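
The card does not spell out what "document-aware" means for test-time training. A common interpretation, assumed here, is that adapter state is reset at every document boundary and each chunk is scored before the adapter is updated on it, so no token is evaluated with weights that have already seen it. The sketch below shows that control flow with hypothetical callables (make_adapter, score_chunk, adapt_on_chunk) standing in for the real model. The default chunk_len of 1024 mirrors the eval_length above, but tying the two together is also an assumption.

```python
def document_aware_ttt(documents, make_adapter, score_chunk, adapt_on_chunk,
                       chunk_len=1024):
    """Score-then-adapt over chunks, resetting the adapter per document."""
    total_loss = 0.0
    total_tokens = 0
    for doc in documents:
        adapter = make_adapter()  # fresh state: nothing leaks across documents
        for start in range(0, len(doc), chunk_len):
            chunk = doc[start:start + chunk_len]
            total_loss += score_chunk(adapter, chunk)  # evaluate first
            total_tokens += len(chunk)
            adapt_on_chunk(adapter, chunk)             # then train on scored tokens
    return total_loss, total_tokens


# Toy stand-ins: loss per token shrinks as the adapter takes more steps.
def make_adapter():
    return {"steps": 0}


def score_chunk(adapter, chunk):
    return len(chunk) / (1 + adapter["steps"])


def adapt_on_chunk(adapter, chunk):
    adapter["steps"] += 1


docs = [[0] * 5, [0] * 3]
loss, n_tokens = document_aware_ttt(docs, make_adapter, score_chunk,
                                    adapt_on_chunk, chunk_len=2)
```

In the toy run the second document starts again from zero adapter steps, which is the behavior the per-document reset is meant to guarantee.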