PR #1164

Status: open

Non-record: random-map adapter train/eval ablations on 8xH100

by papalino456
val_bpb: 1.1917
Architecture: Transformer
Optimizer: not specified
Artifact Size: 15,883,798 bytes
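val_bpb is validation bits per byte. The page does not give the formula. This sketch assumes the common definition: take the summed next-token cross-entropy over the validation text (in nats) and divide it by ln 2 times the number of UTF-8 bytes in that text. Normalizing by bytes makes the score independent of the tokenizer's vocabulary.

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy in nats into bits per byte (assumed definition)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes per byte of raw text rather than per token.
    return total_nll_nats / (math.log(2) * total_bytes)

# Hypothetical numbers: 1,000 tokens at a mean loss of 2.0 nats covering 4,000 bytes.
print(round(bits_per_byte(1000 * 2.0, 4000), 4))  # -> 0.7213
```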

Training Techniques

Other
Train-time random-map adapters using random_diag on q, v, and lm_head
parameters: {"adapter_kind":"random_diag","sites":["q","v","lm_head"],"rank":8}
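The page does not define random_diag. One plausible reading, assumed here and not confirmed by the PR, is a VeRA-style adapter. Under that reading, two random projections are drawn once and frozen: A with shape (rank × d_in) and B with shape (d_out × rank). Only a length-rank diagonal scale is trained, and the adapter adds B · diag(lam) · A · x to the base projection's output. The class and function names below are hypothetical.

```python
import random

class RandomDiagAdapter:
    """Hypothetical random_diag adapter: delta(x) = B @ diag(lam) @ A @ x.

    A and B are frozen random matrices drawn from a fixed seed; only `lam`
    (length `rank`) would be trained. This is an assumed interpretation.
    """

    def __init__(self, d_in: int, d_out: int, rank: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Frozen random maps, scaled by 1/sqrt(fan_in) to keep activations O(1).
        self.A = [[rng.gauss(0.0, 1.0) / d_in ** 0.5 for _ in range(d_in)] for _ in range(rank)]
        self.B = [[rng.gauss(0.0, 1.0) / rank ** 0.5 for _ in range(rank)] for _ in range(d_out)]
        # Trainable diagonal, zero-initialized so the adapter starts as a no-op.
        self.lam = [0.0] * rank

    def trainable_params(self) -> int:
        return len(self.lam)

    def delta(self, x):
        h = [sum(a * xi for a, xi in zip(row, x)) for row in self.A]      # A x -> (rank,)
        h = [l * hi for l, hi in zip(self.lam, h)]                       # diag(lam) h
        return [sum(b * hi for b, hi in zip(row, h)) for row in self.B]  # B h -> (d_out,)

def adapted_linear(W, x, adapter):
    """Base projection W x plus the adapter's low-rank correction."""
    base = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [b + d for b, d in zip(base, adapter.delta(x))]
```

Zero-initializing `lam` means training starts from the unmodified base model. LoRA gets the same property by initializing its B factor to zero. Under this reading, each of the q, v, and lm_head sites would train only 8 scalars at rank 8.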
Test-Time Training
LoRA TTT
parameters: {"rank":8}
random-map TTT
parameters: {"adapter_kind":"random_diag","rank":8}
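One way to see the difference between the two TTT variants is to compare their trainable-parameter counts. LoRA trains both of its factors, A (rank × d_in) and B (d_out × rank). Under the assumed VeRA-style reading of random_diag, only a length-rank diagonal is trained. The page does not say which sites the LoRA TTT control covers. The shapes below are also hypothetical: both variants are placed on q, v, and lm_head, with d_model = 512 and a 1024-token vocabulary.

```python
def lora_trainable(d_in: int, d_out: int, rank: int) -> int:
    # LoRA trains both factors: A is (rank x d_in), B is (d_out x rank).
    return rank * (d_in + d_out)

def random_diag_trainable(rank: int) -> int:
    # Under the assumed VeRA-style reading, only a length-`rank` diagonal is trained.
    return rank

# Hypothetical shapes, not taken from the PR: d_model=512, vocab=1024,
# q and v as square d_model x d_model projections.
SITES = {"q": (512, 512), "v": (512, 512), "lm_head": (512, 1024)}
RANK = 8

lora_total = sum(lora_trainable(d_in, d_out, RANK) for d_in, d_out in SITES.values())
diag_total = sum(random_diag_trainable(RANK) for _ in SITES)
print(lora_total, diag_total)  # -> 28672 24
```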
Sequence Length
train_length: null
eval_length: 1024
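With eval_length: 1024, evaluation runs on 1024-token windows. The page does not say whether windows overlap. This sketch assumes consecutive, non-overlapping windows in which every token after the first is predicted exactly once. The function name eval_windows is hypothetical.

```python
def eval_windows(tokens, eval_length=1024):
    """Split a token stream into consecutive (inputs, targets) windows.

    Targets are inputs shifted by one token, and windows advance by
    eval_length, so each token after the first is a target exactly once.
    """
    windows = []
    for start in range(0, len(tokens) - 1, eval_length):
        end = min(start + eval_length, len(tokens) - 1)
        windows.append((tokens[start:end], tokens[start + 1:end + 1]))
    return windows
```

A final partial window is kept, so a trailing fragment shorter than eval_length is still scored.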

Novel Contributions

  • Non-record ablation study of train-time and evaluation-time adapters built on random linear maps
  • Comparison of random-map adapters against a LoRA TTT control on 8xH100
  • Demonstration that document-aware test-time training improves performance in this setup
  • Evaluation of random_diag adapters on q, v, and lm_head
  • Reproducible ablation harness with logs, summary TSV, and plotting utility
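The page reports that document-aware TTT helps but does not describe the loop. The pattern sketched here is a common one, assumed rather than taken from the PR. The adapter (LoRA or random_diag) is reset at each document boundary, and each document is processed in chunks. Every chunk is scored before the adapter trains on it, so no token is graded by weights that have already seen it. The hooks reset, score, and adapt are hypothetical stand-ins for model code.

```python
from typing import Callable, Sequence

def document_aware_ttt(
    documents: Sequence[Sequence[int]],
    chunk_len: int,
    reset: Callable[[], None],
    score: Callable[[Sequence[int]], float],
    adapt: Callable[[Sequence[int]], None],
) -> float:
    """Return the total loss over all documents under a score-then-adapt TTT loop.

    Hypothetical hooks: reset() restores the adapter to its initial state,
    score() returns the summed loss of a chunk under the current weights,
    and adapt() takes one or more gradient steps on that chunk.
    """
    total = 0.0
    for doc in documents:
        reset()  # document-aware: no adapter state carries over between documents
        for start in range(0, len(doc), chunk_len):
            chunk = doc[start:start + chunk_len]
            total += score(chunk)  # grade the chunk first ...
            adapt(chunk)           # ... then update on the already-scored chunk
    return total
```

Because the returned total is a summed loss, it can be converted to bits per byte by dividing it by ln 2 times the byte count of the evaluated text.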