PR #1164 (open)
Non-record: random-map adapter train/eval ablations on 8xH100
by papalino456
val_bpb
1.1917
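The headline metric, val_bpb, is validation bits per byte: summed cross-entropy over the eval set converted from nats to bits and normalized by the raw byte count rather than the token count, which makes models with different tokenizers comparable. A minimal sketch of that conversion (the PR does not show its exact computation; this is the standard definition):

```python
import math

def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (in nats, over the whole eval set)
    to bits per byte: divide by ln(2) to get bits, then by byte count."""
    return total_loss_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes instead of tokens is what makes a single number like 1.1917 meaningful across runs that may segment text differently.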
Architecture
Transformer
Optimizer
—
Artifact Size
15,883,798 bytes
Training Techniques
Other
Train-time random-map adapters using random_diag on q, v, and lm_head
parameters: {"adapter_kind":"random_diag","sites":["q","v","lm_head"],"rank":8}
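The PR does not spell out the `random_diag` construction, but the parameters (frozen random maps, a rank, and a diagonal) suggest a VeRA-style reading: frozen random low-rank projections with only a small trainable diagonal core, attached to the q, v, and lm_head linears. A hedged sketch under that assumption — `RandomDiagAdapter` and `attach_adapters` are illustrative names, not the PR's code:

```python
import torch
import torch.nn as nn

class RandomDiagAdapter(nn.Module):
    """Hypothetical sketch of a `random_diag` adapter: frozen random
    low-rank factors around a frozen base linear, with only a
    rank-sized diagonal trained (a guess at the construction)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Frozen random projections (never trained).
        self.down = nn.Parameter(
            torch.randn(base.in_features, rank) / base.in_features**0.5,
            requires_grad=False,
        )
        self.up = nn.Parameter(
            torch.randn(rank, base.out_features) / rank**0.5,
            requires_grad=False,
        )
        # Only this diagonal is trainable; zeros => identity at init.
        self.diag = nn.Parameter(torch.zeros(rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + ((x @ self.down) * self.diag) @ self.up

def attach_adapters(model: nn.Module, sites=("q", "v", "lm_head"), rank=8):
    """Wrap every nn.Linear whose attribute name matches a site."""
    for _, mod in list(model.named_modules()):
        for child_name, child in list(mod.named_children()):
            if isinstance(child, nn.Linear) and child_name in sites:
                setattr(mod, child_name, RandomDiagAdapter(child, rank))
    return model
```

Because the diagonal starts at zero, attaching the adapters leaves the base model's outputs unchanged at initialization, so training only moves the handful of diagonal parameters per site.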
Test-Time Training
LoRA TTT
parameters: {"rank":8}
random-map TTT
parameters: {"adapter_kind":"random_diag","rank":8}
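Both TTT variants above share the same outer loop: before scoring a document, take a few language-modeling gradient steps on its own tokens, updating only the adapter parameters. The PR fixes only `rank=8`; the step count and learning rate below are illustrative, and `ttt_adapt` is a hypothetical helper, not the PR's harness:

```python
import torch
import torch.nn.functional as F

def ttt_adapt(model, tokens, adapter_params, steps=4, lr=1e-3):
    """Generic test-time-training loop sketch: a few next-token
    cross-entropy steps on the document itself, touching only the
    adapter parameters (LoRA or random-map alike)."""
    opt = torch.optim.SGD(adapter_params, lr=lr)
    model.train()
    for _ in range(steps):
        logits = model(tokens[:, :-1])           # (B, T-1, vocab)
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1),
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    model.eval()
    return loss.item()
```

Under this reading, "document-aware" TTT means the adapter state is reset and re-adapted per document rather than carried across document boundaries.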
Sequence Length
train_length: null
eval_length: 1024
Novel Contributions
- Non-record ablation study of train-time and evaluation-time adapters built on random linear maps
- Comparison of random-map adapters against a LoRA TTT control on 8xH100
- Demonstration that document-aware test-time training improves performance in this setup
- Evaluation of random_diag adapters on q, v, and lm_head
- Reproducible ablation harness with logs, summary TSV, and plotting utility
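The contributions mention a summary TSV produced by the ablation harness. A minimal sketch of what such a writer could look like — the column names here are illustrative guesses, not the PR's actual schema:

```python
import csv

def write_summary_tsv(runs, buf):
    """Hypothetical summary writer: one tab-separated row per
    ablation run, suitable for downstream plotting."""
    cols = ["run", "adapter_kind", "rank", "val_bpb"]
    w = csv.DictWriter(buf, fieldnames=cols, delimiter="\t",
                       lineterminator="\n")
    w.writeheader()
    for r in runs:
        # Missing fields (e.g. rank for the baseline) stay blank.
        w.writerow({c: r.get(c, "") for c in cols})
```

A flat TSV like this keeps the comparison between the LoRA control and the random-map runs greppable and easy to plot.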