PR #1625 (open)
[Non-record] E2E TTT at 27M scale — negative result (val_bpb 1.1104, SP1024)
by ChideraIbe123
val_bpb: 1.1104
Architecture: Transformer
Optimizer: —
Artifact Size: 13.85 MB
Training Techniques
Test-Time Training
full TTT
parameters: {"mode":"E2E","scope":"MLP-only in last fraction of blocks","last_frac":null,"learning_rate":0.015,"epochs":2}
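To make this entry concrete, below is a minimal sketch of what an end-to-end TTT step with these parameters could look like in PyTorch. Only mode E2E, learning_rate 0.015, and epochs 2 come from the card; the model interface, the plain-SGD choice, and the function names are assumptions, not this PR's implementation.

```python
import copy
import torch
import torch.nn.functional as F

def e2e_ttt(model, token_ids, ttt_param_names, lr=0.015, epochs=2):
    """End-to-end test-time training: fine-tune a copy of the model on the
    test sequence itself with the ordinary next-token loss, then return the
    adapted copy for scoring. `ttt_param_names` restricts which parameters
    train (e.g. MLP-only in the last blocks); everything else stays frozen."""
    adapted = copy.deepcopy(model)
    trainable = []
    for name, p in adapted.named_parameters():
        p.requires_grad_(name in ttt_param_names)
        if name in ttt_param_names:
            trainable.append(p)
    opt = torch.optim.SGD(trainable, lr=lr)  # optimizer choice is an assumption
    x, y = token_ids[:, :-1], token_ids[:, 1:]
    for _ in range(epochs):
        logits = adapted(x)  # assumes the model returns [B, T, vocab] logits
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted
```

Scoring would then use `adapted` rather than `model`, which is what makes the reported val_bpb reflect the test-time-adapted weights.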
Architecture
MLP
TTT parameters filtered to MLPs in the last fraction of blocks for end-to-end test-time training.
parameters: {"blocks":"L5-L10"}
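A sketch of the parameter filter this entry describes, under the assumption of GPT-style parameter names like `blocks.7.mlp.c_fc.weight` (the naming scheme is not taken from this PR), reading "L5-L10" as blocks 5 through 10 inclusive:

```python
def mlp_ttt_param_names(model, first_block=5, last_block=10):
    """Select only MLP parameters in blocks 5..10, so that test-time
    training updates late-block MLPs and leaves attention, embeddings,
    norms, and earlier blocks frozen."""
    names = set()
    for name, _ in model.named_parameters():
        parts = name.split(".")
        # Assumes names of the form "blocks.<idx>.mlp.<rest>".
        if len(parts) >= 3 and parts[0] == "blocks" and parts[2] == "mlp":
            if first_block <= int(parts[1]) <= last_block:
                names.add(name)
    return names
```

The returned set would feed the `ttt_param_names` argument of the TTT sketch above.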
Evaluation
sliding window eval
parameters: null
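A hedged sketch of a sliding-window evaluation producing a bits-per-byte figure like the val_bpb above. The window of 1024 is a guess from "SP1024" in the title; the stride, the single-sequence input shape, and the byte accounting are assumptions, not this PR's actual harness.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_window_bpb(model, ids, n_bytes, window=1024, stride=512):
    """Score one long token sequence (shape [1, N]) with overlapping
    windows, counting each target position exactly once, and convert the
    summed nats into bits per byte of the underlying text."""
    N = ids.size(1)
    total_nats, prev_end = 0.0, 0
    for begin in range(0, N - 1, stride):
        end = min(begin + window, N)
        x = ids[:, begin:end - 1]   # inputs for this window
        y = ids[:, begin + 1:end]   # next-token targets
        logits = model(x)           # assumes [1, T, vocab] output
        nll = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), y.reshape(-1), reduction="none"
        )
        # Targets cover absolute positions begin+1 .. end-1; keep only the
        # ones no previous window has already scored.
        new_from = max(prev_end, begin + 1)
        total_nats += nll[new_from - (begin + 1):].sum().item()
        prev_end = end
        if end == N:
            break
    return total_nats / math.log(2) / n_bytes  # nats -> bits, per byte
```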
Compression
lzma
level: null
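The artifact size above is presumably measured on the lzma-compressed artifact; a minimal sketch of that kind of measurement with Python's standard lzma module, where the file name is hypothetical and `preset=None` mirrors the `level: null` above (library default level):

```python
import lzma

def compressed_size_mb(path="model_artifact.bin"):
    """lzma-compress an artifact in memory and report its size in MB
    (decimal MB here; the card's unit convention is an assumption)."""
    with open(path, "rb") as f:
        raw = f.read()
    packed = lzma.compress(raw, preset=None)  # None = default compression level
    return len(packed) / 1e6
```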
Novel Contributions
- End-to-End Test-Time Training (E2E TTT) ported onto the merged SOTA stack
- 3-config ablation of TTT hyperparameters at 27M scale
- Negative result showing only about 0.001 BPB total gain across large changes in learning rate, trainable parameters, and epochs
- MLP-only TTT applied to the last fraction of blocks