PR #517 (closed)
Record*: val_bpb=0.978 BPB — Goldfish ML Autonomous Research (100ep Cosine *leaky* TTT)
by lukacf
val_bpb: 0.9789
Architecture: Transformer
Optimizer: AdamW
Artifact Size: 15.51 MB
Training Techniques
Quantization
- int6 (bits: 6, scope: all)
Compression
- zstd (level: null)
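The int6 quantization above can be sketched as symmetric round-to-nearest quantization with a shared per-tensor scale. This is a minimal illustration, not the PR's implementation: the PR only states bits=6 and scope=all, so the symmetric scheme, function names, and rounding choice here are assumptions.

```python
# Hypothetical sketch of symmetric per-tensor int6 quantization.
# The PR only specifies bits=6 and scope=all; everything else here
# (symmetric range, shared scale, round-to-nearest) is an assumption.

def quantize_int6(weights):
    """Map floats to integers in [-31, 31] with a shared scale."""
    qmax = 2 ** (6 - 1) - 1                   # 31: 6 bits, one for sign
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int6(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [qi * scale for qi in q]

q, s = quantize_int6([0.5, -1.2, 0.031])
approx = dequantize_int6(q, s)                # values close to the originals
```

With 6 bits the worst-case error per weight is half the scale, which is what lets a 6-bit artifact (plus zstd on top) stay at 15.51 MB while keeping val_bpb competitive.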
Architecture
- SmearGate: custom gating component in the baseline architecture (parameters: null)
- BigramHash: bigram-hash module used in the baseline architecture (dimensions: 2048)
- RoPE: partial rotary positional embeddings applied to a subset of dimensions (dimensions: 16/64)
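Partial RoPE with "16/64" dimensions means only 16 of the 64 head dimensions are rotated; the rest pass through unchanged. A minimal sketch, assuming the common interleaved-pair formulation and base 10000 (neither is stated in the PR):

```python
import math

# Hypothetical sketch of partial RoPE: rotate only the first 16 of 64
# head dimensions ("16/64" above); the remaining 48 pass through
# untouched. The interleaved pairing and base=10000 are assumptions.

def partial_rope(x, pos, rot_dims=16, base=10000.0):
    """x: list of 64 floats for one head at one sequence position."""
    out = list(x)
    for i in range(rot_dims // 2):
        theta = pos / base ** (2 * i / rot_dims)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s            # 2D rotation of each pair
        out[2 * i + 1] = a * s + b * c
    return out
```

Because each pair is rotated, not scaled, the vector norm is preserved; restricting rotation to a subset of dimensions leaves the rest free to carry position-independent features.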
Weight Averaging
- EMA (decay: 0.997)
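EMA weight averaging with decay 0.997 keeps a shadow copy of the weights that moves a small fraction toward the live weights after each step; the shadow copy is then what gets evaluated. A minimal sketch (the update rule is standard; applying it per step and evaluating the shadow copy are assumptions about this PR's setup):

```python
# Hypothetical sketch of EMA weight averaging with decay 0.997:
# shadow <- decay * shadow + (1 - decay) * weights after each step.

def ema_update(shadow, weights, decay=0.997):
    """One EMA step over flat lists of parameters."""
    return [decay * s + (1 - decay) * w for s, w in zip(shadow, weights)]

shadow = [0.0, 0.0]
for step in range(3):
    weights = [1.0, 2.0]          # stand-in for the post-step live weights
    shadow = ema_update(shadow, weights)
```

With decay 0.997 the effective averaging window is roughly 1/(1-0.997) ≈ 333 steps, so the shadow weights smooth out late-training noise.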
Evaluation
- sliding window eval (stride: 64)
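Sliding-window eval with stride 64 scores each token with (near-)maximal left context: the model re-reads overlapping windows, but loss is counted only on the final `stride` tokens of each window. A sketch of the span bookkeeping; the window length of 256 and the tuple layout are assumptions (the PR only states stride=64):

```python
# Hypothetical sketch of sliding-window evaluation spans (stride=64).
# Each span gives a context range and the sub-range actually scored,
# so scored sub-ranges tile the sequence without overlap.

def sliding_window_positions(seq_len, window=256, stride=64):
    """Return (ctx_start, eval_start, end) spans: loss is scored on
    tokens in [eval_start, end) given context [ctx_start, end)."""
    spans = []
    pos = 0
    while pos < seq_len:
        end = min(seq_len, pos + stride)
        spans.append((max(0, end - window), pos, end))
        pos = end
    return spans
```

A smaller stride gives each token more context at the cost of more forward passes; stride 64 trades roughly window/stride extra compute for a lower measured bpb than chunked evaluation.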
Test-Time Training
- full TTT (epochs: 100, learning_rate: 0.001, lr_min: 0.00001, scheduler: cosine annealing)
LR Schedule
- cosine decay (t_max: 100, eta_min: 0.00001)
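The TTT learning-rate schedule above follows the standard cosine-annealing formula: start at 1e-3 and decay smoothly to eta_min=1e-5 over t_max=100 epochs. A minimal sketch of just the schedule (the per-epoch evaluation of the formula matches CosineAnnealingLR's closed form; the surrounding TTT loop is omitted):

```python
import math

# Sketch of the cosine-annealed LR used for the 100-epoch TTT run:
# lr(e) = eta_min + 0.5 * (lr_max - eta_min) * (1 + cos(pi * e / t_max)),
# i.e. 1e-3 at epoch 0 decaying to 1e-5 at epoch 100.

def cosine_lr(epoch, lr_max=1e-3, eta_min=1e-5, t_max=100):
    """Learning rate at a given TTT epoch under cosine annealing."""
    return eta_min + 0.5 * (lr_max - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

lrs = [cosine_lr(e) for e in range(101)]
```

The late-epoch learning rates approach 1e-5, which is what lets the run go to 100 epochs without the position-specific overfitting that a constant 1e-3 would cause.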
Other
- Autonomous AI-driven research workflow with experiment provenance tracking and iterative hypothesis testing (experiments: 7, wall_clock_hours: 2)
Novel Contributions
- Applied CosineAnnealingLR to TTT to prevent position-specific overfitting and enable longer TTT runs.
- Achieved 100-epoch test-time training with cosine decay, improving val_bpb to 0.9789.
- Used an autonomous AI research workflow to run the full hypothesis, implementation, experimentation, and analysis loop without human intervention in the training code.
- Documented experiment lineage and dead-end explorations with provenance tracking.
- Demonstrated that cosine-scheduled TTT scales better than constant-learning-rate TTT.