PR #285

open

Add non-record local A100 TTT eval-stride0 submission

by DanishjeetSingh
val_bpb: 1.3510
Architecture: (not listed)
Optimizer: (not listed)
Artifact Size: 11,876,675 bytes

Training Techniques

Quantization
int8
bits: 8
scope: model weights / submission artifact
Evaluation
stride-based eval
parameters: {"stride":0}
Test-Time Training
LoRA TTT
parameters: null
Compression
zlib
level: null
Other
other
Training capped by wall-clock time on a local 1xA100 run
parameters: {"max_wallclock_seconds":600,"hardware":"1x NVIDIA A100-SXM4-40GB","train_shards":80}
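A wall-clock cap like the one listed above can be sketched as a loop that stops once the time budget is spent. This is a minimal illustration, not the logic in the submitted train_gpt.py: the function name `train_with_wallclock_cap`, the `step_fn` callback, and the injectable `clock` argument are all hypothetical. Only the 600-second budget comes from the parameters above.

```python
import time


def train_with_wallclock_cap(step_fn, max_wallclock_seconds=600.0, clock=time.monotonic):
    """Call step_fn(step) repeatedly until the wall-clock budget is exhausted.

    The budget is checked before each step, so a step that starts inside the
    budget is allowed to finish. Returns the number of completed steps.
    `clock` is injectable (hypothetical convenience) so the loop can be tested
    without waiting in real time.
    """
    start = clock()
    steps = 0
    while clock() - start < max_wallclock_seconds:
        step_fn(steps)
        steps += 1
    return steps
```

Using a monotonic clock avoids surprises from system time adjustments during the run. With a time cap, the number of optimizer steps depends on hardware throughput, which is why the hardware is recorded next to the budget.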

Novel Contributions

  • Non-record local 1xA100 baseline submission focused on the TTT metric
  • Uses standard final evaluation with EVAL_STRIDE=0
  • Includes exact train_gpt.py snapshot and training log for reproducibility
  • Fits within the 16MB artifact limit with int8 + zlib compression
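The int8 + zlib packaging mentioned in the last bullet can be illustrated with a small standard-library sketch. It assumes symmetric per-tensor quantization, the default zlib level (the level is listed as null above), and a decimal reading of "16MB" as 16,000,000 bytes. None of these choices is confirmed by the submission, and all function names here are hypothetical.

```python
import struct
import zlib

# Assumption: "16MB" is read as 16,000,000 bytes; the actual rule may differ.
ARTIFACT_LIMIT_BYTES = 16_000_000


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization.

    Returns (scale, q) where each q[i] is an integer in [-127, 127] and
    weights[i] is approximated by scale * q[i].
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, q


def pack_artifact(weights, level=-1):
    """Quantize, serialize (float32 scale + int8 values), and zlib-compress."""
    scale, q = quantize_int8(weights)
    payload = struct.pack("<f", scale) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(payload, level)


def unpack_artifact(blob):
    """Inverse of pack_artifact: decompress and dequantize back to floats."""
    payload = zlib.decompress(blob)
    (scale,) = struct.unpack_from("<f", payload, 0)
    q = struct.unpack_from(f"<{len(payload) - 4}b", payload, 4)
    return [scale * v for v in q]


def fits_limit(blob, limit=ARTIFACT_LIMIT_BYTES):
    """True if the compressed artifact is within the size limit."""
    return len(blob) <= limit
```

Under these assumptions, the reported artifact size of 11,876,675 bytes is below the limit. A real model would typically quantize each tensor separately and keep one scale per tensor or per row.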