PR #1261

closed

Add non-record 1xH100 budget seq2048 run (val_bpb 1.3029)

by Aniket-pd
val_bpb: 1.3029
Architecture: Transformer
Optimizer: (not specified)
Artifact Size: 11,851,989 bytes

Training Techniques

Sequence Length (sequence_length)
  train_length: 2048
  eval_length: null

LR Schedule: warmdown
  parameters: {"warmdown_steps": 2200}

Architecture: weight tying
  Tied embeddings were used.
  parameters: null

Quantization: int8
  bits: 8
  scope: all

Compression: zlib
  level: null
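The warmdown schedule above holds the learning rate constant and then decays it linearly over the final 2,200 steps. A minimal sketch of that multiplier, assuming a linear decay to zero and a hypothetical `total_steps` (the source only specifies `warmdown_steps: 2200`):

```python
def lr_multiplier(step: int, total_steps: int,
                  warmdown_steps: int = 2200,
                  final_frac: float = 0.0) -> float:
    """Warmdown schedule: constant LR, then a linear ramp from 1.0
    down to `final_frac` over the last `warmdown_steps` steps.

    `total_steps` and `final_frac` are illustrative assumptions;
    only warmdown_steps=2200 comes from the PR metadata.
    """
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0  # constant phase
    progress = (step - warmdown_start) / warmdown_steps
    return 1.0 - (1.0 - final_frac) * progress
```

In practice this multiplier would scale a base learning rate each optimizer step, e.g. via a lambda-based scheduler.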

Novel Contributions

  • Non-record, budget-constrained baseline on a single H100
  • Training run at sequence length 2048
  • Tuned warmdown learning-rate schedule
  • Int8-quantized submission compressed with zlib
  • Reproducible under a 600s wallclock cap
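The submission pipeline quantizes weights to int8 and compresses the result with zlib. A minimal sketch of that round trip, assuming symmetric per-tensor quantization and zlib's default compression level (the PR lists `level: null`, so the exact level is unknown):

```python
import zlib
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w ~= q * scale."""
    scale = float(np.abs(weights).max()) / 127.0 or 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def pack_artifact(weights: np.ndarray) -> tuple[bytes, float]:
    """Quantize to int8, then zlib-compress the raw bytes."""
    q, scale = quantize_int8(weights)
    return zlib.compress(q.tobytes()), scale

def unpack_artifact(blob: bytes, scale: float, shape) -> np.ndarray:
    """Decompress and dequantize back to float32."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return q.astype(np.float32) * scale
```

The per-element reconstruction error of this scheme is bounded by the quantization step `scale`, which is what makes an int8 artifact viable for a size-constrained submission.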