| Field | Value |
|---|---|
| val_bpb | 1.3029 |
| Architecture | Transformer |
| Optimizer | — |
| Artifact size | 11,851,989 bytes (≈11.3 MiB) |
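For context on the headline metric: assuming `val_bpb` follows the usual definition of validation bits per byte, it converts the mean cross-entropy loss in nats per token into bits and normalizes by the byte length of the evaluated text:

$$
\mathrm{bpb} = \frac{\bar{L}_{\mathrm{nats}}}{\ln 2} \cdot \frac{N_{\mathrm{tokens}}}{N_{\mathrm{bytes}}}
$$

where $N_{\mathrm{tokens}}/N_{\mathrm{bytes}}$ is the token-to-byte ratio of the validation set under the tokenizer.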
## Training Techniques

### Sequence length (`sequence_length`)
- train_length: 2048
- eval_length: null
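A minimal sketch of what a fixed train_length of 2048 implies for batching, assuming the common approach of cropping a flat token stream into contiguous blocks; `make_batches` and its shapes are illustrative, not taken from the submission:

```python
import numpy as np

def make_batches(tokens: np.ndarray, train_length: int = 2048):
    """Crop a flat token stream into fixed-length (input, target) pairs."""
    n = (len(tokens) - 1) // train_length  # the -1 leaves room for the shifted target
    for i in range(n):
        x = tokens[i * train_length : (i + 1) * train_length]
        y = tokens[i * train_length + 1 : (i + 1) * train_length + 1]  # next-token targets
        yield x, y
```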
### LR schedule (warmdown)
- warmdown_steps: 2200
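A sketch of one common reading of "warmdown": hold the base learning rate constant, then decay it linearly to zero over the final `warmdown_steps` optimizer steps. The function below is a hypothetical multiplier, not the submission's exact schedule:

```python
def lr_scale(step: int, total_steps: int, warmdown_steps: int = 2200) -> float:
    """Multiplier on the base LR: 1.0 until the warmdown window, then linear to 0."""
    if step < total_steps - warmdown_steps:
        return 1.0
    return max(0.0, (total_steps - step) / warmdown_steps)

# usage: lr = base_lr * lr_scale(step, total_steps)
```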
### Architecture (weight tying)
- Tied embeddings were used; no additional parameters.
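Weight tying shares one matrix between the token embedding and the output projection, which shrinks the parameter count and hence the artifact. A minimal PyTorch sketch, with hypothetical module names:

```python
import torch.nn as nn

class TiedLM(nn.Module):
    """Input embedding and LM head share a single weight matrix."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)             # weight is (V, D)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)  # weight is (V, D)
        self.lm_head.weight = self.embed.weight  # tie: same tensor object

    def forward(self, hidden):
        return self.lm_head(hidden)  # logits over the vocabulary
```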
### Quantization (int8)
- bits: 8
- scope: all
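The card does not specify the quantization scheme, so the sketch below assumes plain symmetric per-tensor int8; all helper names are hypothetical:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = max(float(np.abs(w).max()), 1e-12) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 codes."""
    return q.astype(np.float32) * scale
```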
### Compression (zlib)
- level: null
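With the level left null, zlib's default (level 6) presumably applies. A minimal round-trip sketch using the standard-library API (`pack_artifact`/`unpack_artifact` are hypothetical names):

```python
import zlib

def pack_artifact(raw: bytes, level: int = -1) -> bytes:
    """Compress the serialized int8 weights; level -1 selects zlib's default (6)."""
    return zlib.compress(raw, level)

def unpack_artifact(blob: bytes) -> bytes:
    """Losslessly recover the serialized weights."""
    return zlib.decompress(blob)
```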
## Novel Contributions
- Non-record, budget-constrained baseline trained on a single H100
- Training run at sequence length 2048
- Tuned warmdown learning-rate schedule
- Int8-quantized submission artifact compressed with zlib
- Reproducible under a 600 s wall-clock cap