PR #1133

open

Non-record: Sliding-Window Evaluation + Int8-Zlib Compression (1.34 bpb)

by swetapaul08
val_bpb
1.3440
Architecture
Transformer
Optimizer
Artifact Size
14,727,167 bytes

Training Techniques

Evaluation
sliding window eval
parameters: {"stride":512}
Quantization
int8
bits: 8
scope: all
Compression
zlib
level: null
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
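The Evaluation and Sequence Length fields above (stride 512, eval_length 1024) describe overlapping evaluation windows. Below is a minimal sketch of how such windows can tile the validation stream. It assumes the common scheme in which the first window scores all of its targets and each later window scores only targets not yet scored. Under that scheme, every target after the first window is predicted with at least 512 tokens of context. The helper names `sliding_window_spans` and `bits_per_byte` are hypothetical, not taken from the PR. The bpb conversion assumes a summed negative log-likelihood in nats and a count of the bytes in the scored text.

```python
import math


def sliding_window_spans(n_targets, window=1024, stride=512):
    """Return (start, end, score_from) triples over target positions.

    Targets in [start, end) are fed to the model as one window; only targets
    in [score_from, end) are scored, so every target is scored exactly once.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    spans = []
    start = 0
    prev_end = 0  # first target position not yet scored
    while prev_end < n_targets:
        end = min(start + window, n_targets)
        spans.append((start, end, prev_end))
        prev_end = end
        start += stride
    return spans


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert summed negative log-likelihood (nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


# Usage: 2048 targets -> three windows; after the first, each scores 512 new targets.
for start, end, score_from in sliding_window_spans(2048):
    print(start, end, score_from)
```

Compared with non-overlapping 1024-token chunks, a stride of 512 roughly doubles the number of forward passes. In exchange, no scored token (outside the first window) is predicted from a nearly empty context.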

Novel Contributions

  • sliding-window evaluation with EVAL_STRIDE=512 over 1024-token windows
  • int8 quantization (scope: all)
  • zlib-compressed artifact
  • single-GPU run
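The sketch below shows one way to produce an int8, zlib-compressed artifact in plain Python. It relies on several assumptions that the card does not state:

* symmetric per-tensor scales, stored as float32;
* a simple length-prefixed binary layout;
* zlib at its default level, since the card lists the level as null.

The functions `quantize_int8`, `pack_artifact`, and `unpack_artifact` are hypothetical illustrations, not the PR's code.

```python
import array
import struct
import zlib


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (assumed scheme): w ~= q * scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return array.array("b", q), scale


def pack_artifact(tensors):
    """Quantize each {name: list of floats} entry, serialize, then zlib-compress."""
    chunks = []
    for name in sorted(tensors):
        q, scale = quantize_int8(tensors[name])
        name_bytes = name.encode("utf-8")
        chunks.append(struct.pack("<H", len(name_bytes)) + name_bytes)
        chunks.append(struct.pack("<fI", scale, len(q)))  # float32 scale, element count
        chunks.append(q.tobytes())  # one byte per weight
    return zlib.compress(b"".join(chunks))  # default compression level (assumed)


def unpack_artifact(blob):
    """Decompress and dequantize back to {name: list of floats}."""
    raw = zlib.decompress(blob)
    out, pos = {}, 0
    while pos < len(raw):
        (name_len,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        name = raw[pos:pos + name_len].decode("utf-8")
        pos += name_len
        scale, n = struct.unpack_from("<fI", raw, pos)
        pos += 8
        q = array.array("b")
        q.frombytes(raw[pos:pos + n])
        pos += n
        out[name] = [v * scale for v in q]  # dequantize
    return out


blob = pack_artifact({"w": [0.5, -1.0, 0.25, 0.0]})
print(len(blob), unpack_artifact(blob)["w"])
```

Here `len(blob)` gives the compressed payload size. The card does not say whether its Artifact Size figure also counts code or other files. Storing weights as int8 before zlib means each weight costs at most one byte before compression, and zlib can shrink it further when the quantized values are repetitive.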