PR #1133
open
Non-record: Sliding-Window Evaluation + Int8-Zlib Compression (1.34 bpb)
by swetapaul08
val_bpb
1.3440
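By the usual convention, val_bpb is validation cross-entropy expressed in bits per byte of the underlying text, which keeps scores comparable across tokenizers. Below is a minimal sketch of that conversion. It assumes the loss is summed in nats over all scored tokens and divided by the UTF-8 byte count of the same text. The function name is hypothetical, and this is not code taken from the PR.

```python
import math

def bits_per_byte(total_loss_nats: float, num_bytes: int) -> float:
    """Convert summed token cross-entropy (in nats) to bits per byte.

    Hypothetical helper: assumes total_loss_nats is the sum of per-token
    losses and num_bytes is the raw UTF-8 length of the scored text.
    """
    # nats -> bits, then normalize by bytes instead of tokens
    return total_loss_nats / math.log(2) / num_bytes
```

Normalizing by bytes rather than tokens means a tokenizer that packs more bytes into each token does not get a free advantage.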
Architecture
Transformer
Optimizer
—
Artifact Size
14,727,167 bytes (about 14.7 MB)
Training Techniques
Evaluation
sliding window eval
parameters: {"stride":512}
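Sliding-window evaluation with a 1024-token window and stride 512 is commonly done as follows. Each window after the first scores only its last `stride` tokens, so every token is scored exactly once, and all but the first 1024 tokens see at least 512 tokens of preceding context. The sketch below computes the window boundaries under that assumption. Only `EVAL_STRIDE=512` and the 1024 lengths come from this PR. The function name and the exact scoring scheme are illustrative.

```python
def sliding_windows(num_tokens: int, seq_len: int = 1024, stride: int = 512):
    """Return (window_start, window_end, score_start) for each eval window.

    Tokens in [score_start, window_end) are scored; tokens in
    [window_start, score_start) serve only as context. Every token
    in [0, num_tokens) is scored exactly once.
    """
    assert 0 < stride <= seq_len, "stride must not exceed the window length"
    windows = []
    scored_until = 0
    start = 0
    while scored_until < num_tokens:
        end = min(start + seq_len, num_tokens)
        windows.append((start, end, scored_until))
        scored_until = end  # everything up to `end` has now been scored
        start += stride
    return windows
```

For example, `sliding_windows(2048)` yields three windows, `(0, 1024, 0)`, `(512, 1536, 1024)` and `(1024, 2048, 1536)`. This costs about twice the forward passes of non-overlapping evaluation in exchange for longer average context per scored token.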
Quantization
int8
bits: 8
scope: all
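The card records 8-bit quantization applied to all weights but does not say which scheme is used. One common choice is symmetric per-tensor quantization, where each weight is stored as an int8 `q` with `w ≈ scale * q`. The pure-Python sketch below illustrates that assumed scheme and its round-trip error. Both function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (assumed scheme).

    Returns integer codes in [-127, 127] and a single float scale
    such that each weight is approximately scale * code.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # round to nearest code, then clamp to guard against float overshoot
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]
```

Each reconstructed weight is within `scale / 2` of the original, so per-tensor scaling is simple but loses precision on tensors with a few large outliers.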
Compression
zlib
level: null (not specified)
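Because no compression level is recorded, the sketch below assumes zlib's default (`level=-1`, which zlib maps to level 6). It packs int8 codes as one signed byte each, compresses them with the standard-library `zlib` module, and checks that decompression restores the codes exactly. The helper names are hypothetical and are not the PR's serialization format.

```python
import struct
import zlib

def pack_and_compress(q, level=-1):
    """Pack int8 codes (one signed byte each) and zlib-compress them."""
    raw = struct.pack(f"{len(q)}b", *q)
    return zlib.compress(raw, level)

def decompress_and_unpack(blob, n):
    """Inverse of pack_and_compress for n int8 codes."""
    return list(struct.unpack(f"{n}b", zlib.decompress(blob)))
```

zlib is lossless, so any size reduction comes from redundancy in the quantized codes. The quantization step alone is what determines accuracy.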
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
Novel Contributions
- sliding-window evaluation with EVAL_STRIDE=512
- int8 quantization
- zlib-compressed artifact
- single-GPU run