| Field | Value |
|---|---|
| val_bpb | 1.3029 |
| Architecture | Transformer |
| Optimizer | — |
| Artifact size | 11,851,989 bytes (≈11.3 MiB) |
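For context on the headline metric: assuming `val_bpb` follows the usual definition of validation bits per byte, it converts the mean cross-entropy loss in nats per token into bits and normalizes by the byte length of the evaluated text:

$$
\mathrm{bpb} = \frac{\bar{L}_{\mathrm{nats}}}{\ln 2} \cdot \frac{N_{\mathrm{tokens}}}{N_{\mathrm{bytes}}}
$$

where $N_{\mathrm{tokens}}/N_{\mathrm{bytes}}$ is the token-to-byte ratio of the validation set under the tokenizer.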
## Training Techniques

### Sequence length (`sequence_length`)
- train_length: 2048
- eval_length: null
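A minimal sketch of what a fixed train_length of 2048 implies for batching, assuming the common approach of cropping a flat token stream into contiguous blocks; `make_batches` and its shapes are illustrative, not taken from the submission:

```python
import numpy as np

def make_batches(tokens: np.ndarray, train_length: int = 2048):
    """Crop a flat token stream into fixed-length (input, target) pairs."""
    n = (len(tokens) - 1) // train_length  # the -1 leaves room for the shifted target
    for i in range(n):
        x = tokens[i * train_length : (i + 1) * train_length]
        y = tokens[i * train_length + 1 : (i + 1) * train_length + 1]  # next-token targets
        yield x, y
```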
### LR schedule (warmdown)
- warmdown_steps: 2200
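A sketch of one common reading of "warmdown": hold the base learning rate constant, then decay it linearly to zero over the final `warmdown_steps` optimizer steps. The function below is a hypothetical multiplier, not the submission's exact schedule:

```python
def lr_scale(step: int, total_steps: int, warmdown_steps: int = 2200) -> float:
    """Multiplier on the base LR: 1.0 until the warmdown window, then linear to 0."""
    if step < total_steps - warmdown_steps:
        return 1.0
    return max(0.0, (total_steps - step) / warmdown_steps)

# usage: lr = base_lr * lr_scale(step, total_steps)
```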
### Architecture (weight tying)
- Tied embeddings were used; no additional parameters.
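Weight tying shares one matrix between the token embedding and the output projection, which shrinks the parameter count and hence the artifact. A minimal PyTorch sketch, with hypothetical module names:

```python
import torch.nn as nn

class TiedLM(nn.Module):
    """Input embedding and LM head share a single weight matrix."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)             # weight is (V, D)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)  # weight is (V, D)
        self.lm_head.weight = self.embed.weight  # tie: same tensor object

    def forward(self, hidden):
        return self.lm_head(hidden)  # logits over the vocabulary
```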
### Quantization (int8)
- bits: 8
- scope: all
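The card does not specify the quantization scheme, so the sketch below assumes plain symmetric per-tensor int8; all helper names are hypothetical:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = max(float(np.abs(w).max()), 1e-12) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 codes."""
    return q.astype(np.float32) * scale
```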
### Compression (zlib)
- level: null
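With the level left null, zlib's default (level 6) presumably applies. A minimal round-trip sketch using the standard-library API (`pack_artifact`/`unpack_artifact` are hypothetical names):

```python
import zlib

def pack_artifact(raw: bytes, level: int = -1) -> bytes:
    """Compress the serialized int8 weights; level -1 selects zlib's default (6)."""
    return zlib.compress(raw, level)

def unpack_artifact(blob: bytes) -> bytes:
    """Losslessly recover the serialized weights."""
    return zlib.decompress(blob)
```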
## Novel Contributions
- Non-record, budget-constrained baseline trained on a single H100
- Training run at sequence length 2048
- Tuned warmdown learning-rate schedule
- Int8-quantized submission artifact compressed with zlib
- Reproducible under a 600 s wall-clock cap