PR #407

open

Add non-record 1xH200 fp16-embed baseline sweep submission

by itu-itis24-buyukhelvacigilm24
val_bpb: 1.3208
Architecture: Transformer
Optimizer: (unspecified)
Artifact Size: 14,327,135 bytes

Training Techniques

  • Quantization: int8 (bits: 8; scope: all weights except tok_emb.weight, which is kept in fp16)
  • Architecture: tied embeddings. Input and output embeddings are tied. (parameters: null)
  • Compression: zlib (level: null)
  • Sequence Length: train_length: 1024, eval_length: null
  • Other: Baseline-adjacent mixed-precision export sweep on 1xH200 with fp16 embedding passthrough, compared against a clean baseline and against sink-token and fixed-MTP variants. parameters: {"hardware":"1xH200","wallclock_seconds":600,"artifact_cap_bytes":16000000}
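The quantization scope above (everything int8 except tok_emb.weight, which passes through in fp16) can be sketched as follows. This is a minimal illustrative export routine, not the submission's actual script: the function name, the per-tensor symmetric scaling, and the byte layout (fp32 scale header followed by the int8 payload) are all assumptions for the sketch.

```python
import struct

def export_tensor(name, weights):
    """Illustrative int8 export with an fp16 passthrough for tok_emb.weight.

    Per-tensor symmetric quantization is assumed; the real submission's
    export format is not shown in this card.
    """
    if name == "tok_emb.weight":
        # fp16 passthrough: pack each value as IEEE 754 half precision.
        return b"".join(struct.pack("<e", w) for w in weights)
    # Symmetric scale mapping the largest magnitude to 127 (1.0 if all zeros).
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    # Store the fp32 scale followed by the int8 payload.
    return struct.pack("<f", scale) + bytes(x & 0xFF for x in q)

blob = export_tensor("lm_head.weight", [0.5, -1.0, 0.25])  # quantized path
emb = export_tensor("tok_emb.weight", [0.5, -1.0, 0.25])   # fp16 passthrough
```

The passthrough costs 2 bytes per embedding value instead of 1, which is the trade this submission measures against the 16MB artifact cap.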

Novel Contributions

  • Preserved tok_emb.weight in fp16 during int8 export
  • Reported a controlled 1xH200 baseline-family sweep under a 16MB artifact cap
  • Compared fp16 embedding passthrough against sink tokens and fixed MTP variants
  • Provided a reproducible non-record submission package with exact train script, log, and metadata
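The zlib compression and the 16MB artifact cap from the sweep parameters combine into a simple packaging check, sketched below. The helper name is hypothetical; since the card lists zlib with level null, the default compression level is assumed.

```python
import zlib

# From the sweep parameters above.
ARTIFACT_CAP_BYTES = 16_000_000

def package_artifact(payload: bytes, level: int = -1) -> bytes:
    """Compress an exported weight blob with zlib and enforce the size cap.

    level=-1 is zlib's default; the submission does not specify a level.
    """
    compressed = zlib.compress(payload, level)
    if len(compressed) > ARTIFACT_CAP_BYTES:
        raise ValueError(
            f"artifact is {len(compressed)} bytes, over the "
            f"{ARTIFACT_CAP_BYTES}-byte cap"
        )
    return compressed

# A highly compressible dummy payload stands in for real exported weights.
artifact = package_artifact(b"\x00" * 1_000_000)
```

Under this check, the 14,327,135-byte artifact reported above fits the cap with roughly 1.7MB of headroom.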