PR #407

open

Add non-record 1xH200 fp16-embed baseline sweep submission

by itu-itis24-buyukhelvacigilm24
val_bpb: 1.3208
Architecture: Transformer
Optimizer: (unspecified)
Artifact Size: 14,327,135 bytes

Training Techniques

  • Quantization: int8 (bits: 8; scope: all weights except tok_emb.weight, which is kept in fp16)
  • Architecture: tied embeddings. Input and output embeddings are tied. (parameters: null)
  • Compression: zlib (level: null)
  • Sequence Length: train_length: 1024, eval_length: null
  • Other: Baseline-adjacent mixed-precision export sweep on 1xH200 with fp16 embedding passthrough, compared against a clean baseline and against sink-token and fixed-MTP variants. parameters: {"hardware":"1xH200","wallclock_seconds":600,"artifact_cap_bytes":16000000}
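The quantization scope above (everything int8 except tok_emb.weight, which passes through in fp16) can be sketched as follows. This is a minimal illustrative export routine, not the submission's actual script: the function name, the per-tensor symmetric scaling, and the byte layout (fp32 scale header followed by the int8 payload) are all assumptions for the sketch.

```python
import struct

def export_tensor(name, weights):
    """Illustrative int8 export with an fp16 passthrough for tok_emb.weight.

    Per-tensor symmetric quantization is assumed; the real submission's
    export format is not shown in this card.
    """
    if name == "tok_emb.weight":
        # fp16 passthrough: pack each value as IEEE 754 half precision.
        return b"".join(struct.pack("<e", w) for w in weights)
    # Symmetric scale mapping the largest magnitude to 127 (1.0 if all zeros).
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    # Store the fp32 scale followed by the int8 payload.
    return struct.pack("<f", scale) + bytes(x & 0xFF for x in q)

blob = export_tensor("lm_head.weight", [0.5, -1.0, 0.25])  # quantized path
emb = export_tensor("tok_emb.weight", [0.5, -1.0, 0.25])   # fp16 passthrough
```

The passthrough costs 2 bytes per embedding value instead of 1, which is the trade this submission measures against the 16MB artifact cap.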

Novel Contributions

  • Preserved tok_emb.weight in fp16 during int8 export
  • Reported a controlled 1xH200 baseline-family sweep under a 16MB artifact cap
  • Compared fp16 embedding passthrough against sink tokens and fixed MTP variants
  • Provided a reproducible non-record submission package with exact train script, log, and metadata
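The zlib compression and the 16MB artifact cap from the sweep parameters combine into a simple packaging check, sketched below. The helper name is hypothetical; since the card lists zlib with level null, the default compression level is assumed.

```python
import zlib

# From the sweep parameters above.
ARTIFACT_CAP_BYTES = 16_000_000

def package_artifact(payload: bytes, level: int = -1) -> bytes:
    """Compress an exported weight blob with zlib and enforce the size cap.

    level=-1 is zlib's default; the submission does not specify a level.
    """
    compressed = zlib.compress(payload, level)
    if len(compressed) > ARTIFACT_CAP_BYTES:
        raise ValueError(
            f"artifact is {len(compressed)} bytes, over the "
            f"{ARTIFACT_CAP_BYTES}-byte cap"
        )
    return compressed

# A highly compressible dummy payload stands in for real exported weights.
artifact = package_artifact(b"\x00" * 1_000_000)
```

Under this check, the 14,327,135-byte artifact reported above fits the cap with roughly 1.7MB of headroom.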