PR #938

open

[non-record] 1xH100 screening: compression + eval strategy

by numb3r33
val_bpb: 1.3538
Architecture: Transformer
Optimizer:
Artifact Size: 12,783,154 bytes

Training Techniques

Evaluation: sliding window eval
Parameters: {"stride": 64}
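The PR does not include its evaluation code, but a stride-64 sliding-window evaluation of the kind listed above is typically implemented by advancing a fixed context window in stride-sized steps and scoring only the tokens not covered by the previous window, so each scored token keeps the longest available left context. A minimal sketch, assuming a byte-level tokenizer and a hypothetical `logprob_fn(context, target)` standing in for the real model call:

```python
import math

def sliding_window_nll(tokens, logprob_fn, window=256, stride=64):
    """Total negative log-likelihood (nats) under a strided sliding window.

    Each window covers tokens[begin:begin+window]; only positions not scored
    by the previous window are counted, so tokens past the first window are
    conditioned on up to `window - stride` tokens of context.
    `logprob_fn(context, target)` is a stand-in for a real LM call and must
    return the natural-log probability of `target` given `context`.
    """
    total, count = 0.0, 0
    prev_end = 0
    for begin in range(0, len(tokens), stride):
        end = min(begin + window, len(tokens))
        # Skip position 0 (no context) and anything the last window scored.
        for i in range(max(prev_end, 1), end):
            total += -logprob_fn(tokens[begin:i], tokens[i])
            count += 1
        prev_end = end
        if end == len(tokens):
            break
    return total, count

def bpb(total_nll, n_bytes):
    """Bits-per-byte: convert nats to bits and normalize by byte count."""
    return total_nll / (math.log(2) * n_bytes)
```

As a sanity check, a uniform model over 256 byte values should score exactly 8 bits per byte regardless of window and stride.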

Novel Contributions

  • Non-record 1xH100 screening bundle documenting a March 26 experiment matrix
  • Comparison of dense baseline, fp16-embedding, and 10-layer mixed-precision families
  • Evidence that pre-quant and post-quant quality diverge sharply under heavy compression or capacity reduction
  • Identification of a smaller near-baseline artifact candidate (Q1)
  • Motivation to prioritize evaluation strategy, compression-aware training, and quantization-friendly schedules
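The pre-quant versus post-quant divergence noted above can be probed with a minimal round-trip sketch. The scheme and function names below are illustrative assumptions, not the PR's method: symmetric per-tensor int8 quantization, whose worst-case per-weight error is bounded by half the scale step.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative sketch).

    Maps the largest-magnitude weight to +/-127; returns integer codes
    and the scale needed to dequantize them.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]  # integer codes in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [x * scale for x in q]
```

Comparing eval quality on the original weights against eval quality on `dequantize(*quantize_int8(weights))` is the kind of pre-/post-quant comparison the bullet points describe; heavy compression widens exactly this gap.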