| Field | Value |
| --- | --- |
| val_bpb | 1.3538 |
| Architecture | Transformer |
| Optimizer | — |
| Artifact Size | 12,783,154 bytes |
| Training Techniques | — |
| Evaluation | sliding window eval, parameters: `{"stride": 64}` |
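Sliding window evaluation with a short stride scores every byte exactly once while giving each one substantial left context, which generally yields a tighter bits-per-byte (bpb) estimate than disjoint chunking. Below is a minimal sketch of how val_bpb might be computed under the reported `{"stride": 64}` setting; the `window` size, the byte-level vocabulary, and the assumed model interface (a causal LM whose forward pass maps a `(batch, time)` LongTensor of byte values to `(batch, time, 256)` logits) are assumptions for illustration, not details from the bundle.

```python
import math

import torch


@torch.no_grad()
def sliding_window_bpb(model, data: bytes, window: int = 512,
                       stride: int = 64, device: str = "cpu") -> float:
    """Bits per byte over `data`, sliding a fixed-size context window
    forward `stride` bytes at a time. Only bytes not yet scored by an
    earlier window contribute to the total, so nothing is double-counted."""
    ids = torch.tensor(list(data), dtype=torch.long, device=device)
    total_nll, total_bytes = 0.0, 0
    prev_end = 0
    for begin in range(0, len(ids), stride):
        end = min(begin + window, len(ids))
        chunk = ids[begin:end].unsqueeze(0)               # (1, T)
        logits = model(chunk)                             # (1, T, 256), assumed
        logp = torch.log_softmax(logits[0, :-1], dim=-1)  # predict byte t+1 from prefix
        nll = -logp.gather(1, chunk[0, 1:, None]).squeeze(1)
        trg_len = end - prev_end                          # bytes not yet scored
        keep = nll[-trg_len:] if trg_len < nll.numel() else nll
        total_nll += keep.sum().item()
        total_bytes += keep.numel()
        prev_end = end
        if end == len(ids):
            break
    return total_nll / (math.log(2) * total_bytes)        # nats -> bits, per byte
```

With `stride` equal to `window` the loop degenerates to non-overlapping chunked eval; a stride of 64 spends more compute so that each scored byte is conditioned on up to `window - 64` bytes of context.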
Novel Contributions
- Non-record 1xH100 screening bundle documenting a March 26 experiment matrix
- Comparison of dense baseline, fp16-embedding, and 10-layer mixed-precision families
- Evidence that pre-quant and post-quant quality diverge sharply under heavy compression or capacity reduction (see the sketch after this list)
- Identification of a smaller near-baseline artifact candidate (Q1)
- Motivation to prioritize evaluation strategy, compression-aware training, and quantization-friendly schedules
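One way the pre-quant/post-quant divergence can show up: an artifact that looks near-baseline at full precision degrades disproportionately once its weights are round-tripped through a low-bit grid. The sketch below reuses `sliding_window_bpb` from earlier; the symmetric per-tensor scheme, the bit widths, and the `model` / `val_bytes` placeholders are hypothetical, not the bundle's actual quantizer or data.

```python
import copy

import torch


def quantize_roundtrip(model: torch.nn.Module, bits: int = 8) -> torch.nn.Module:
    """Return a deep copy of `model` with each weight matrix snapped to a
    symmetric per-tensor integer grid (a crude stand-in for real PTQ)."""
    q = copy.deepcopy(model)
    qmax = 2 ** (bits - 1) - 1
    with torch.no_grad():
        for p in q.parameters():
            if p.dim() < 2:   # leave biases and norm gains in full precision
                continue
            scale = p.abs().max() / qmax
            if scale > 0:
                p.copy_((p / scale).round().clamp_(-qmax, qmax) * scale)
    return q


# Hypothetical comparison of pre- vs post-quant quality at a few bit widths;
# `model` and `val_bytes` stand in for the artifact and its eval split.
# baseline = sliding_window_bpb(model, val_bytes, stride=64)
# for bits in (8, 6, 4):
#     bpb = sliding_window_bpb(quantize_roundtrip(model, bits), val_bytes, stride=64)
#     print(f"int{bits}: bpb {bpb:.4f} (delta {bpb - baseline:+.4f})")
```

A flat bpb delta at 8 bits that blows up at 4 bits is the kind of divergence the bullet above refers to, and is what motivates quantization-friendly training schedules rather than post-hoc compression alone.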