PR #497

open

Non-record: FP16 embed + MLP992 sliding-window size-repair probe

by THUQiXuanView on GitHub
val_bpb: 1.3162
Artifact Size: 14.42 MB

Training Techniques

Quantization: int8 with FP16 token embedding
  parameters: {"bits": 8, "scope": "token embedding"}
Architecture: MLP width reduced to 992 as a size-repair offset
  parameters: {"MLP_HIDDEN": 992}
Evaluation: sliding-window eval
  parameters: {"stride": 64}
Test-Time Training: skipped
  parameters: null
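The export path described above (int8 everywhere except the token embedding, which stays FP16, followed by zlib compression) can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the tensor names, the symmetric per-tensor quantization scheme, and the serialization layout are all assumptions.

```python
# Hypothetical sketch: quantize weights to int8, keep the tied token
# embedding in FP16 (scope: token embedding), then zlib-compress the
# payload and measure the resulting artifact size.
import io
import zlib

import numpy as np


def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~ scale * q."""
    m = float(np.abs(w).max())
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, np.float32(scale)


def export(weights: dict) -> bytes:
    """Serialize all tensors, then compress the whole buffer with zlib."""
    buf = io.BytesIO()
    for name, w in weights.items():
        if name == "token_embedding":
            # Kept in FP16 rather than int8.
            buf.write(w.astype(np.float16).tobytes())
        else:
            # int8 payload plus its FP32 scale.
            q, scale = quantize_int8(w)
            buf.write(scale.tobytes())
            buf.write(q.tobytes())
    return zlib.compress(buf.getvalue(), level=9)


# Toy weights with illustrative shapes (not the PR's real dimensions;
# note the 992-wide MLP hidden layer).
rng = np.random.default_rng(0)
weights = {
    "token_embedding": rng.standard_normal((50000, 64)).astype(np.float32),
    "mlp.fc1": rng.standard_normal((64, 992)).astype(np.float32),
}
blob = export(weights)
print(f"artifact size: {len(blob) / 2**20:.2f} MB")
```

Keeping the embedding in FP16 roughly doubles its serialized size versus int8, which is why a compensating reduction elsewhere (here, the MLP width) is needed to stay under a size budget.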

Novel Contributions

  • Kept the tied token embedding in FP16 during the final int8+zlib export; the artifact-size cost is repaired by the MLP width reduction
  • Reduced the MLP hidden width to 992 as a size-repair offset
  • Demonstrated a successful non-record research probe under 16 MB on local 8x NVIDIA L20Z hardware
  • Deliberately skipped test-time-training (TTT) evaluation to focus on the post-quantization sliding-window roundtrip metric
  • Provided a concrete, reproducible snapshot for future work