PR #497
Non-record: FP16 embed + MLP992 sliding-window size-repair probe (open)
by THUQiXuan
val_bpb: 1.3162
Architecture: —
Optimizer: —
Artifact Size: 14.42 MB
Training Techniques
Quantization: int8 with FP16 token embedding (bits: 8, scope: token embedding)
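The mixed-precision export named above (int8 weights, token embedding kept in FP16) can be sketched as follows. This is a minimal illustration, not the PR's actual code: `export_weights`, the key-matching rule, and symmetric per-tensor scaling are all assumptions.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    scale = float(np.abs(w).max()) / 127.0 or 1.0  # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def export_weights(state, fp16_keys=("token_embedding",)):
    """int8-quantize every tensor except those matched by fp16_keys."""
    out = {}
    for name, w in state.items():
        if any(k in name for k in fp16_keys):
            out[name] = w.astype(np.float16)   # tied embedding stays FP16
        else:
            out[name] = quantize_int8(w)       # (int8 tensor, scale)
    return out
```

Since the embedding is tied, keeping it in FP16 affects both the input lookup and the output projection in one place.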
Architecture (MLP): reduced MLP width to 992 as a size-repair offset (MLP_HIDDEN: 992)
Evaluation: sliding-window eval (stride: 64)
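Sliding-window evaluation with stride 64 re-scores a long sequence in overlapping context windows so that each token's loss is counted exactly once. The sketch below assumes a hypothetical `score_fn(window, n_scored)` that returns the summed NLL in nats of the last `n_scored` tokens of `window`; the PR's actual harness is not shown.

```python
import math

def sliding_window_nll(score_fn, tokens, ctx_len, stride=64):
    """Total NLL of `tokens`, scored in windows of length <= ctx_len.
    Each step advances by `stride`, so tokens after the first window
    are predicted with up to ctx_len - stride tokens of context."""
    total_nll, prev_end = 0.0, 0
    for begin in range(0, len(tokens), stride):
        end = min(begin + ctx_len, len(tokens))
        n_scored = end - prev_end          # only previously unscored tokens
        total_nll += score_fn(tokens[begin:end], n_scored)
        prev_end = end
        if end == len(tokens):
            break
    return total_nll

def bits_per_byte(total_nll_nats, n_bytes):
    # Convert summed NLL from nats to bits, normalize by raw byte count.
    return total_nll_nats / (math.log(2) * n_bytes)
```

A smaller stride gives each token more context at the cost of proportionally more forward passes.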
Test-Time Training: skipped
Novel Contributions
- Kept tied token embedding in FP16 during final int8+zlib export to recover artifact size
- Reduced MLP width to 992 as a size-repair offset
- Demonstrated a successful non-record research probe under 16MB on local 8x NVIDIA L20Z hardware
- Skipped TTT evaluation deliberately to focus on post-quant sliding-window roundtrip metric
- Provided a concrete, reproducible snapshot for future work
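The artifact-size accounting behind the 14.42 MB figure can be illustrated with a minimal int8+zlib packing sketch. The PR does not show its export container, so `packed_size_mb` is only an assumed stand-in for how compressed size is measured.

```python
import io
import zlib
import numpy as np

def packed_size_mb(tensors, level=9):
    """Concatenate raw tensor bytes and return the zlib-compressed
    size in MB (illustrative stand-in for an int8+zlib export)."""
    buf = io.BytesIO()
    for t in tensors:
        buf.write(np.ascontiguousarray(t).tobytes())
    return len(zlib.compress(buf.getvalue(), level)) / 2**20
```

FP16 stores two bytes per parameter versus one for int8, so keeping the tied embedding in FP16 doubles its raw footprint; shrinking MLP_HIDDEN to 992 then offsets that growth elsewhere in the artifact.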