PR #497

open

Non-record: FP16 embed + MLP992 sliding-window size-repair probe

by THUQiXuanView on GitHub
val_bpb: 1.3162
Artifact Size: 14.42 MB

Training Techniques

Quantization: int8 with FP16 token embedding
  parameters: {"bits": 8, "scope": "token embedding"}
Architecture: MLP width reduced to 992 as a size-repair offset
  parameters: {"MLP_HIDDEN": 992}
Evaluation: sliding-window eval
  parameters: {"stride": 64}
Test-Time Training: skipped
  parameters: null
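The export path described above (int8 everywhere except the token embedding, which stays FP16, followed by zlib compression) can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the tensor names, the symmetric per-tensor quantization scheme, and the serialization layout are all assumptions.

```python
# Hypothetical sketch: quantize weights to int8, keep the tied token
# embedding in FP16 (scope: token embedding), then zlib-compress the
# payload and measure the resulting artifact size.
import io
import zlib

import numpy as np


def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~ scale * q."""
    m = float(np.abs(w).max())
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, np.float32(scale)


def export(weights: dict) -> bytes:
    """Serialize all tensors, then compress the whole buffer with zlib."""
    buf = io.BytesIO()
    for name, w in weights.items():
        if name == "token_embedding":
            # Kept in FP16 rather than int8.
            buf.write(w.astype(np.float16).tobytes())
        else:
            # int8 payload plus its FP32 scale.
            q, scale = quantize_int8(w)
            buf.write(scale.tobytes())
            buf.write(q.tobytes())
    return zlib.compress(buf.getvalue(), level=9)


# Toy weights with illustrative shapes (not the PR's real dimensions;
# note the 992-wide MLP hidden layer).
rng = np.random.default_rng(0)
weights = {
    "token_embedding": rng.standard_normal((50000, 64)).astype(np.float32),
    "mlp.fc1": rng.standard_normal((64, 992)).astype(np.float32),
}
blob = export(weights)
print(f"artifact size: {len(blob) / 2**20:.2f} MB")
```

Keeping the embedding in FP16 roughly doubles its serialized size versus int8, which is why a compensating reduction elsewhere (here, the MLP width) is needed to stay under a size budget.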

Novel Contributions

  • Kept the tied token embedding in FP16 during the final int8+zlib export; the artifact-size cost is repaired by the MLP width reduction
  • Reduced the MLP hidden width to 992 as a size-repair offset
  • Demonstrated a successful non-record research probe under 16 MB on local 8x NVIDIA L20Z hardware
  • Deliberately skipped test-time-training (TTT) evaluation to focus on the post-quantization sliding-window roundtrip metric
  • Provided a concrete, reproducible snapshot for future work