PR #2138
open
Record: SP8192 + Sliding-Window Eval + Lock-In Byte Mixer - val_bpb 0.979556
by anmarhindi
val_bpb
0.9796
Architecture
Transformer
Optimizer
—
Artifact Size
15,810,458 bytes
Training Techniques
Quantization
GPTQ-lite
bits: 6
scope: all
Compression
lzma
level: null
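As a rough illustration of how a compressed artifact size like the one above could be measured, the sketch below compresses a byte payload with Python's standard `lzma` module. Reading `level: null` as "library default preset" is an assumption, and `payload` is a hypothetical stand-in for a serialized checkpoint, not the PR's actual artifact.

```python
import lzma

def lzma_size(payload: bytes, preset=None) -> int:
    """Size in bytes of `payload` after LZMA (.xz container) compression.

    preset=None lets the lzma module pick its default preset
    (assumed here to correspond to "level: null").
    """
    return len(lzma.compress(payload, preset=preset))

# Hypothetical stand-in for a serialized, quantized checkpoint.
payload = bytes(range(64)) * 1024
packed = lzma.compress(payload)
print(len(payload), "->", lzma_size(payload), "bytes")
```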
Evaluation
sliding window eval
parameters: {"stride":128}
Sequence Length
sequence_length
train_length: null
eval_length: 2560
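A minimal sketch of how a sliding-window evaluation with these two settings might lay out its windows. Only `stride` and `eval_length` come from the entry above; the function name, the rule that the first window scores all of its tokens, and the absence of document-boundary handling are assumptions.

```python
def sliding_windows(n_tokens, eval_length=2560, stride=128):
    """Return (ctx_start, score_start, end) index triples.

    Tokens in [score_start, end) are scored; the model sees [ctx_start, end),
    which is never longer than eval_length. Every token is scored exactly once.
    """
    windows = []
    score_start = 0
    while score_start < n_tokens:
        if score_start == 0:
            # First window: no earlier context exists, so score everything in it.
            end = min(eval_length, n_tokens)
        else:
            # Later windows: score only the next `stride` tokens.
            end = min(score_start + stride, n_tokens)
        ctx_start = max(0, end - eval_length)
        windows.append((ctx_start, score_start, end))
        score_start = end
    return windows

windows = sliding_windows(3000)
```

Under this layout every scored token after the first window has at least `eval_length - stride` tokens of context, at the cost of roughly `eval_length / stride` times more forward-pass compute than non-overlapping chunks.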
Other
other
Lock-In Byte Mixer: high-confidence-only byte-level mixture of neural softmax with PPM-D byte conditional using a sigmoid gate.
parameters: {"alpha":25,"beta":0.9999,"ppm_order":5}
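The entry does not define `alpha` or `beta`, so the sketch below shows only the general shape of a confidence-gated byte mixture and does not claim to reproduce the PR's formula. The names `sharpness` and `threshold`, and the choice to gate on the PPM distribution's peak probability, are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def mix_byte_distributions(p_neural, p_ppm, sharpness, threshold):
    """Blend two next-byte distributions with a confidence gate.

    gate -> 1 when the PPM peak probability is well above `threshold`
    (the mixture "locks in" to PPM); gate -> 0 otherwise, leaving the
    neural distribution essentially unchanged.
    """
    confidence = max(p_ppm)
    gate = sigmoid(sharpness * (confidence - threshold))
    mixed = [(1.0 - gate) * pn + gate * pp for pn, pp in zip(p_neural, p_ppm)]
    return mixed, gate

# Toy 4-symbol alphabet instead of 256 bytes, with illustrative gate settings.
p_nn = [0.25, 0.25, 0.25, 0.25]
confident, g_hi = mix_byte_distributions(p_nn, [0.97, 0.01, 0.01, 0.01], 200.0, 0.9)
uncertain, g_lo = mix_byte_distributions(p_nn, [0.4, 0.3, 0.2, 0.1], 200.0, 0.9)
```

Because the output is a convex combination of two normalized distributions, it stays normalized for any gate value, so it can be fed directly into a bits-per-byte computation.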
other
C2-correct byte marginalization over the SP8192 alphabet using canonical first-byte masking and chain-rule residuals.
parameters: null
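One way to picture byte marginalization over a subword vocabulary is sketched below on a toy four-token vocabulary. The terms "C2-correct" and "canonical" are not defined in the entry, so this shows only the generic idea: sum token probabilities by first byte, then factor a token's probability into per-byte conditionals plus a final end-of-token term. All function names are hypothetical.

```python
from collections import defaultdict

def prefix_mass(vocab, probs, prefix):
    """Total probability of tokens whose byte string starts with `prefix`."""
    return sum(p for tok, p in zip(vocab, probs) if tok.startswith(prefix))

def first_byte_marginal(vocab, probs):
    """P(next byte = b): sum over tokens whose first byte is b."""
    out = defaultdict(float)
    for tok, p in zip(vocab, probs):
        out[tok[0]] += p
    return dict(out)

def byte_chain(vocab, probs, token):
    """Chain-rule factors whose product equals p(token).

    One factor per byte, P(byte_k | earlier bytes), followed by a final
    factor P(token ends here | all its bytes). Assumes p(token) > 0.
    """
    factors = []
    prev = prefix_mass(vocab, probs, b"")  # mass of the empty prefix (1.0)
    for k in range(1, len(token) + 1):
        cur = prefix_mass(vocab, probs, token[:k])
        factors.append(cur / prev)
        prev = cur
    factors.append(probs[vocab.index(token)] / prev)
    return factors

vocab = [b"a", b"ab", b"abc", b"b"]
probs = [0.1, 0.2, 0.3, 0.4]
marginal = first_byte_marginal(vocab, probs)
factors = byte_chain(vocab, probs, b"ab")
```

The final end-of-token factor matters: without it, the product of byte conditionals for `b"ab"` would also count the mass of `b"abc"`.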
other
AWQ-Lite quantization stack with activation-aware protection of salient weight groups.
parameters: {"group_top_k":1}
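A simplified sketch of activation-aware protection, assuming `group_top_k` means "number of most salient weight groups to protect". It uses plain round-to-nearest 6-bit quantization, not GPTQ, and keeps protected groups unquantized, whereas published AWQ rescales salient channels instead. All names here are hypothetical.

```python
def quantize_rtn(values, bits=6):
    """Symmetric round-to-nearest quantization of a list of floats."""
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return list(values)
    scale = peak / qmax
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def protect_and_quantize(row, act_abs_mean, group_size, top_k=1, bits=6):
    """Quantize one weight row group by group, skipping the top_k groups
    with the largest mean absolute input activation."""
    assert len(row) == len(act_abs_mean) and len(row) % group_size == 0
    n_groups = len(row) // group_size
    spans = [(g * group_size, (g + 1) * group_size) for g in range(n_groups)]
    salience = [sum(act_abs_mean[a:b]) / group_size for a, b in spans]
    ranked = sorted(range(n_groups), key=lambda g: salience[g], reverse=True)
    protected = set(ranked[:top_k])
    out = []
    for g, (a, b) in enumerate(spans):
        out.extend(row[a:b] if g in protected else quantize_rtn(row[a:b], bits))
    return out, sorted(protected)

row = [0.5, -0.25, 0.1, 0.9, 0.3, -0.7, 0.2, 0.05]
acts = [1.0, 1.0, 1.0, 1.0, 10.0, 10.0, 10.0, 10.0]
deq, protected = protect_and_quantize(row, acts, group_size=4, top_k=1)
```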
other
Canonical full validation-set evaluation over the 50k-doc CaseOps validation split.
parameters: {"token_count_per_seed":47851520,"canonical_byte_count_per_seed":164594398}
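The two counts above fix the bytes-per-token ratio, which lets the reported score be converted to a per-token figure. The sketch assumes `val_bpb` is total code length in bits divided by the canonical byte count; how results are combined across seeds is not stated in the entry.

```python
import math

TOKENS_PER_SEED = 47_851_520
BYTES_PER_SEED = 164_594_398
REPORTED_VAL_BPB = 0.979556

def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)

bytes_per_token = BYTES_PER_SEED / TOKENS_PER_SEED
implied_bits_per_token = REPORTED_VAL_BPB * bytes_per_token
implied_nats_per_token = implied_bits_per_token * math.log(2)

print(f"bytes/token: {bytes_per_token:.4f}")
print(f"implied bits/token: {implied_bits_per_token:.4f}")
```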
Novel Contributions
- Lock-In Byte Mixer with near-binary confidence gating
- C2-correct byte-level marginalization for SP8192
- Full canonical CaseOps validation-set evaluation
- AWQ-Lite-assisted quantized inference stack
- Sliding-window evaluation with stride 128 on a wider eval context