PR #1534
openSP4096 + Depth Recurrence + Parallel Residuals + Legal N-Gram
by someone114514
val_bpb: 1.0846
Architecture: Transformer
Optimizer: not specified
Artifact Size: 15,967,527 bytes
Training Techniques
Architecture
depth recurrence
SP4096 stack that combines depth recurrence with parallel residuals.
parameters: null
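Depth recurrence is usually understood as running the same block weights more than once. This makes the network effectively deeper without adding parameters. The sketch below assumes that reading. `run_with_depth_recurrence`, `toy_block`, and `num_loops` are hypothetical. The PR's actual recurrent layers and loop count are not given here.

```python
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def run_with_depth_recurrence(x: Vector, blocks: List[Block], num_loops: int) -> Vector:
    """Apply the same blocks num_loops times (hypothetical helper).

    Weights are shared across loops, so effective depth is
    len(blocks) * num_loops while the number of distinct blocks stays len(blocks).
    """
    for _ in range(num_loops):
        for block in blocks:
            x = block(x)
    return x


# Toy stand-in for a real layer: adds 1.0 to every element.
def toy_block(v: Vector) -> Vector:
    return [a + 1.0 for a in v]


out = run_with_depth_recurrence([0.0, 1.0], [toy_block, toy_block], num_loops=3)
print(out)  # [6.0, 7.0]
```

Two stored blocks looped three times behave like a six-layer stack here, while the artifact only has to store two blocks' weights.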
parallel residuals
Uses a parallel-residual stack in the base model.
parameters: null
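GPT-J and PaLM use a common formulation of parallel residuals. Attention and MLP receive the same normalized input, and both outputs are added to the residual stream in one step. In a standard block, by contrast, the MLP runs on the attention-updated stream. The sketch below contrasts the two, assuming that formulation. The PR's exact wiring may differ, and `toy_attn` and `toy_mlp` are toy stand-ins for real sublayers.

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(v: Vector, eps: float = 1e-5) -> Vector:
    """Plain LayerNorm without learned scale or shift."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def sequential_block(x: Vector, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Standard pre-norm block: the MLP reads the attention-updated stream."""
    h = [a + b for a, b in zip(x, attn(layer_norm(x)))]
    return [a + b for a, b in zip(h, mlp(layer_norm(h)))]


def parallel_block(x: Vector, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Parallel residual: both sublayers read the same normalized input."""
    n = layer_norm(x)
    return [a + b + c for a, b, c in zip(x, attn(n), mlp(n))]


def toy_attn(v: Vector) -> Vector:
    return [0.5 * a for a in v]


def toy_mlp(v: Vector) -> Vector:
    return [a * a for a in v]


print(parallel_block([1.0, -1.0], toy_attn, toy_mlp))  # close to [2.5, -0.5]
```

In the parallel form, both sublayers depend only on `x`. They can therefore be computed concurrently.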
Weight Averaging
EMA
parameters: null
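EMA keeps a shadow copy of the weights, usually initialized from the model's weights. After each optimizer step it is updated as `ema = decay * ema + (1 - decay) * w`. The shadow copy is what typically gets evaluated or exported. This is a minimal sketch: the `decay` value and the dict-of-lists parameter layout are assumptions, since the PR's EMA settings are not listed.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def ema_update(ema: Params, current: Params, decay: float) -> None:
    """In place, per element: ema <- decay * ema + (1 - decay) * current."""
    for name, values in current.items():
        shadow = ema[name]
        for i, v in enumerate(values):
            shadow[i] = decay * shadow[i] + (1.0 - decay) * v


ema = {"w": [0.0]}
for _ in range(2):
    # decay=0.5 only keeps the arithmetic readable; in practice decay is close to 1.
    ema_update(ema, {"w": [1.0]}, decay=0.5)
print(ema["w"])  # [0.75]
```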
Quantization
int6
bits: 6
scope: all
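With `bits: 6` applied to all weights, each value becomes a small signed integer plus a floating-point scale. The sketch below uses symmetric per-row quantization onto [-31, 31], which uses 63 of the 64 levels a 6-bit code allows. The per-row granularity, the rounding rule, and the helper names are assumptions. How the PR packs the 6-bit codes into the 15,967,527-byte artifact is not described here.

```python
from typing import List, Tuple

QMAX = 31  # symmetric signed 6-bit grid: codes in [-31, 31]


def quantize_row_int6(row: List[float]) -> Tuple[List[int], float]:
    """Hypothetical per-row quantizer: returns integer codes and one float scale."""
    max_abs = max((abs(v) for v in row), default=0.0)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to guard against float overshoot.
    codes = [max(-QMAX, min(QMAX, round(v / scale))) for v in row]
    return codes, scale


def dequantize_row(codes: List[int], scale: float) -> List[float]:
    return [c * scale for c in codes]


row = [0.5, -1.0, 0.25]
codes, scale = quantize_row_int6(row)
print(codes, scale)
print(dequantize_row(codes, scale))
```

Each reconstructed value is within half a grid step (`scale / 2`) of the original, so rows with a large outlier lose precision everywhere else in the row.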
Evaluation
sliding window eval
parameters: null
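Sliding-window evaluation scores a long validation stream with overlapping context windows. Each window advances by a fixed stride, and only positions not already scored count toward the loss. As a result, every position is scored exactly once, and most get nearly a full window of left context. `sliding_windows` is a hypothetical helper, and the window and stride values below are placeholders. The PR's settings are not given.

```python
from typing import List, Tuple


def sliding_windows(num_tokens: int, window: int, stride: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, first_scored) triples (hypothetical helper).

    The model sees positions [start, end); only [first_scored, end) add to
    the loss. Every position is scored exactly once, and after the first
    window the first scored position has window - stride positions of
    left context inside its window.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    spans = []
    scored_upto = 0
    start = 0
    while scored_upto < num_tokens:
        end = min(start + window, num_tokens)
        spans.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return spans


print(sliding_windows(10, window=4, stride=2))
# [(0, 4, 0), (2, 6, 4), (4, 8, 6), (6, 10, 8)]
```

A smaller stride gives each scored position more context but costs more forward passes.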
Other
other
Legal prefix-only n-gram overlay with token, within-word, and word-start experts, a one-token logit tilt, and full-vocabulary renormalization during evaluation.
parameters: null
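The one-token logit tilt can be read as follows. A bias is added to the logit of the single vocabulary entry proposed by the n-gram experts, and softmax is then re-applied over the whole vocabulary, so the output stays a proper distribution. The sketch below assumes that reading. How the experts choose the tilted token and the bias strength is not specified here, so both are passed in as plain arguments.

```python
import math
from typing import List


def tilt_and_renormalize(logits: List[float], token: int, bias: float) -> List[float]:
    """Add `bias` to one token's logit, then softmax over the full vocabulary."""
    tilted = list(logits)
    tilted[token] += bias
    m = max(tilted)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in tilted]
    total = sum(exps)
    return [e / total for e in exps]


probs = tilt_and_renormalize([0.0, 0.0], token=0, bias=math.log(3.0))
print(probs)  # approximately [0.75, 0.25]
```

Because the softmax divides by the full-vocabulary sum, every other token loses probability mass in proportion when one token is boosted.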
Novel Contributions
- Adds a separate prefix-only legal n-gram evaluation path to the SP4096 recurrent / parallel-residual base
- Uses token, within-word continuation, and word-start experts built from already-seen tokens
- Applies a one-token bias and renormalizes over the full vocabulary in a single left-to-right pass
- Keeps evaluation legal with no target-conditioned gating, no two-pass rescoring, and no weight updates during inference
- Reports a best result of 1.08457715 val_bpb
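The legality constraints above reduce to an ordering rule. At each position, the experts may consult only counts built from earlier tokens. The true token enters the counts only after its probability has been scored. This is a minimal sketch of that predict-then-update loop for a token-level expert. `PrefixNGramExpert`, `single_pass`, the `order` value, and the most-frequent-continuation rule are hypothetical. The within-word and word-start experts would follow the same pattern with different context keys.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional, Tuple

Context = Tuple[int, ...]


class PrefixNGramExpert:
    """Hypothetical token-level n-gram expert fed only already-scored tokens."""

    def __init__(self, order: int = 3) -> None:
        self.ctx_len = order - 1
        self.counts: DefaultDict[Context, Counter] = defaultdict(Counter)

    def _context(self, history: List[int]) -> Optional[Context]:
        if len(history) < self.ctx_len:
            return None
        return tuple(history[len(history) - self.ctx_len:])

    def predict(self, history: List[int]) -> Optional[Tuple[int, float]]:
        """Most frequent continuation of the current context and its share."""
        ctx = self._context(history)
        seen = self.counts.get(ctx) if ctx is not None else None
        if not seen:
            return None
        token, count = seen.most_common(1)[0]
        return token, count / sum(seen.values())

    def update(self, history: List[int], token: int) -> None:
        ctx = self._context(history)
        if ctx is not None:
            self.counts[ctx][token] += 1


def single_pass(tokens: List[int], order: int = 3) -> List[Optional[Tuple[int, float]]]:
    """One left-to-right pass: predict from tokens[:i], then reveal tokens[i]."""
    expert = PrefixNGramExpert(order)
    proposals = []
    for i, target in enumerate(tokens):
        history = tokens[:i]
        proposals.append(expert.predict(history))  # never sees tokens[i]
        expert.update(history, target)  # count the target only after scoring it
    return proposals


print(single_pass([1, 2, 3, 1, 2, 3, 1, 2]))
# [None, None, None, None, None, (3, 1.0), (1, 1.0), (2, 1.0)]
```

Changing a target token cannot change the proposal made for that same position. The model weights are also never touched, which matches the no-target-conditioning and no-weight-update rules.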