PR #2007
open
Record: LongCtx No-QV QK5.25 + AsymLogit — 1.05899 BPB 3-seed mean
by ElubrazioneView on GitHub
val_bpb
1.0590
Architecture
Transformer
Optimizer
—
Artifact Size
15,992,777 bytes
Training Techniques
Architecture
SmearGate
BOS-fixed SmearGate used with sparse attention gating and skip gates.
parameters: {"bos_fixed":true}
weight tying
Whether the CaseOps/SP8192 model uses a tied/shared embedding setup is not explicitly stated; no weight tying was clearly mentioned.
parameters: null
Quantization
GPTQ
bits: null
scope: mixed precision
mixed int7/int8
bits: null
scope: embeddings and model weights
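To illustrate what a mixed int7/int8 grid means in practice, here is a minimal sketch. It swaps in plain round-to-nearest symmetric quantization, not GPTQ (which picks roundings using second-order information), and the function names and the single per-list scale are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric(values: Sequence[float], bits: int) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric quantization onto a signed `bits`-bit grid.

    Illustrative stand-in only: the record uses GPTQ, which is not shown here.
    """
    if not values:
        raise ValueError("values must be non-empty")
    qmax = 2 ** (bits - 1) - 1          # 63 for int7, 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp after rounding so every code stays inside [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: Sequence[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical usage: the same weights on a 7-bit and an 8-bit grid.
weights = [0.3, -0.7, 0.11]
codes7, scale7 = quantize_symmetric(weights, bits=7)
codes8, scale8 = quantize_symmetric(weights, bits=8)
```

The 7-bit grid has roughly half as many levels as the 8-bit one, so its step size (`scale7`) is about twice as large; a size-aware scheme trades that extra error for fewer stored bits.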
Evaluation
long context eval
parameters: {"context_length":2560}
Test-Time Training
score-first TTT
parameters: {"masking":"No-QV","lora_rank":80,"local_lr_mult":0.75}
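The "score-first" ordering can be sketched as a simple loop: each chunk is scored with weights that have not yet been adapted on it, and only afterwards is it used for adaptation. The `score` and `adapt` callables are hypothetical placeholders; the sketch does not model the No-QV masking, LoRA rank, or learning-rate multiplier listed above.

```python
from typing import Callable, Iterable, List, TypeVar

Chunk = TypeVar("Chunk")


def score_first_ttt(
    chunks: Iterable[Chunk],
    score: Callable[[Chunk], float],
    adapt: Callable[[Chunk], None],
) -> List[float]:
    """Evaluate each chunk before the model is allowed to train on it."""
    losses: List[float] = []
    for chunk in chunks:
        # 1) Score with the current weights, which have not seen this chunk.
        losses.append(score(chunk))
        # 2) Only then adapt on the chunk, so later chunks can benefit.
        adapt(chunk)
    return losses
```

Because the recorded loss for a chunk is always computed before any update on that chunk, the reported metric never reflects training on the tokens being scored.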
Sequence Length
sequence_length
train_length: 2560
eval_length: 2560
LR Schedule
warmdown
parameters: {"warmdown_frac":0.85,"min_lr":0.1}
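One plausible reading of these parameters is sketched below, under two assumptions that the listing does not confirm: the warmdown is linear, and `min_lr` is a fraction of the peak learning rate rather than an absolute value. The function name is hypothetical.

```python
def lr_multiplier(
    step: int,
    total_steps: int,
    warmdown_frac: float = 0.85,
    min_lr: float = 0.1,
) -> float:
    """Return the factor applied to the peak learning rate at `step`.

    Assumes a constant phase followed by a linear warmdown over the last
    `warmdown_frac` of training, ending at `min_lr` times the peak.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    warmdown_start = 1.0 - warmdown_frac
    if progress <= warmdown_start:
        return 1.0
    # 1.0 at the start of the warmdown, 0.0 at the final step.
    remaining = (1.0 - progress) / warmdown_frac
    return min_lr + (1.0 - min_lr) * remaining
```

Under these assumptions, the multiplier stays at 1.0 for the first 15% of steps and then falls linearly to 0.1 by the final step.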
Regularization
logit softcap
parameters: {"asymmetric_logit_rescale":true}
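For orientation, a common form of logit softcapping is `cap * tanh(logit / cap)`. The listing does not say how the asymmetric rescale is defined, so the variant below, which scales positive and negative logits by separate factors, is purely an assumed illustration; the cap value and all names are hypothetical.

```python
import math


def softcap(logit: float, cap: float = 30.0) -> float:
    """Smoothly bound a logit to the open interval (-cap, cap)."""
    return cap * math.tanh(logit / cap)


def asymmetric_softcap(
    logit: float,
    cap: float = 30.0,
    pos_scale: float = 1.0,
    neg_scale: float = 1.0,
) -> float:
    """Softcap, then rescale each sign separately (assumed form, not the PR's)."""
    scale = pos_scale if logit >= 0 else neg_scale
    return scale * softcap(logit, cap)
```

With `pos_scale == neg_scale == 1.0` the asymmetric variant reduces to the ordinary softcap.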
Compression
lrzip
level: null
Other
other
CaseOps/SP8192 tokenization with byte-sidecar BPB accounting.
parameters: null
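Bits per byte (BPB) divides the total negative log-likelihood, converted to bits, by the number of bytes of original text. The sketch below assumes the byte sidecar is a list giving, for each token, how many bytes of the original text it accounts for; that layout and the function name are assumptions, not details taken from the PR.

```python
import math
from typing import Sequence


def bits_per_byte(
    token_nll_nats: Sequence[float],
    token_byte_counts: Sequence[int],
) -> float:
    """Total NLL in bits divided by total bytes of the original text.

    `token_byte_counts` plays the role of the byte sidecar (assumed layout):
    tokens that add no original bytes, such as case markers, would carry 0.
    """
    if len(token_nll_nats) != len(token_byte_counts):
        raise ValueError("one byte count is required per token")
    total_bytes = sum(token_byte_counts)
    if total_bytes <= 0:
        raise ValueError("total byte count must be positive")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / total_bytes
```

Counting bytes from the original text, rather than from the tokenized form, keeps the metric comparable across tokenizers that insert or remove symbols.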
other
Per-group lrzip compression with artifact-size checks on every clean seed.
parameters: null
Novel Contributions
- Long-context No-QV configuration with QK gain 5.25
- Asymmetric logit rescale at evaluation time
- Legal score-first TTT with No-QV masking
- Size-aware mixed-precision quantization and AWQ-lite protected quantization
- Three-seed clean rerun record with mean validation BPB 1.05899193
- CaseOps/SP8192 tokenization with byte-sidecar BPB accounting
- Per-group lrzip compression and artifact-size checks