val_bpb: 1.1995
Architecture: Transformer
Optimizer: —
Artifact Size: 15,933,037 bytes
Training Techniques

Architecture: weight tying
The standard backbone uses tied input/output embeddings (weight tying) as part of the baseline SP-1024 model.
parameters: null
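Weight tying means a single matrix serves as both the input embedding table and (transposed) the output projection, halving the parameter count of those two layers. A minimal NumPy sketch of the idea, independent of any particular framework:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 16, 8

# One shared matrix: the input embedding table and, transposed,
# the output projection ("weight tying").
E = rng.normal(size=(vocab, dim))

def embed(token_ids):
    # Look up input embeddings for a sequence of token ids.
    return E[token_ids]          # shape (T, dim)

def logits(hidden):
    # The output head reuses the same matrix, transposed.
    return hidden @ E.T          # shape (T, vocab)

h = embed(np.array([1, 2, 3]))
out = logits(h)
assert out.shape == (3, vocab)
```

Because `E` is shared, a gradient step on the output head also moves the input embeddings, which is why tying acts as a mild regularizer on small models like this one.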
Evaluation

BOS-reset non-overlap eval
parameters: {"window": 1024, "stride": 1024}

stride-based eval
parameters: {"stride": 1024}
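With window = stride = 1024, evaluation windows never overlap, and resetting to BOS at each window boundary means no window conditions on tokens outside itself. A sketch of that regime, where `score_window` stands in for the real scorer (hypothetical callback, assumed to return total NLL in bits for one window):

```python
BOS = 0  # hypothetical BOS token id

def non_overlap_eval(tokens, score_window, window=1024, stride=1024):
    """BOS-reset non-overlap evaluation: stride == window, so chunks
    are disjoint and each starts from a fresh BOS context."""
    total_bits, total_tokens = 0.0, 0
    for start in range(0, len(tokens), stride):
        chunk = tokens[start:start + window]
        if not chunk:
            break
        # Prepend BOS so the model's context is reset at every boundary;
        # only the chunk's own tokens are scored.
        total_bits += score_window([BOS] + chunk)
        total_tokens += len(chunk)
    return total_bits / max(total_tokens, 1)  # bits per token
```

The last window may be shorter than 1024; it is scored as-is rather than padded.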
Other

other
A document-local exact-memory scorer builds a causal local memory from already-scored tokens and routes scoring decisions through a compact probe.
parameters: {"exact_causal_3gram": true, "bounded_exact_local_repeat": true, "repeat_match_length": "4-8", "top_k": 3, "min_support": 2, "alpha": 0.3}
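A sketch of the exact_causal_3gram part of this scorer, under stated assumptions: `neural_prob` is a hypothetical callback returning the neural model's probability for the token at position `i`, and the linear blend rule is an assumption (the source only gives alpha=0.3, top_k=3, min_support=2). The bounded exact-repeat memory (repeat_match_length 4-8) is not reproduced here.

```python
from collections import defaultdict, Counter

def memory_blend(tokens, neural_prob, top_k=3, min_support=2, alpha=0.3):
    """Blend a causal document-local 3-gram memory with neural probabilities."""
    memory = defaultdict(Counter)   # (t[i-2], t[i-1]) -> next-token counts
    probs = []
    for i, tok in enumerate(tokens):
        p = neural_prob(i)
        if i >= 2:
            ctx = (tokens[i - 2], tokens[i - 1])
            counts = memory[ctx]
            support = sum(counts.values())
            if support >= min_support:
                # Memory distribution over the top_k continuations only.
                top = counts.most_common(top_k)
                mass = sum(c for _, c in top)
                p_mem = dict(top).get(tok, 0) / mass
                p = (1 - alpha) * p + alpha * p_mem  # assumed blend rule
            # Update the memory only AFTER scoring position i, so the
            # memory is built strictly from already-scored tokens (causal).
            memory[ctx][tok] += 1
        probs.append(p)
    return probs
```

Because updates happen after scoring, the third occurrence of a repeated trigram is the first one the memory can boost, matching the min_support=2 threshold.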
Sequence Length

train_length: null
eval_length: 1024
Compression: zlib (level: null)
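A `level` of null presumably means zlib's library default is used. A minimal sketch of measuring a compressed artifact size that way; the payload below is a stand-in, not the actual 15,933,037-byte artifact:

```python
import zlib

# Stand-in bytes; the real artifact would be the model + probe tables.
payload = b"model weights + probe tables would go here" * 1000

# zlib.compress with no level argument uses Z_DEFAULT_COMPRESSION.
compressed = zlib.compress(payload)
print(len(payload), "->", len(compressed), "bytes")
```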
Novel Contributions
- Exact-memory probe layered on top of a standard 9L/512d SP-1024 backbone
- Causal document-local 3-gram memory with bounded exact repeat memory
- Compact two-level uplift probe that decides when to trust memory over the neural model
- BOS-reset non-overlap evaluation regime
- Artifact-backed alpha sweep selecting alpha=0.30
- Performance gains concentrated on long documents
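The alpha sweep in the list above can be sketched as a simple grid search that keeps the blend weight with the lowest validation bpb. `eval_bpb` is a stand-in for the real evaluation harness, and the grid below is an assumption (the source only reports that alpha=0.30 was selected):

```python
def sweep_alpha(eval_bpb, grid=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Evaluate each candidate blend weight and return the best one
    along with the full sweep results (for artifact logging)."""
    results = {a: eval_bpb(a) for a in grid}
    best = min(results, key=results.get)
    return best, results

# Toy proxy objective whose minimum sits at 0.3, for illustration only:
best, results = sweep_alpha(lambda a: (a - 0.3) ** 2 + 1.19)
assert best == 0.3
```

Logging `results` alongside the artifact is what makes the sweep "artifact-backed": the selection is reproducible from the stored numbers.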