PR #902

open

Add classical doc-copy 16.3M lzma submission

by Muhtasham
val_bpb
1.8111
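val_bpb reports validation-set bits per byte. As a minimal sketch of the conversion (the leaderboard's actual evaluation pipeline is not shown on this card, and the numbers below are illustrative, not the submission's measurements), total negative log-likelihood in nats divides by ln 2 times the byte count:

```python
import math

def bits_per_byte(neg_log_likelihood_nats: float, num_bytes: int) -> float:
    """Convert a total negative log-likelihood (in nats) to bits per byte."""
    return neg_log_likelihood_nats / (math.log(2) * num_bytes)

# Illustrative: ~1.2553 nats/byte corresponds to ~1.811 bits/byte.
bpb = bits_per_byte(1.2553e6, 1_000_000)
```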
Architecture
Transformer
Optimizer
None (classical submission; no gradient-based training)
Artifact Size
15,705,009 bytes

Training Techniques

Architecture
doc_copy_ctx2
Document-local copy expert over a discounted hashed 4-gram backoff chain; the active scoring path effectively uses the doc_copy_ctx2 context only.
parameters: {"doc_copy_contexts":2,"ngram_contexts":3}
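A discounted hashed n-gram backoff chain can be sketched as follows. This is a hypothetical reconstruction: the bucket count, discount value, and interpolation scheme (absolute discounting, backing off from the longest context down to a uniform base) are assumptions, not the submission's actual settings.

```python
from collections import defaultdict

class HashedBackoffNgram:
    """Hashed n-gram counts with absolute discounting and backoff.

    Contexts are hashed into a fixed number of buckets (so collisions are
    possible but memory is bounded); probability mass discounted from seen
    followers is redistributed to the next-shorter context's estimate.
    """
    def __init__(self, max_order=4, num_buckets=1 << 20, discount=0.5):
        self.max_order = max_order
        self.num_buckets = num_buckets
        self.discount = discount
        # counts[order-1][bucket][token] -> count, totals[order-1][bucket] -> sum
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order)]
        self.totals = [defaultdict(int) for _ in range(max_order)]

    def _bucket(self, ctx):
        return hash(ctx) % self.num_buckets

    def observe(self, history, token):
        for order in range(1, self.max_order + 1):
            ctx = tuple(history[-(order - 1):]) if order > 1 else ()
            b = self._bucket(ctx)
            self.counts[order - 1][b][token] += 1
            self.totals[order - 1][b] += 1

    def prob(self, history, token, vocab_size):
        p = 1.0 / vocab_size  # base case: uniform over the vocabulary
        for order in range(1, self.max_order + 1):
            ctx = tuple(history[-(order - 1):]) if order > 1 else ()
            b = self._bucket(ctx)
            total = self.totals[order - 1][b]
            if total == 0:
                continue  # unseen context: keep the lower-order estimate
            table = self.counts[order - 1][b]
            d = self.discount
            backoff_mass = d * len(table) / total
            p = max(table.get(token, 0) - d, 0) / total + backoff_mass * p
        return p
```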
Compression
lzma
level: null
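With `level: null`, the card suggests the library's default preset. A minimal sketch of compressing a serialized model table with Python's standard-library `lzma` (the helper names and serialization are hypothetical; `preset=None` mirrors the unspecified level):

```python
import lzma

def save_artifact(raw: bytes, path: str, preset=None) -> int:
    """Compress the serialized model bytes with lzma and write to disk.

    Returns the compressed size, so the caller can check the 16 MB cap.
    """
    blob = lzma.compress(raw, preset=preset)
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)

def load_artifact(path: str) -> bytes:
    """Read and decompress an artifact written by save_artifact."""
    with open(path, "rb") as f:
        return lzma.decompress(f.read())
```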
Sequence Length
sequence_length
train_length: 16300000
eval_length: null
Other
other
Packed 10-bit follower token storage to reduce artifact size.
parameters: {"bits":10}
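The card states `bits: 10`, i.e. each stored follower token occupies 10 bits rather than a full 16-bit or 32-bit slot. A minimal bit-packing sketch, assuming (this is an assumption, not stated on the card) that follower token ids fit in the range [0, 1023]:

```python
def pack_10bit(values):
    """Pack ints in [0, 1023] into a byte string, 10 bits per value."""
    buf, acc, nbits = bytearray(), 0, 0
    for v in values:
        assert 0 <= v < (1 << 10), "value does not fit in 10 bits"
        acc |= v << nbits
        nbits += 10
        while nbits >= 8:       # flush full bytes from the accumulator
            buf.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                   # flush the final partial byte
        buf.append(acc & 0xFF)
    return bytes(buf)

def unpack_10bit(data, count):
    """Recover `count` 10-bit values packed by pack_10bit."""
    vals, acc, nbits, i = [], 0, 0, 0
    while len(vals) < count:
        while nbits < 10:       # refill the accumulator byte by byte
            acc |= data[i] << nbits
            i += 1
            nbits += 8
        vals.append(acc & 0x3FF)
        acc >>= 10
        nbits -= 10
    return vals
```

Packing 10-bit values this way costs 10/16 of the space of a plain `uint16` array, before lzma sees the bytes.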

Novel Contributions

  • Document-local copy expert over a discounted hashed 4-gram backoff chain
  • Packed 10-bit follower token storage
  • lzma state compression to fit under the 16MB cap
  • Artifact-only evaluation on the official fineweb_val_* split
  • No training-shard access during final evaluation
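The document-local copy expert named in the first bullet can be sketched as a predictor that, within the current document only, remembers which token last followed each short context and proposes it again when the context recurs. This is a hypothetical sketch: the class name, the last-follower policy, and the mixing with the n-gram chain are assumptions, not the submission's implementation.

```python
class DocCopyExpert:
    """Document-local copy expert with a fixed context length.

    Memory is reset at every document boundary, so predictions only
    copy repetitions from within the current document.
    """
    def __init__(self, ctx_len=2):  # ctx_len=2 mirrors doc_copy_ctx2
        self.ctx_len = ctx_len
        self.followers = {}  # context tuple -> most recent follower token

    def reset(self):
        """Call at each document boundary: the memory is doc-local."""
        self.followers.clear()

    def predict(self, history):
        """Return the remembered follower for the current context, or None."""
        return self.followers.get(tuple(history[-self.ctx_len:]))

    def update(self, history, token):
        """Record that `token` followed the current context."""
        self.followers[tuple(history[-self.ctx_len:])] = token
```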