PR #1634

open

Non-record: Verify Three-Tier Token Weighting + DCLS Salience (SP1024, 1.1335 BPB)

by arsenis-cmd
val_bpb: 1.1335
Architecture: Transformer
Optimizer: Parallel Muon
Artifact Size: 15.9 MB

Training Techniques

Quantization
  • GPTQ (bits: 6, scope: all)

Architecture
  • BigramHash: GPU-resident bigram statistics used for token weighting and eval-time mixing
    parameters: {"size": 2048, "dimensions": 128}
  • XSA: base architecture includes XSA-all

Optimizer
  • Parallel Muon (weight_decay: null, momentum: null, other_params: {"adam": true})

Compression
  • lzma (level: null)

Evaluation
  • Sliding-window eval
    parameters: {"stride": 64}

Regularization
  • Weight decay

Sequence Length
  • train_length: 1024, eval_length: null

Other
  • Three-tier token weighting using predictable/frontier/noise tiers based on bigram frequency and document quality
    parameters: {"weights": {"predictable": 0.1, "frontier": 1, "noise": 0.7}}
  • DCLS salience batch reweighting based on the surprise signal and document quality
    parameters: {"loss_multiplier_range": [0.85, 1.15]}
  • Quality-conditioned bigram mixer at evaluation
    parameters: {"alpha_high_quality": 0.15, "alpha_low_quality": 0.3}
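The three-tier token weighting hinges on cheap hashed bigram frequencies. A minimal NumPy sketch, assuming non-negative integer token IDs, a stand-in multiply-and-mod hash, and illustrative quantile cutoffs between tiers (only the 0.1/1.0/0.7 weights and the 2048-entry table size come from the listing above):

```python
import numpy as np

def bigram_counts(tokens, table_size=2048):
    # Hashed bigram count table. The table size matches the BigramHash
    # entry above; this multiply-and-mod hash is a stand-in, not the
    # PR's implementation.
    counts = np.zeros(table_size, dtype=np.int64)
    pairs = tokens[:-1].astype(np.int64) * 31 + tokens[1:]
    np.add.at(counts, pairs % table_size, 1)
    return counts

def tier_weights(freq, w_predictable=0.1, w_frontier=1.0, w_noise=0.7):
    # Per-target loss weights by bigram-frequency tier. The three weights
    # follow the PR's parameters; the 20%/80% quantile cutoffs that
    # separate the tiers are illustrative assumptions.
    q_low, q_high = np.quantile(freq, [0.2, 0.8])
    weights = np.full(freq.shape, w_frontier)   # mid-frequency: frontier
    weights[freq >= q_high] = w_predictable     # common bigrams: predictable
    weights[freq <= q_low] = w_noise            # rare bigrams: likely noise
    return weights
```

The per-token cross-entropy loss would then be multiplied elementwise by these weights before reduction.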
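The DCLS salience batch reweighting can be sketched similarly. This assumes a 50/50 blend of the surprise signal and the quality score and rank-normalises within the batch; only the [0.85, 1.15] multiplier range comes from the listing above:

```python
import numpy as np

def salience_multipliers(surprise, quality, lo=0.85, hi=1.15):
    # Map a per-document salience score to a loss multiplier. The 50/50
    # blend of surprise and quality is an assumption; the [lo, hi] range
    # follows the PR's loss_multiplier_range.
    salience = 0.5 * np.asarray(surprise) + 0.5 * np.asarray(quality)
    # Rank-normalise within the batch so multipliers span [lo, hi] and
    # average 1.0, leaving the overall loss scale unchanged.
    ranks = salience.argsort().argsort() / max(len(salience) - 1, 1)
    return lo + (hi - lo) * ranks
```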
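The sliding-window evaluation with stride 64 can be sketched as follows, with `nll_fn` a hypothetical stand-in for the model's per-token negative log-likelihood:

```python
import numpy as np

def sliding_window_bits(nll_fn, tokens, window=1024, stride=64):
    # Each window is scored in full, but only the final `stride` targets
    # are counted, so every counted token sees up to window-1 tokens of
    # left context. window matches train_length and stride matches the
    # eval parameters above. `nll_fn(ctx)` is a stand-in for the model,
    # returning per-target negative log-likelihoods (nats) for ctx[1:]
    # given ctx[:-1]. Returns bits per token; true bits-per-byte would
    # divide total bits by the byte count instead, omitted here.
    total_nats, n_targets = 0.0, 0
    for end in range(window, len(tokens) + 1, stride):
        ctx = tokens[end - window:end]
        nll = nll_fn(ctx)                     # shape (window - 1,)
        total_nats += float(nll[-stride:].sum())
        n_targets += stride
    return total_nats / (n_targets * np.log(2))
```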

Novel Contributions

  • Three-tier token weighting with predictable/frontier/noise classes
  • GPU-resident bigram statistics for token-level loss weighting
  • Document-quality scoring from vocabulary richness and repetition
  • DCLS salience batch reweighting
  • Quality-conditioned bigram mixer at evaluation
  • Pure data-quality approach with zero architectural changes
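The quality-conditioned bigram mixer listed above can be sketched as a simple interpolation between the model's next-token distribution and the bigram distribution; the 0.15/0.3 alphas come from the PR's parameters, while the binary high/low quality split is an assumption (the PR may condition differently):

```python
import numpy as np

def mix_with_bigram(p_model, p_bigram, high_quality,
                    alpha_hq=0.15, alpha_lq=0.30):
    # Low-quality documents lean harder on the bigram distribution.
    # Alpha values follow the PR's parameters; the binary quality flag
    # is an illustrative assumption.
    alpha = alpha_hq if high_quality else alpha_lq
    return (1.0 - alpha) * np.asarray(p_model) + alpha * np.asarray(p_bigram)
```

Because both inputs are probability distributions, the convex mixture is also a valid distribution, so it can be plugged directly into the BPB computation at eval time.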