PR #1634

open

Non-record: Verify Three-Tier Token Weighting + DCLS Salience (SP1024, 1.1335 BPB)

by arsenis-cmd
val_bpb: 1.1335
Architecture: Transformer
Optimizer: Parallel Muon
Artifact Size: 15.9 MB

Training Techniques

Quantization
  • GPTQ (bits: 6, scope: all)

Architecture
  • BigramHash: GPU-resident bigram statistics used for token weighting and eval-time mixing
    parameters: {"size": 2048, "dimensions": 128}
  • XSA: base architecture includes XSA-all

Optimizer
  • Parallel Muon (weight_decay: null, momentum: null, other_params: {"adam": true})

Compression
  • lzma (level: null)

Evaluation
  • Sliding-window eval
    parameters: {"stride": 64}

Regularization
  • Weight decay

Sequence Length
  • train_length: 1024, eval_length: null

Other
  • Three-tier token weighting using predictable/frontier/noise tiers based on bigram frequency and document quality
    parameters: {"weights": {"predictable": 0.1, "frontier": 1, "noise": 0.7}}
  • DCLS salience batch reweighting based on the surprise signal and document quality
    parameters: {"loss_multiplier_range": [0.85, 1.15]}
  • Quality-conditioned bigram mixer at evaluation
    parameters: {"alpha_high_quality": 0.15, "alpha_low_quality": 0.3}
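The three-tier token weighting hinges on cheap hashed bigram frequencies. A minimal NumPy sketch, assuming non-negative integer token IDs, a stand-in multiply-and-mod hash, and illustrative quantile cutoffs between tiers (only the 0.1/1.0/0.7 weights and the 2048-entry table size come from the listing above):

```python
import numpy as np

def bigram_counts(tokens, table_size=2048):
    # Hashed bigram count table. The table size matches the BigramHash
    # entry above; this multiply-and-mod hash is a stand-in, not the
    # PR's implementation.
    counts = np.zeros(table_size, dtype=np.int64)
    pairs = tokens[:-1].astype(np.int64) * 31 + tokens[1:]
    np.add.at(counts, pairs % table_size, 1)
    return counts

def tier_weights(freq, w_predictable=0.1, w_frontier=1.0, w_noise=0.7):
    # Per-target loss weights by bigram-frequency tier. The three weights
    # follow the PR's parameters; the 20%/80% quantile cutoffs that
    # separate the tiers are illustrative assumptions.
    q_low, q_high = np.quantile(freq, [0.2, 0.8])
    weights = np.full(freq.shape, w_frontier)   # mid-frequency: frontier
    weights[freq >= q_high] = w_predictable     # common bigrams: predictable
    weights[freq <= q_low] = w_noise            # rare bigrams: likely noise
    return weights
```

The per-token cross-entropy loss would then be multiplied elementwise by these weights before reduction.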
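The DCLS salience batch reweighting can be sketched similarly. This assumes a 50/50 blend of the surprise signal and the quality score and rank-normalises within the batch; only the [0.85, 1.15] multiplier range comes from the listing above:

```python
import numpy as np

def salience_multipliers(surprise, quality, lo=0.85, hi=1.15):
    # Map a per-document salience score to a loss multiplier. The 50/50
    # blend of surprise and quality is an assumption; the [lo, hi] range
    # follows the PR's loss_multiplier_range.
    salience = 0.5 * np.asarray(surprise) + 0.5 * np.asarray(quality)
    # Rank-normalise within the batch so multipliers span [lo, hi] and
    # average 1.0, leaving the overall loss scale unchanged.
    ranks = salience.argsort().argsort() / max(len(salience) - 1, 1)
    return lo + (hi - lo) * ranks
```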
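The sliding-window evaluation with stride 64 can be sketched as follows, with `nll_fn` a hypothetical stand-in for the model's per-token negative log-likelihood:

```python
import numpy as np

def sliding_window_bits(nll_fn, tokens, window=1024, stride=64):
    # Each window is scored in full, but only the final `stride` targets
    # are counted, so every counted token sees up to window-1 tokens of
    # left context. window matches train_length and stride matches the
    # eval parameters above. `nll_fn(ctx)` is a stand-in for the model,
    # returning per-target negative log-likelihoods (nats) for ctx[1:]
    # given ctx[:-1]. Returns bits per token; true bits-per-byte would
    # divide total bits by the byte count instead, omitted here.
    total_nats, n_targets = 0.0, 0
    for end in range(window, len(tokens) + 1, stride):
        ctx = tokens[end - window:end]
        nll = nll_fn(ctx)                     # shape (window - 1,)
        total_nats += float(nll[-stride:].sum())
        n_targets += stride
    return total_nats / (n_targets * np.log(2))
```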

Novel Contributions

  • Three-tier token weighting with predictable/frontier/noise classes
  • GPU-resident bigram statistics for token-level loss weighting
  • Document-quality scoring from vocabulary richness and repetition
  • DCLS salience batch reweighting
  • Quality-conditioned bigram mixer at evaluation
  • Pure data-quality approach with zero architectural changes
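The quality-conditioned bigram mixer listed above can be sketched as a simple interpolation between the model's next-token distribution and the bigram distribution; the 0.15/0.3 alphas come from the PR's parameters, while the binary high/low quality split is an assumption (the PR may condition differently):

```python
import numpy as np

def mix_with_bigram(p_model, p_bigram, high_quality,
                    alpha_hq=0.15, alpha_lq=0.30):
    # Low-quality documents lean harder on the bigram distribution.
    # Alpha values follow the PR's parameters; the binary quality flag
    # is an illustrative assumption.
    alpha = alpha_hq if high_quality else alpha_lq
    return (1.0 - alpha) * np.asarray(p_model) + alpha * np.asarray(p_bigram)
```

Because both inputs are probability distributions, the convex mixture is also a valid distribution, so it can be plugged directly into the BPB computation at eval time.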