PR #1168

open

Notable Non-Record: H-Net Dynamic Chunking — 1.3639 BPB — Learned Content-Dependent Segmentation

by gowtham0992
val_bpb
1.3639
Architecture
Transformer
Optimizer
Artifact Size
10.39 MB

Training Techniques

Architecture
BigramHash
Uses BigramHash at each resolution as part of the model stack.
parameters: null
SmearGate
Uses SmearGate at each resolution as part of the model stack.
parameters: null
U-Net skip connections
Residual skip connection from encoder output to decoder output to preserve fine-grained information.
parameters: null
hierarchical chunking
Learns content-dependent chunk boundaries and performs dynamic chunking/downsampling and upsampling.
parameters: {"target_ratio":2}
attention modifications
Routing module uses cosine similarity between adjacent representations with learned thresholding for boundary detection.
parameters: null
multi-resolution processing
Processes encoder at compressed resolution and decoder at full resolution.
parameters: {"encoder_layers":5,"decoder_layers":6,"d_model":512}
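The routing and chunking entries above can be illustrated with a minimal, framework-free sketch. It assumes an H-Net-style rule where the boundary probability between adjacent positions is p_t = (1 - cos(h_t, h_{t-1})) / 2, and a boundary is placed wherever p_t reaches a threshold. The fixed threshold of 0.5, the function names, and the choice to keep the vector at each boundary position are illustrative assumptions; the PR describes a learned threshold and does not spell out these details.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def boundary_probs(hidden):
    """p_t = (1 - cos(h_t, h_{t-1})) / 2; the first position is always a boundary."""
    probs = [1.0]
    for prev, cur in zip(hidden, hidden[1:]):
        probs.append(0.5 * (1.0 - cosine(cur, prev)))
    return probs

def downsample(hidden, probs, threshold=0.5):
    """Keep only positions whose boundary probability reaches the threshold.

    The threshold is fixed here for illustration (assumption); the PR
    describes it as learned.
    """
    keep = [t for t, p in enumerate(probs) if p >= threshold]
    return keep, [hidden[t] for t in keep]

# Toy sequence: positions 0-1 are similar, 2-3 are similar, 4 differs again.
hidden = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.2], [-1.0, 0.1], [0.0, -1.0]]
probs = boundary_probs(hidden)
keep, chunks = downsample(hidden, probs)
print(keep, len(hidden) / len(keep))  # kept positions and achieved compression ratio
```

Similar neighbours give a probability near 0 and are merged into the current chunk, while a sharp change in direction gives a probability near 1 and starts a new chunk. The achieved compression ratio is the sequence length divided by the number of kept positions, which the ratio loss is meant to steer toward `target_ratio`.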
Weight Averaging
EMA
parameters: null
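As a reminder of what EMA weight averaging does, here is a minimal sketch over plain lists of floats. The decay value and the function name are assumptions for illustration; the PR lists no EMA parameters.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights.

    decay=0.999 is a placeholder (assumption), not a value from the PR.
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]

# The averaged copy trails the raw weights and is the one used for evaluation.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.9)
print(ema)
```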
Quantization
GPTQ
bits: 6
scope: model
Evaluation
sliding window eval
parameters: null
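Sliding-window evaluation can be sketched as follows: a window of `eval_length` tokens moves across the validation stream, and each window scores only the tokens not already scored by an earlier window, so most tokens are predicted with substantial left context. The stride and the helper names are assumptions, since the PR lists no parameters for this step. Bits per byte is the summed negative log-likelihood in nats divided by ln 2 times the number of bytes.

```python
import math

def sliding_windows(n_tokens, window, stride):
    """Yield (begin, end, n_scored) for each evaluation window.

    Only the last n_scored tokens of a window contribute to the loss; the
    earlier tokens in the window serve purely as context.
    """
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        yield begin, end, end - prev_end
        prev_end = end
        if end == n_tokens:
            break

def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)

# Window 1024 matches eval_length above; the stride of 512 is an assumption.
windows = list(sliding_windows(n_tokens=4096, window=1024, stride=512))
print(len(windows), sum(n for _, _, n in windows))
```

Every token is scored exactly once, so the per-window losses can be summed directly before converting to bits per byte.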
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
Regularization
ratio loss
parameters: {"alpha":0.03,"target_ratio":2}
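A sketch of how a ratio loss with these parameters might be computed. It assumes the form used in the H-Net paper, L_ratio = N/(N-1) * ((N-1) * F * G + (1-F) * (1-G)), where F is the fraction of positions actually selected as boundaries, G is the mean boundary probability, and N is the target ratio. Whether this PR uses exactly this form is not stated, and the function names are hypothetical.

```python
def ratio_loss(probs, boundaries, target_ratio=2.0):
    """H-Net-style ratio loss (assumed form, see lead-in).

    probs:      soft boundary probabilities, one per position
    boundaries: hard 0/1 boundary decisions, one per position
    """
    n = target_ratio
    f = sum(boundaries) / len(boundaries)  # fraction of hard boundaries
    g = sum(probs) / len(probs)            # mean boundary probability
    return n / (n - 1.0) * ((n - 1.0) * f * g + (1.0 - f) * (1.0 - g))

def total_loss(lm_loss, probs, boundaries, alpha=0.03, target_ratio=2.0):
    """Language-modelling loss plus the ratio penalty weighted by alpha."""
    return lm_loss + alpha * ratio_loss(probs, boundaries, target_ratio)

# Half of the positions are boundaries, matching target_ratio = 2.
on_target = ratio_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])
# Every position is a boundary, so nothing is compressed.
no_compression = ratio_loss([1.0, 1.0, 1.0, 1.0], [1, 1, 1, 1])
print(on_target, no_compression)
```

Under this form the penalty equals 1 when F = G = 1/N and rises to 2 (for N = 2) when the router either keeps every position or drops every position, so with `alpha` = 0.03 it acts as a mild regularizer rather than a hard constraint.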

Novel Contributions

  • Dynamic chunking with learned content-dependent segmentation
  • Cosine-similarity routing for boundary detection
  • EMA smoothing for chunk interpolation
  • Confidence-weighted upsampling with STE
  • Residual skip connection between encoder and decoder
  • Ratio loss to guide compression toward a target ratio
  • Multi-resolution hierarchical processing adapted to token-level competition constraints
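The EMA smoothing and confidence-weighted upsampling items can be sketched as a forward pass. This assumes the H-Net-style formulation: chunk vectors are smoothed with z_bar[k] = p[k] * z[k] + (1 - p[k]) * z_bar[k-1], each smoothed chunk is copied to every position it covers, and each position carries a confidence c_t equal to p_t at a boundary and 1 - p_t elsewhere. In an autograd framework the straight-through estimator would scale the output by c_t + stop_gradient(1 - c_t), which equals 1 in the forward pass; that gradient path is not modelled in this framework-free sketch. All names are hypothetical.

```python
def ema_smooth(chunks, chunk_probs):
    """Smooth chunk vectors: z_bar[k] = p[k]*z[k] + (1-p[k])*z_bar[k-1]."""
    smoothed = []
    for k, (z, p) in enumerate(zip(chunks, chunk_probs)):
        if k == 0:
            smoothed.append(list(z))  # first chunk has no predecessor
        else:
            prev = smoothed[-1]
            smoothed.append([p * a + (1.0 - p) * b for a, b in zip(z, prev)])
    return smoothed

def upsample(smoothed, boundaries):
    """Copy each chunk vector to every position up to the next boundary."""
    if boundaries and not boundaries[0]:
        raise ValueError("the first position must be a boundary")
    out, k = [], -1
    for b in boundaries:
        if b:
            k += 1  # a boundary starts the next chunk
        out.append(smoothed[k])
    return out

def confidence(probs, boundaries):
    """c_t = p_t where a boundary was chosen, 1 - p_t where it was not."""
    return [p if b else 1.0 - p for p, b in zip(probs, boundaries)]

boundaries = [1, 0, 1, 0]            # hard decisions at full resolution
probs = [1.0, 0.2, 0.5, 0.1]         # boundary probabilities at full resolution
chunks = [[1.0], [3.0]]              # one vector per chunk (compressed resolution)
chunk_probs = [p for p, b in zip(probs, boundaries) if b]

full = upsample(ema_smooth(chunks, chunk_probs), boundaries)
conf = confidence(probs, boundaries)
print(full, conf)
```

A low boundary probability pulls a chunk toward its predecessor, which softens the effect of uncertain boundaries before the decoder runs at full resolution.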