PR #1168
open
Notable Non-Record: H-Net Dynamic Chunking — 1.3639 BPB — Learned Content-Dependent Segmentation
by gowtham0992
val_bpb
1.3639
Architecture
Transformer
Optimizer
—
Artifact Size
10.39 MB
Training Techniques
Architecture
BigramHash
Uses BigramHash at each resolution as part of the model stack.
parameters: null
SmearGate
Uses SmearGate at each resolution as part of the model stack.
parameters: null
U-Net skip connections
Residual skip connection from encoder output to decoder output to preserve fine-grained information.
parameters: null
hierarchical chunking
Learns content-dependent chunk boundaries and performs dynamic chunking/downsampling and upsampling.
parameters: {"target_ratio":2}
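To make the downsampling step concrete, here is a minimal sketch. It assumes the boundaries are already available as a 0/1 mask and that each chunk is represented by the vector at its boundary position. The function name `downsample` and the toy values are hypothetical and not taken from the PR.

```python
def downsample(hidden, boundaries):
    """Keep only the vectors at boundary positions (one vector per chunk)."""
    if len(hidden) != len(boundaries):
        raise ValueError("hidden and boundaries must have the same length")
    return [h for h, b in zip(hidden, boundaries) if b == 1]


# Toy sequence: 6 positions, 2-dimensional vectors (illustrative values only).
hidden = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.4, 0.6]]
boundaries = [1, 0, 1, 0, 1, 0]  # assumption: the first position starts a chunk

chunks = downsample(hidden, boundaries)
compression_ratio = len(hidden) / len(chunks)
```

Three boundaries over six positions give a compression ratio of 2, which is what `target_ratio` asks for on average.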
attention modifications
Routing module uses cosine similarity between adjacent representations with learned thresholding for boundary detection.
parameters: null
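The routing rule can be sketched as follows. This assumes the boundary probability is (1 - cosine similarity) / 2 between a position and its predecessor, omits any learned projections, and uses a fixed `threshold` argument as a stand-in for the learned threshold. All names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def boundary_probs(hidden):
    """p_t = (1 - cos(h_t, h_{t-1})) / 2; the first position is always a boundary."""
    probs = [1.0]
    for prev, cur in zip(hidden, hidden[1:]):
        probs.append(0.5 * (1.0 - cosine(cur, prev)))
    return probs


def detect_boundaries(hidden, threshold=0.5):
    """Hard 0/1 boundaries; `threshold` stands in for the learned value."""
    return [1 if p >= threshold else 0 for p in boundary_probs(hidden)]


# Two identical vectors followed by two opposite ones: the boundary falls at the flip.
hidden = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
mask = detect_boundaries(hidden)
```

Adjacent representations that point the same way yield a probability near 0 (no boundary), while dissimilar neighbours push it toward 1.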
multi-resolution processing
Processes encoder at compressed resolution and decoder at full resolution.
parameters: {"encoder_layers":5,"decoder_layers":6,"d_model":512}
Weight Averaging
EMA
parameters: null
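For reference, a generic exponential moving average over weights looks like the sketch below. The decay value is not given in this entry, so `decay` is a placeholder, and the flat list of floats stands in for real parameter tensors.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Return decay * ema + (1 - decay) * current, element-wise."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Two updates toward constant weights, with a deliberately small decay
# so the effect is visible in a toy example.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.5)
```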
Quantization
GPTQ
bits: 6
scope: model
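To illustrate what a 6-bit budget means, the sketch below uses plain symmetric round-to-nearest quantization with one scale per tensor. This is not GPTQ: GPTQ additionally compensates rounding error using second-order information from calibration data, and that step is omitted here. Function names and weights are hypothetical.

```python
def quantize_rtn(weights, bits=6):
    """Symmetric round-to-nearest quantization with a single per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [0.31, -0.15, 0.0, 0.02]  # illustrative values only
codes, scale = quantize_rtn(weights, bits=6)
restored = dequantize(codes, scale)
```

Each weight is stored as an integer in [-31, 31], and the reconstruction error per weight is at most half the scale.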
Evaluation
sliding window eval
parameters: null
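A sliding-window evaluation can be laid out as in the sketch below. It assumes overlapping windows in which only tokens not scored by an earlier window contribute to the metric. The stride is not stated in this entry, so `stride` is a placeholder, and `sliding_windows` is a hypothetical helper.

```python
def sliding_windows(num_tokens, window=1024, stride=512):
    """Return (start, end, score_from) spans so each token is scored exactly once.

    Tokens in [start, score_from) serve only as context; tokens in
    [score_from, end) are the ones whose loss is counted.
    """
    if not 0 < stride <= window:
        raise ValueError("stride must satisfy 0 < stride <= window")
    spans = []
    scored_up_to = 0
    start = 0
    while scored_up_to < num_tokens:
        end = min(start + window, num_tokens)
        spans.append((start, end, scored_up_to))
        scored_up_to = end
        start += stride
    return spans


# Small numbers for readability; the entry itself uses eval_length 1024.
spans = sliding_windows(10, window=4, stride=2)
total_scored = sum(end - score_from for _, end, score_from in spans)
```

Every window after the first re-reads part of the previous window as context, so later tokens are scored with more history than a non-overlapping split would give them.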
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
Regularization
ratio loss
parameters: {"alpha":0.03,"target_ratio":2}
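One plausible form of the ratio loss, modeled on the formulation in the H-Net paper, is sketched below. The PR's exact expression is not shown in this entry, so treat the formula as an assumption. Here F is the fraction of positions chosen as boundaries, G is the mean boundary probability, and N is the target ratio.

```python
def ratio_loss(boundaries, probs, target_ratio=2.0):
    """N/(N-1) * ((N-1)*F*G + (1-F)*(1-G)); equals 1.0 when F = G = 1/N."""
    if target_ratio <= 1.0:
        raise ValueError("target_ratio must be greater than 1")
    n = target_ratio
    f = sum(boundaries) / len(boundaries)  # fraction of hard boundaries
    g = sum(probs) / len(probs)            # mean boundary probability
    return n / (n - 1.0) * ((n - 1.0) * f * g + (1.0 - f) * (1.0 - g))


ALPHA = 0.03  # weight from this entry's parameters

boundaries = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.2]  # illustrative values only
penalty = ALPHA * ratio_loss(boundaries, probs, target_ratio=2.0)

# Marking every position as a boundary (no compression) is penalized more.
no_compression = ratio_loss([1, 1, 1, 1], [1.0, 1.0, 1.0, 1.0], target_ratio=2.0)
```

With half the positions selected, the loss sits at its reference value of 1.0 and the weighted penalty is 0.03; selecting every position doubles it.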
Novel Contributions
- Dynamic chunking with learned content-dependent segmentation
- Cosine-similarity routing for boundary detection
- EMA smoothing for chunk interpolation
- Confidence-weighted upsampling with STE
- Residual skip connection between encoder and decoder
- Ratio loss to guide compression toward a target ratio
- Multi-resolution hierarchical processing adapted to token-level competition constraints
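The upsampling and EMA smoothing items above can be sketched in forward-pass form. This assumes each position receives the vector of the chunk it belongs to, blended with the previous output by the rule out_t = p_t * z_t + (1 - p_t) * out_{t-1}, where p_t is the boundary probability. The straight-through estimator affects only the backward pass, so it is not shown. The function name and values are hypothetical.

```python
def upsample_smoothed(chunks, boundaries, probs):
    """Expand chunk vectors to full resolution with EMA smoothing.

    A confident boundary (p_t near 1) switches sharply to the new chunk;
    an uncertain one keeps more of the previous output.
    """
    if not boundaries or boundaries[0] != 1:
        raise ValueError("the first position must be a boundary")
    out = []
    chunk_idx = -1
    for b, p in zip(boundaries, probs):
        if b == 1:
            chunk_idx += 1  # move on to the next chunk vector
        z = chunks[chunk_idx]
        if not out:
            out.append(list(z))
        else:
            prev = out[-1]
            out.append([p * zi + (1.0 - p) * pi for zi, pi in zip(z, prev)])
    return out


chunks = [[0.0], [1.0]]            # two chunk vectors (illustrative values)
boundaries = [1, 0, 1, 0]
probs = [1.0, 0.0, 0.75, 0.25]
full = upsample_smoothed(chunks, boundaries, probs)
```

In this toy run the second chunk is entered with probability 0.75, so the output moves to 0.75 at the boundary and continues toward 1.0 on the next position instead of jumping there at once.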