PR #1168

open

Notable Non-Record: H-Net Dynamic Chunking — 1.3639 BPB — Learned Content-Dependent Segmentation

by gowtham0992

val_bpb

1.3639

Architecture

Transformer

Optimizer

—

Artifact Size

10.39 MB

Training Techniques

Architecture

BigramHash

Uses BigramHash at each resolution as part of the model stack.

parameters: null

SmearGate

Uses SmearGate at each resolution as part of the model stack.

parameters: null

U-Net skip connections

Residual skip connection from encoder output to decoder output to preserve fine-grained information.

parameters: null

hierarchical chunking

Learns content-dependent chunk boundaries and performs dynamic chunking/downsampling and upsampling.

parameters: {"target_ratio":2}
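As a rough illustration of the downsampling half of dynamic chunking, the sketch below keeps one vector per chunk, namely the vector at each boundary position. This selection rule is an assumption borrowed from the general H-Net design, not something this summary confirms for the PR; `downsample` and `compression_ratio` are hypothetical helper names.

```python
def downsample(hidden, boundaries):
    """Keep one vector per chunk: the vector at each boundary position.

    Assumption: a chunk is represented by the hidden state at its first
    position. Other pooling rules (mean, last) are equally plausible.
    """
    if len(hidden) != len(boundaries):
        raise ValueError("hidden and boundaries must have the same length")
    return [h for h, b in zip(hidden, boundaries) if b]


def compression_ratio(boundaries):
    """Average chunk length, i.e. positions per selected boundary."""
    return len(boundaries) / max(1, sum(boundaries))


# Toy input: six positions with one-dimensional hidden states.
hidden = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
boundaries = [1, 0, 1, 0, 0, 1]

chunks = downsample(hidden, boundaries)
ratio = compression_ratio(boundaries)
```

With three boundaries over six positions the realized ratio is 2, which matches the listed `target_ratio`.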

attention modifications

Routing module uses cosine similarity between adjacent representations with learned thresholding for boundary detection.

parameters: null
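The routing idea can be sketched as follows: a position opens a new chunk when its representation differs enough from its predecessor. The mapping `p = 0.5 * (1 - cos)` follows the usual H-Net formulation and is an assumption here; the learned projections are omitted, and the learned threshold is replaced by a plain `threshold` argument for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def boundary_probs(hidden):
    """Boundary probability per position: high when adjacent states differ."""
    probs = [1.0]  # assumption: the first position always opens a chunk
    for prev, cur in zip(hidden, hidden[1:]):
        probs.append(0.5 * (1.0 - cosine(cur, prev)))
    return probs


def detect_boundaries(hidden, threshold=0.5):
    """Hard 0/1 decisions; `threshold` stands in for the learned threshold."""
    return [1 if p >= threshold else 0 for p in boundary_probs(hidden)]


# Two runs of identical states with a sign flip between them.
hidden = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
probs = boundary_probs(hidden)
boundaries = detect_boundaries(hidden)
```

Identical neighbours give probability 0 and opposite neighbours give probability 1, so the toy input splits into two chunks of two positions each.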

multi-resolution processing

Processes encoder at compressed resolution and decoder at full resolution.

parameters: {"encoder_layers":5,"decoder_layers":6,"d_model":512}

Weight Averaging

EMA

parameters: null
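Since no parameters are listed for the weight averaging, the sketch below only shows the generic EMA update over a flat list of weights. The `decay` value and the function name `ema_update` are hypothetical.

```python
def ema_update(ema, weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


# Two training steps that both report the same weights, with a small decay
# so the effect is visible after only a few updates.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.5)
```

In practice the averaged copy, not the raw weights, would be the one evaluated and exported.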

Quantization

GPTQ

bits: 6

scope: model
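To make the 6-bit setting concrete, the sketch below uses plain round-to-nearest symmetric quantization. This is deliberately not GPTQ: GPTQ additionally compensates rounding error using second-order statistics from calibration data, which is omitted here. The function names and the single per-tensor scale are assumptions for illustration.

```python
def quantize_rtn(weights, bits=6):
    """Round-to-nearest symmetric quantization (a simpler stand-in for GPTQ)."""
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [0.31, -0.31, 0.0, 0.1]
codes, scale = quantize_rtn(weights, bits=6)
restored = dequantize(codes, scale)
```

Each weight is stored as an integer in [-31, 31], so the reconstruction error per weight is at most half a quantization step.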

Evaluation

sliding window eval

parameters: null
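A sliding-window evaluation typically scores each token exactly once while giving later tokens as much left context as the window allows. The sketch below only computes the window spans; the default `window` mirrors the listed eval length of 1024, while `stride` is a hypothetical value, since none is given.

```python
def sliding_windows(num_tokens, window=1024, stride=512):
    """Return (start, end, score_from) spans covering every token once.

    The model sees tokens [start, end) but only the predictions for
    [score_from, end) count toward the metric.
    """
    if not 0 < stride <= window:
        raise ValueError("stride must be in (0, window]")
    spans = []
    scored_up_to = 0
    start = 0
    while scored_up_to < num_tokens:
        end = min(start + window, num_tokens)
        spans.append((start, end, scored_up_to))
        scored_up_to = end
        start += stride
    return spans


# Small numbers so the spans are easy to read.
spans = sliding_windows(10, window=4, stride=2)
scored = sum(end - score_from for _, end, score_from in spans)
```

A smaller stride gives each scored token more context at the cost of more forward passes.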

Sequence Length

sequence_length

train_length: 1024

eval_length: 1024

Regularization

ratio loss

parameters: {"alpha":0.03,"target_ratio":2}
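One plausible form of the ratio loss is the one from the H-Net paper, where F is the fraction of positions selected as boundaries, G is the mean boundary probability, and N is the target ratio. Whether this PR uses exactly this form is an assumption; only `alpha = 0.03` and `target_ratio = 2` come from the listed parameters.

```python
def ratio_loss(boundaries, probs, target_ratio=2.0):
    """N/(N-1) * ((N-1)*F*G + (1-F)*(1-G)), assumed H-Net-style form."""
    n = float(target_ratio)
    f = sum(boundaries) / len(boundaries)  # fraction of hard boundaries
    g = sum(probs) / len(probs)            # mean boundary probability
    return n / (n - 1.0) * ((n - 1.0) * f * g + (1.0 - f) * (1.0 - g))


def total_loss(lm_loss, boundaries, probs, alpha=0.03, target_ratio=2.0):
    """Language-modeling loss plus the weighted ratio regularizer."""
    return lm_loss + alpha * ratio_loss(boundaries, probs, target_ratio)


# On target: half the positions are boundaries, mean probability is 0.5.
on_target = ratio_loss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])

# No compression at all: every position is a confident boundary.
no_compression = ratio_loss([1, 1, 1, 1], [1.0, 1.0, 1.0, 1.0])

combined = total_loss(2.0, [1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

Under this form the loss equals 1 when F = G = 1/N and grows when the router stops compressing, so the small `alpha` nudges the model toward the target ratio without dominating the language-modeling loss.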

Novel Contributions

Dynamic chunking with learned content-dependent segmentation
Cosine-similarity routing for boundary detection
EMA smoothing for chunk interpolation
Confidence-weighted upsampling with STE
Residual skip connection between encoder and decoder
Ratio loss to guide compression toward a target ratio
Multi-resolution hierarchical processing adapted to token-level competition constraints
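The smoothing and upsampling contributions listed above can be sketched in their forward pass. The formulas follow the general H-Net design and are assumptions about this PR. The sketch has no autograd, so the straight-through estimator (STE) appears only as a comment: with an STE the confidence multiplies the output by 1 in the forward pass and influences gradients only.

```python
def ema_smooth(chunks, chunk_probs):
    """Blend each chunk with the running state: p * z + (1 - p) * previous."""
    smoothed = []
    prev = None
    for z, p in zip(chunks, chunk_probs):
        if prev is None:
            cur = list(z)  # first chunk has nothing to blend with
        else:
            cur = [p * a + (1.0 - p) * b for a, b in zip(z, prev)]
        smoothed.append(cur)
        prev = cur
    return smoothed


def confidence(probs, boundaries):
    """Router confidence in its own decision: p if boundary, else 1 - p."""
    return [p if b else 1.0 - p for p, b in zip(probs, boundaries)]


def upsample(chunks, boundaries):
    """Copy each chunk vector to every position that belongs to the chunk.

    With an STE, the confidence-weighted output equals this copy in the
    forward pass; the confidence would only shape the gradients.
    """
    if not boundaries or boundaries[0] != 1:
        raise ValueError("the first position must be a boundary")
    out = []
    idx = -1
    for b in boundaries:
        idx += b  # advance to the next chunk at each boundary
        out.append(chunks[idx])
    return out


boundaries = [1, 0, 1, 0]
smoothed = ema_smooth([[0.0], [1.0]], chunk_probs=[1.0, 0.5])
full = upsample(smoothed, boundaries)
conf = confidence([1.0, 0.2, 0.5, 0.1], boundaries)
```

The low-confidence second chunk is pulled halfway toward the first, which is the interpolation effect the smoothing step is meant to provide.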