PR #1191

open

Non-record: H-Net Dynamic Chunking — Learned Tokenization Layer (val_bpb 1.3587)

val_bpb

1.3587

Architecture

Transformer

Optimizer

Muon

Artifact Size

8,592,061

Training Techniques

Architecture

SmearGate

Uses a SmearGate layer in the model pipeline after the learned chunking layer.

parameters: null
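
The PR summary does not describe the gate itself, so the following is only a minimal sketch of one common SmearGate formulation, in which a learned gate mixes each position's embedding with the previous position's. The function name `smear_gate` and the single scalar `gate_logit` are hypothetical; the actual layer may use a per-channel gate or a different parameterization.

```python
import math

def smear_gate(x, gate_logit):
    """Mix each position with its left neighbor: (1 - g) * x[t] + g * x[t-1].

    Assumption: one learned scalar logit; this is not taken from the PR.
    """
    g = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid keeps the gate in (0, 1)
    out = []
    for t, cur in enumerate(x):
        # Position 0 has no left neighbor, so a zero vector stands in for it.
        prev = x[t - 1] if t > 0 else [0.0] * len(cur)
        out.append([(1.0 - g) * c + g * p for c, p in zip(cur, prev)])
    return out
```

With `gate_logit = 0.0` the gate is 0.5, so every position becomes the average of itself and its predecessor.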

BigramHash

Uses bigram embeddings alongside token embeddings at the input.

parameters: null
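
As an illustration of bigram embeddings added to token embeddings, the sketch below hashes each (previous, current) token-id pair into a fixed-size table and sums the looked-up vector with the token embedding. The hash function, the table layout, and the names `bigram_bucket`, `embed_with_bigrams`, and `pad_id` are assumptions for illustration, not details from the PR.

```python
def bigram_bucket(prev_id, cur_id, num_buckets):
    """Map a (previous, current) token-id pair to a bucket in a fixed-size table."""
    # Hypothetical multiplicative hash; the PR's hash function is not shown.
    return (prev_id * 1_000_003 + cur_id) % num_buckets

def embed_with_bigrams(token_ids, token_table, bigram_table, pad_id=0):
    """Return token embedding + hashed bigram embedding for each position."""
    out = []
    prev_id = pad_id  # assumption: the first token pairs with a padding id
    for cur_id in token_ids:
        bucket = bigram_bucket(prev_id, cur_id, len(bigram_table))
        tok_vec = token_table[cur_id]
        bi_vec = bigram_table[bucket]
        out.append([a + b for a, b in zip(tok_vec, bi_vec)])
        prev_id = cur_id
    return out
```

Hashing keeps the bigram table small regardless of vocabulary size, at the cost of collisions between unrelated pairs.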

DynamicChunker

Learned dynamic chunking layer that predicts a soft boundary score between each pair of adjacent token embeddings and blends each embedding with its neighbor where the predicted score is low.

parameters: {"layers":1}
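
The boundary prediction and blending can be sketched as follows. This is a minimal illustration that assumes a cosine-dissimilarity boundary score in the spirit of H-Net and a blend with the left neighbor only. The learned projections that would precede the score are omitted, and the function names are hypothetical. The PR's actual scoring function is not shown in this summary.

```python
import math

def soft_boundaries(x):
    """Boundary score in [0, 1] per position: 0.5 * (1 - cosine(x[t], x[t-1]))."""
    if not x:
        return []

    def cosine(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        norm_a = math.sqrt(sum(p * p for p in a))
        norm_b = math.sqrt(sum(q * q for q in b))
        return dot / (norm_a * norm_b + 1e-8)  # epsilon avoids division by zero

    # Position 0 has no left neighbor, so it is treated as a hard boundary.
    return [1.0] + [0.5 * (1.0 - cosine(x[t], x[t - 1])) for t in range(1, len(x))]

def blend_neighbors(x, b):
    """Low boundary score pulls x[t] toward x[t-1]; a score of 1 leaves x[t] unchanged."""
    if not x:
        return []
    out = [list(x[0])]
    for t in range(1, len(x)):
        out.append([b[t] * cur + (1.0 - b[t]) * prev
                    for cur, prev in zip(x[t], x[t - 1])])
    return out
```

Because both steps are smooth functions of the embeddings, gradients can flow through the boundary scores, which is what makes the chunking learnable end to end.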

U-Net skip connections

The chunker is inserted before the skip-connection setup and the transformer blocks, which implies that the baseline's U-Net skip connections operate on the chunker's output rather than on the raw embeddings.

parameters: null

Optimizer

Muon

weight_decay: null

momentum: null

other_params: null

Compression

zlib

level: null
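
For reference, compressing a serialized artifact with zlib might look like the sketch below. The compression level is not recorded in this summary, so `level=9` is an assumption, and `pack_artifact` / `unpack_artifact` are hypothetical helper names.

```python
import zlib

def pack_artifact(raw: bytes, level: int = 9) -> bytes:
    """Compress serialized model bytes; level 9 is an assumed setting."""
    return zlib.compress(raw, level)

def unpack_artifact(blob: bytes) -> bytes:
    """Restore the original bytes from a zlib-compressed artifact."""
    return zlib.decompress(blob)
```

The reported artifact size would then correspond to `len(pack_artifact(raw))` for whatever bytes the submission serializes.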

Test-Time Training

full TTT

parameters: null

Introduces a learned dynamic chunking layer inspired by H-Net.
Implements differentiable soft boundary prediction and neighbor blending for token embeddings.
Demonstrates a lightweight learned tokenization-style preprocessing layer with minimal parameter overhead.
Provides a control path where disabling the chunker reproduces baseline behavior.
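
The control path in the last point can be illustrated with a simple on/off switch. This is a sketch under stated assumptions: the flag name `enabled` and the function `apply_chunker` are hypothetical, and the blend rule (mix with the left neighbor in proportion to one minus the boundary score) is an illustrative choice rather than the PR's confirmed implementation.

```python
def apply_chunker(x, boundaries, enabled=True):
    """Blend each position with its left neighbor, or pass through when disabled."""
    if not enabled or not x:
        return x  # control path: embeddings reach the model unchanged
    out = [list(x[0])]
    for t in range(1, len(x)):
        b = boundaries[t]
        out.append([b * cur + (1.0 - b) * prev
                    for cur, prev in zip(x[t], x[t - 1])])
    return out
```

In this sketch, setting every boundary score to 1.0 also yields the input unchanged, so the enabled path contains the baseline as a special case.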