PR #1191
open
Non-record: H-Net Dynamic Chunking — Learned Tokenization Layer (val_bpb 1.3587)
by dentity007
val_bpb
1.3587
Architecture
Transformer
Optimizer
Muon
Artifact Size
8,592,061
Training Techniques
Architecture
SmearGate
Uses a SmearGate layer in the model pipeline after the learned chunking layer.
parameters: null
BigramHash
Uses bigram embeddings alongside token embeddings at the input.
parameters: null
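A minimal sketch of how bigram embeddings could sit alongside token embeddings at the input. The PR lists no parameters for this technique, so the table sizes, the hash mixing constant, and the choice to add (rather than concatenate) the two embeddings are all assumptions made for illustration.

```python
import random

# Hypothetical sizes; the PR does not state any of these.
VOCAB_SIZE = 16
NUM_BUCKETS = 8
DIM = 4

rng = random.Random(0)
token_table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
bigram_table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]


def bigram_bucket(prev_id, cur_id, num_buckets=NUM_BUCKETS):
    """Hash a (previous, current) token pair into a fixed number of buckets.

    The multiplier is an arbitrary large prime chosen for this sketch.
    """
    return (prev_id * 1000003 + cur_id) % num_buckets


def embed(token_ids):
    """Return one vector per token: token embedding plus hashed bigram embedding.

    The first position has no predecessor, so it gets the token embedding only.
    """
    out = []
    prev = None
    for tid in token_ids:
        vec = list(token_table[tid])
        if prev is not None:
            bigram_vec = bigram_table[bigram_bucket(prev, tid)]
            vec = [v + b for v, b in zip(vec, bigram_vec)]
        out.append(vec)
        prev = tid
    return out


vectors = embed([3, 5, 3])
```

Hashing keeps the bigram table at a fixed size regardless of vocabulary, at the cost of collisions between unrelated token pairs.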
DynamicChunker
Learned dynamic chunking layer that predicts a soft boundary probability between each pair of adjacent token embeddings and blends each token with its neighbor where that probability is low.
parameters: {"layers":1}
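The soft boundary prediction and neighbor blending described here can be sketched as follows. This is not the PR's code: the boundary score is assumed to come from the cosine similarity of adjacent embeddings (in the spirit of H-Net routing, but without the learned projections a real layer would have), and the blend rule and the `enabled` flag are hypothetical names and choices. Every operation is smooth, which is what makes the layer differentiable.

```python
import math


def cosine(a, b, eps=1e-8):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + eps)


def boundary_probs(embs):
    """Soft boundary probability per position.

    Assumption: p[0] = 1 (the first token always starts a chunk) and
    p[t] = 0.5 * (1 - cos(embs[t-1], embs[t])), so similar neighbors
    get a low boundary probability.
    """
    probs = [1.0]
    for prev, cur in zip(embs, embs[1:]):
        probs.append(0.5 * (1.0 - cosine(prev, cur)))
    return probs


def soft_chunk(embs, enabled=True):
    """Blend each embedding with its left neighbor where the boundary is weak.

    Assumed blend rule: out[t] = p[t] * embs[t] + (1 - p[t]) * embs[t-1].
    With enabled=False the input is returned unchanged (the baseline path).
    """
    if not enabled or not embs:
        return [list(v) for v in embs]
    probs = boundary_probs(embs)
    out = [list(embs[0])]
    for t in range(1, len(embs)):
        p = probs[t]
        out.append([p * cur + (1.0 - p) * prev
                    for cur, prev in zip(embs[t], embs[t - 1])])
    return out


# Toy input: tokens 0 and 1 point the same way, token 2 points the opposite way.
embs = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
chunked = soft_chunk(embs)
```

In the toy input, position 1 is pulled almost entirely onto its predecessor because the boundary is weak, while position 2 stays essentially unchanged because the boundary is strong.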
U-Net skip connections
The chunker is inserted before the skip-connection setup and the transformer blocks, so the U-Net skip connections presumably operate on chunked embeddings rather than on the baseline's raw token embeddings.
parameters: null
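The ordering described here can be illustrated with a toy pipeline. The sketch assumes a common U-Net arrangement in which the first half of the blocks push their outputs onto a stack and the second half pop and add them. The function name, the unweighted addition, and the toy blocks are hypothetical and not taken from the PR.

```python
def run_unet_stack(x, blocks, chunker=None):
    """Run blocks with U-Net style skips; the chunker, if any, runs first.

    x is a flat list of floats standing in for a hidden state.
    """
    if chunker is not None:
        x = chunker(x)  # chunking happens before any skip is recorded

    half = len(blocks) // 2
    skips = []
    for block in blocks[:half]:       # "encoder" half: record outputs
        x = block(x)
        skips.append(x)
    for block in blocks[half:]:       # "decoder" half: consume them in reverse
        if skips:
            skip = skips.pop()
            x = [a + b for a, b in zip(x, skip)]
        x = block(x)
    return x


# Toy blocks: add one, then double.
toy_blocks = [
    lambda v: [a + 1.0 for a in v],
    lambda v: [a * 2.0 for a in v],
]
baseline_out = run_unet_stack([1.0], toy_blocks)
chunked_out = run_unet_stack([1.0], toy_blocks, chunker=lambda v: [a * 0.0 for a in v])
```

Because the chunker runs before the first skip is stored, every skip tensor already reflects the chunked input.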
Optimizer
Muon
weight_decay: null
momentum: null
other_params: null
Compression
zlib
level: null
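A small sketch of measuring a zlib-compressed artifact with Python's standard `zlib` module. The compression level is not given in the PR, so level 9 is an assumption here, and the weight values and float32 serialization are placeholders rather than the submission's actual format.

```python
import struct
import zlib


def compress_artifact(raw: bytes, level: int = 9) -> bytes:
    """Compress serialized bytes with zlib (level 9 is an assumed setting)."""
    return zlib.compress(raw, level)


# Placeholder "weights": highly repetitive, so they compress very well.
weights = [0.0] * 1000 + [0.5] * 1000
raw = struct.pack(f"<{len(weights)}f", *weights)  # little-endian float32

blob = compress_artifact(raw)
restored = zlib.decompress(blob)

print(f"raw: {len(raw)} bytes, compressed: {len(blob)} bytes")
```

Real weights are far less repetitive than this placeholder, so the compression ratio seen here says nothing about the ratio behind the reported artifact size.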
Test-Time Training
full TTT
parameters: null
Novel Contributions
- Introduces a learned dynamic chunking layer inspired by H-Net.
- Implements differentiable soft boundary prediction and neighbor blending for token embeddings.
- Demonstrates a lightweight learned tokenization-style preprocessing layer with minimal parameter overhead.
- Provides a control path where disabling the chunker reproduces baseline behavior.