PR #1121
Notable Non-Record: H-Net Tokenization — 0.6846 BPB — Hierarchical Token Processing
by gowtham0992
val_bpb
0.6846
Architecture
Transformer
Optimizer
—
Artifact Size
14.00 MB
Training Techniques
Architecture
BigramHash
Uses BigramHash at each resolution as part of the transformer stack.
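A minimal sketch of what a per-resolution BigramHash could look like, assuming it means hashing each (previous, current) token-id pair into a fixed-size auxiliary embedding table that is added to the unigram embedding; the table size, hash, and additive mixing are all assumptions, not details from the PR:

```python
import numpy as np

def bigram_hash_embed(token_ids, uni_table, bi_table, prime=1_000_003):
    """token_ids: (T,) ints; uni_table: (V, D); bi_table: (H, D) hashed bigram table."""
    H = bi_table.shape[0]
    out = uni_table[token_ids].copy()          # unigram embeddings, (T, D)
    for t in range(1, len(token_ids)):
        # hash the ordered pair (prev, cur) into one of H buckets
        h = (int(token_ids[t - 1]) * prime + int(token_ids[t])) % H
        out[t] += bi_table[h]                  # add the hashed bigram embedding
    return out
```

Hashing sidesteps a dense V×V bigram table at the cost of bucket collisions, which the model can learn around.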
parameters: null
SmearGate
Uses SmearGate at each resolution as part of the transformer stack.
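A hedged sketch of one plausible reading of SmearGate: a learned per-token gate that smears a fraction of the previous token's representation into the current one. The gating parameterization (a linear score `w`, `b` passed through a sigmoid) is an assumption for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def smear_gate(x, w, b):
    """x: (T, D) token states; w: (D,), b: scalar gate parameters."""
    g = sigmoid(x @ w + b)                     # (T,) gate values in (0, 1)
    out = x.copy()
    # mix each position with its predecessor; position 0 has none to smear from
    out[1:] = (1 - g[1:, None]) * x[1:] + g[1:, None] * x[:-1]
    return out
```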
parameters: null
U-Net skip connections
Adapts U-Net-style skip connections to operate across coarse and fine resolutions by unmerging the coarse stream back to fine resolution before the skip-add.
parameters: null
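The shape bookkeeping this implies can be sketched as follows; here a simple repeat stands in for the PR's learned unmerge, purely to show why the coarse stream must be brought back to fine resolution before the skip-add:

```python
import numpy as np

def skip_add_across_resolutions(fine, coarse, merge_factor=2):
    """fine: (T, D) skip tensor; coarse: (T // merge_factor, D) decoder input."""
    # upsample the coarse stream back to length T so the residual add is valid;
    # np.repeat is a stand-in for the learned unmerge described in the PR
    up = np.repeat(coarse, merge_factor, axis=0)
    return fine + up
```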
hierarchical token processing
Introduces learned token merge/unmerge to process tokens at multiple resolutions within the transformer.
parameters: {"merge_factor":2,"encoder_layers":5,"decoder_layers":6,"d_model":512}
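A minimal sketch of the merge/unmerge pair with `merge_factor=2`, consistent with the contributions listed below: adjacent token pairs are merged with learned softmax weights, and per-position projections later reconstruct two fine tokens from each coarse one. The weight shapes are assumptions, and `D` is kept small here rather than the PR's `d_model=512`:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def merge_pairs(x, w_score):
    """x: (T, D) with even T; w_score: (D,) scoring vector. Returns (T//2, D)."""
    pairs = x.reshape(-1, 2, x.shape[-1])          # (T/2, 2, D) adjacent pairs
    a = softmax(pairs @ w_score, axis=-1)          # (T/2, 2) gated mixing weights
    return (a[..., None] * pairs).sum(axis=1)      # weighted merge of each pair

def unmerge_pairs(c, w_left, w_right):
    """c: (T/2, D) coarse tokens; w_left, w_right: (D, D) per-position projections."""
    left, right = c @ w_left, c @ w_right          # two fine tokens per coarse token
    return np.stack([left, right], axis=1).reshape(-1, c.shape[-1])
```

With zero scores the softmax degenerates to pair averaging, so the merge starts near a plain 2x pooling and learns sharper selection from there.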
Quantization
GPTQ
bits: 6
scope: model
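The PR quantizes with GPTQ, which compensates quantization error column by column using the layer input Hessian; that machinery is out of scope here, but the 6-bit symmetric grid such methods quantize onto can be shown with a plain round-to-nearest baseline, with per-row scales as an illustrative choice:

```python
import numpy as np

def quantize_rtn(w, bits=6):
    """w: (out, in) weights -> (dequantized copy, int codes, per-row scales)."""
    qmax = 2 ** (bits - 1) - 1                 # 31 for signed 6-bit codes
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                    # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale, q.astype(np.int8), scale
```

Six bits gives 64 levels per weight, which is why the quantized artifact lands well under the full-precision footprint while keeping val_bpb intact.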
Evaluation
sliding window eval
parameters: null
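A hedged sketch of the sliding-window evaluation loop, assuming the standard strided scheme: a fixed-context model slides over the byte stream and each window contributes loss only on positions the previous window did not score. The `nll_bits` callback is a hypothetical stand-in for the model's per-position negative log-likelihood in bits:

```python
def sliding_window_bpb(data, context=8, stride=4,
                       nll_bits=lambda ctx, nxt: 8.0):
    """BPB over `data` using windows of `context` bytes advanced by `stride`."""
    total_bits, total_bytes = 0.0, 0
    for start in range(0, len(data) - 1, stride):
        window = data[start:start + context + 1]
        # score only positions beyond the previous window's coverage
        new_from = 0 if start == 0 else context - stride
        for i in range(new_from, len(window) - 1):
            total_bits += nll_bits(window[:i + 1], window[i + 1])
            total_bytes += 1
    return total_bits / max(total_bytes, 1)
```

A smaller stride gives each scored byte more left context at the cost of more forward passes; with the uniform 8-bit stand-in above, the BPB is exactly 8.0 regardless of stride.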
Novel Contributions
- Learned token merge/unmerge hierarchy for multi-resolution processing
- Gated merge with softmax weights over adjacent token pairs
- Per-position unmerge projections to reconstruct fine-grained tokens
- Half-resolution encoder followed by full-resolution decoder
- Resolution-aware skip connection handling across the merge/unmerge boundary
- Sliding window evaluation with GPTQ-quantized artifact