PR #1121

open

Notable Non-Record: H-Net Tokenization — 0.6846 BPB — Hierarchical Token Processing

by gowtham0992
val_bpb: 0.6846
Architecture: Transformer
Optimizer:
Artifact Size: 14.00 MB

Training Techniques

Architecture
BigramHash
Uses BigramHash at each resolution as part of the transformer stack.
parameters: null
SmearGate
Uses SmearGate at each resolution as part of the transformer stack.
parameters: null
U-Net skip connections
Adapts skip connections across coarse and fine resolutions by unmerging before skip-add.
parameters: null
hierarchical token processing
Introduces learned token merge/unmerge to process tokens at multiple resolutions within the transformer.
parameters: {"merge_factor":2,"encoder_layers":5,"decoder_layers":6,"d_model":512}
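A minimal sketch of the shape flow implied by the listed parameters (merge_factor=2, d_model=512, half-resolution encoder, full-resolution decoder). Average pooling and token repetition here are illustrative stand-ins for the learned merge/unmerge; the skip tensor being unmerged back to full resolution before the skip-add follows the U-Net skip description above.

```python
import numpy as np

def merge(x, factor=2):
    # Coarsen: pool each group of `factor` adjacent tokens
    # (a stand-in for the learned merge described in the PR).
    B, T, D = x.shape
    return x.reshape(B, T // factor, factor, D).mean(axis=2)

def unmerge(x, factor=2):
    # Restore full resolution by repeating each coarse token
    # (a stand-in for the learned unmerge projections).
    return np.repeat(x, factor, axis=1)

B, T, D = 1, 8, 512            # d_model = 512 per the listed parameters
x = np.random.randn(B, T, D)

skip = x                        # fine-resolution activations saved for the skip path
h = merge(x)                    # encoder runs at half resolution: (1, 4, 512)
# ... the 5 encoder layers would process `h` here ...
h = unmerge(h)                  # back to full resolution before the skip-add
h = h + skip                    # resolution-aware skip connection
# ... the 6 decoder layers would process `h` at full resolution ...
print(h.shape)                  # (1, 8, 512)
```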
Quantization
GPTQ
bits: 6
scope: model
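GPTQ itself solves a Hessian-weighted layerwise reconstruction problem; as a much simpler illustration, the sketch below shows only the symmetric 6-bit round-to-nearest grid such a quantizer maps weights onto. The function names and scaling scheme are illustrative, not taken from the PR.

```python
import numpy as np

def quantize_6bit(w):
    # Map weights onto a symmetric 6-bit integer grid (levels -32..31).
    # Simplified stand-in: real GPTQ additionally propagates quantization
    # error across columns using second-order (Hessian) information.
    scale = np.abs(w).max() / 31.0
    q = np.clip(np.round(w / scale), -32, 31).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, s = quantize_6bit(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()   # round-to-nearest error is at most half a step
```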
Evaluation
sliding window eval
parameters: null
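The PR does not list the evaluation parameters, so the sketch below uses illustrative `window` and `stride` values. It shows the standard strided sliding-window scheme: windows overlap, and only the trailing `stride` targets of each window are counted, so most tokens are scored with extra left context.

```python
import math

def sliding_window_bpb(logprobs_fn, tokens, window=8, stride=4):
    # logprobs_fn(ctx) returns log p(ctx[t] | ctx[:t]) for t = 1..len(ctx)-1
    # (natural log). Each token is scored exactly once.
    nll, counted, prev_end = 0.0, 0, 0
    for begin in range(0, len(tokens), stride):
        end = min(begin + window, len(tokens))
        lps = logprobs_fn(tokens[begin:end])       # len(ctx) - 1 values
        new = end - max(prev_end, begin + 1)       # targets not already scored
        if new > 0:
            nll += -sum(lps[-new:])
            counted += new
        prev_end = end
        if end == len(tokens):
            break
    # bits per token; equals bits per byte under byte-level tokenization
    return nll / counted / math.log(2)
```

With a uniform model over two symbols, every token costs exactly one bit, so the function returns 1.0 regardless of window placement.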

Novel Contributions

  • Learned token merge/unmerge hierarchy for multi-resolution processing
  • Gated merge with softmax weights over adjacent token pairs
  • Per-position unmerge projections to reconstruct fine-grained tokens
  • Half-resolution encoder followed by full-resolution decoder
  • Resolution-aware skip connection handling across the merge/unmerge boundary
  • Sliding window evaluation with GPTQ-quantized artifact
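The first three bullets can be sketched together: a softmax gate over each adjacent token pair produces one coarse token, and two per-position projections reconstruct the fine tokens. The weight names (`w_gate`, `W_un`) and their initialization are hypothetical; only the gating and per-position-projection structure comes from the list above.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 512                                   # d_model per the listed parameters

# Hypothetical learned weights, illustrative only.
w_gate = rng.standard_normal(D) / np.sqrt(D)         # scores tokens for the merge gate
W_un = rng.standard_normal((2, D, D)) / np.sqrt(D)   # one projection per fine position

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_merge(x):
    # Softmax gate over each adjacent token pair -> one coarse token per pair.
    B, T, _ = x.shape
    pairs = x.reshape(B, T // 2, 2, D)
    gates = softmax(pairs @ w_gate, axis=-1)          # (B, T/2, 2), sums to 1 per pair
    return (gates[..., None] * pairs).sum(axis=2)     # (B, T/2, D)

def unmerge(c):
    # Each coarse token is projected twice, once per reconstructed fine position.
    B, S, _ = c.shape
    fine = np.einsum('bsd,pde->bspe', c, W_un)        # (B, S, 2, D)
    return fine.reshape(B, 2 * S, D)

x = rng.standard_normal((1, 8, D))
c = gated_merge(x)      # half resolution: (1, 4, 512)
y = unmerge(c)          # full resolution: (1, 8, 512)
```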