| Metric | Value |
| --- | --- |
| val_bpb | 1.5406 |
| Architecture | Transformer |
| Optimizer | — |
| Artifact Size | — |
Training Techniques
Architecture: BigramHash
Hashes consecutive token pairs into a learned embedding table and gates the result into the input representation, injecting bigram statistics directly.
Parameters: none specified.
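A minimal sketch of what a BigramHash-style layer could look like, in plain numpy. The hash function, bucket count, and sigmoid-gated additive injection are all illustrative assumptions; the card does not specify the actual implementation, and in a real model the table and gate would be trained.

```python
import numpy as np

class BigramHash:
    """Sketch: hash (prev, current) token pairs into a bucketed
    embedding table and gate the lookup into the input stream."""

    def __init__(self, num_buckets, d_model, seed=0):
        rng = np.random.default_rng(seed)
        # Learned in practice; random init here for illustration.
        self.table = rng.normal(0.0, 0.02, size=(num_buckets, d_model))
        # Learned gate logits; zeros give a sigmoid gate of 0.5.
        self.gate_logits = np.zeros(d_model)
        self.num_buckets = num_buckets

    def bucket(self, prev_tok, tok):
        # Illustrative hash of the ordered token pair into a bucket.
        return (prev_tok * 1_000_003 + tok) % self.num_buckets

    def __call__(self, tokens, x):
        # tokens: (T,) int ids; x: (T, d_model) token embeddings.
        out = x.copy()
        gate = 1.0 / (1.0 + np.exp(-self.gate_logits))  # sigmoid in (0, 1)
        for t in range(1, len(tokens)):
            b = self.bucket(tokens[t - 1], tokens[t])
            out[t] = x[t] + gate * self.table[b]  # gated bigram injection
        return out
```

The first position has no preceding token, so it passes through unchanged; every later position receives a gated bigram embedding on top of its ordinary input representation.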
Depth recurrence: Belief Recurrent Network (BRN)
Maintains a persistent belief state across the sequence and gates updates to the representation at each step.
Parameters: none specified.
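One way to read "persistent belief state with gated updates" is a GRU-like recurrence. The sketch below assumes that form; the dimensions, the single update gate, and the tanh candidate are guesses, since the card gives no details of the BRN cell.

```python
import numpy as np

class BRN:
    """Sketch of a gated belief recurrence: a belief vector persists
    across the sequence and is updated by a convex gate at each step."""

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.W_z = rng.normal(0, s, (d_model, d_model))  # input -> gate
        self.U_z = rng.normal(0, s, (d_model, d_model))  # belief -> gate
        self.W_h = rng.normal(0, s, (d_model, d_model))  # input -> candidate
        self.U_h = rng.normal(0, s, (d_model, d_model))  # belief -> candidate

    def __call__(self, xs):
        # xs: (T, d) inputs; returns per-step outputs and final belief.
        T, d = xs.shape
        belief = np.zeros(d)  # persistent belief state
        out = np.empty_like(xs)
        for t in range(T):
            z = 1.0 / (1.0 + np.exp(-(xs[t] @ self.W_z + belief @ self.U_z)))
            cand = np.tanh(xs[t] @ self.W_h + belief @ self.U_h)
            belief = (1.0 - z) * belief + z * cand  # gated belief update
            out[t] = belief
        return out, belief
```

Because each update is a convex combination of the old belief and a tanh candidate, the belief state stays bounded in [-1, 1] while still carrying information across arbitrarily long spans.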
Other: Geodesic Topological Tokenizer
Built from persistent homology on n-gram co-occurrence manifolds, using Isomap projection and topologically stable clustering.
Parameters: none specified.
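The pipeline can be sketched end to end in numpy under simplifying assumptions: a bigram co-occurrence matrix stands in for the n-gram manifold, Isomap is implemented directly as k-NN geodesics plus classical MDS, and "topologically stable clustering" is approximated by single-linkage merging with a persistence cutoff, which is exactly 0-dimensional persistent homology. The toy thresholds and the distance transform 1/(1+count) are illustrative choices, not the author's.

```python
import numpy as np

def cooccurrence(tokens, vocab):
    """Symmetric bigram co-occurrence counts over a token stream."""
    C = np.zeros((len(vocab), len(vocab)))
    for a, b in zip(tokens, tokens[1:]):
        C[vocab[a], vocab[b]] += 1
        C[vocab[b], vocab[a]] += 1
    return C

def geodesics(D, k=2):
    """k-NN graph over distances D, then Floyd-Warshall shortest paths."""
    n = len(D)
    G = np.full_like(D, np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:  # skip self at index 0
            G[i, j] = G[j, i] = D[i, j]
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():  # pragmatic fallback for disconnected graphs
        G[np.isinf(G)] = 2.0 * G[np.isfinite(G)].max()
    return G

def isomap(G, dim=2):
    """Classical MDS on squared geodesics: the Isomap embedding step."""
    n = len(G)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def h0_clusters(X, persistence):
    """Single-linkage merging up to a cutoff: 0-dim persistent homology.
    Components still separate at `persistence` are the stable clusters."""
    n = len(X)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    for d, i, j in sorted((D[i, j], i, j)
                          for i in range(n) for j in range(i + 1, n)):
        if d < persistence:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

def embed_vocab(tokens, vocab, k=2, dim=2):
    """Full sketch: counts -> distances -> geodesics -> embedding."""
    C = cooccurrence(tokens, vocab)
    D = 1.0 / (1.0 + C)  # illustrative count-to-distance transform
    np.fill_diagonal(D, 0.0)
    return isomap(geodesics(D, k=k), dim=dim)
```

On a toy stream where "a"/"b" and "c"/"d" alternate in separate halves, the embedding pulls each pair together and `h0_clusters` can then recover the two groups, mirroring the claimed manifold-then-stable-clusters structure at miniature scale.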
Novel Contributions
- Geodesic Topological Tokenizer based on persistent homology and Isomap
- BigramHash for injecting learned bigram priors
- Belief Recurrent Network (BRN) with persistent belief state
- Reported tokenizer compression of 3.236 characters per token