| Metric | Value |
| --- | --- |
| val_bpb | 1.5406 |
| Architecture | Transformer |
| Optimizer | — |
| Artifact Size | — |
Training Techniques
Architecture: BigramHash
Hashes consecutive token pairs into a learned embedding table and gates the result into the input representation, injecting bigram statistics directly.
Parameters: none specified.
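A minimal sketch of what a BigramHash-style layer could look like, in plain numpy. The hash function, bucket count, and sigmoid-gated additive injection are all illustrative assumptions; the card does not specify the actual implementation, and in a real model the table and gate would be trained.

```python
import numpy as np

class BigramHash:
    """Sketch: hash (prev, current) token pairs into a bucketed
    embedding table and gate the lookup into the input stream."""

    def __init__(self, num_buckets, d_model, seed=0):
        rng = np.random.default_rng(seed)
        # Learned in practice; random init here for illustration.
        self.table = rng.normal(0.0, 0.02, size=(num_buckets, d_model))
        # Learned gate logits; zeros give a sigmoid gate of 0.5.
        self.gate_logits = np.zeros(d_model)
        self.num_buckets = num_buckets

    def bucket(self, prev_tok, tok):
        # Illustrative hash of the ordered token pair into a bucket.
        return (prev_tok * 1_000_003 + tok) % self.num_buckets

    def __call__(self, tokens, x):
        # tokens: (T,) int ids; x: (T, d_model) token embeddings.
        out = x.copy()
        gate = 1.0 / (1.0 + np.exp(-self.gate_logits))  # sigmoid in (0, 1)
        for t in range(1, len(tokens)):
            b = self.bucket(tokens[t - 1], tokens[t])
            out[t] = x[t] + gate * self.table[b]  # gated bigram injection
        return out
```

The first position has no preceding token, so it passes through unchanged; every later position receives a gated bigram embedding on top of its ordinary input representation.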
Depth recurrence: Belief Recurrent Network (BRN)
Maintains a persistent belief state across the sequence and gates updates to the representation at each step.
Parameters: none specified.
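One way to read "persistent belief state with gated updates" is a GRU-like recurrence. The sketch below assumes that form; the dimensions, the single update gate, and the tanh candidate are guesses, since the card gives no details of the BRN cell.

```python
import numpy as np

class BRN:
    """Sketch of a gated belief recurrence: a belief vector persists
    across the sequence and is updated by a convex gate at each step."""

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.W_z = rng.normal(0, s, (d_model, d_model))  # input -> gate
        self.U_z = rng.normal(0, s, (d_model, d_model))  # belief -> gate
        self.W_h = rng.normal(0, s, (d_model, d_model))  # input -> candidate
        self.U_h = rng.normal(0, s, (d_model, d_model))  # belief -> candidate

    def __call__(self, xs):
        # xs: (T, d) inputs; returns per-step outputs and final belief.
        T, d = xs.shape
        belief = np.zeros(d)  # persistent belief state
        out = np.empty_like(xs)
        for t in range(T):
            z = 1.0 / (1.0 + np.exp(-(xs[t] @ self.W_z + belief @ self.U_z)))
            cand = np.tanh(xs[t] @ self.W_h + belief @ self.U_h)
            belief = (1.0 - z) * belief + z * cand  # gated belief update
            out[t] = belief
        return out, belief
```

Because each update is a convex combination of the old belief and a tanh candidate, the belief state stays bounded in [-1, 1] while still carrying information across arbitrarily long spans.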
Other: Geodesic Topological Tokenizer
Built from persistent homology on n-gram co-occurrence manifolds, using Isomap projection and topologically stable clustering.
Parameters: none specified.
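The pipeline can be sketched end to end in numpy under simplifying assumptions: a bigram co-occurrence matrix stands in for the n-gram manifold, Isomap is implemented directly as k-NN geodesics plus classical MDS, and "topologically stable clustering" is approximated by single-linkage merging with a persistence cutoff, which is exactly 0-dimensional persistent homology. The toy thresholds and the distance transform 1/(1+count) are illustrative choices, not the author's.

```python
import numpy as np

def cooccurrence(tokens, vocab):
    """Symmetric bigram co-occurrence counts over a token stream."""
    C = np.zeros((len(vocab), len(vocab)))
    for a, b in zip(tokens, tokens[1:]):
        C[vocab[a], vocab[b]] += 1
        C[vocab[b], vocab[a]] += 1
    return C

def geodesics(D, k=2):
    """k-NN graph over distances D, then Floyd-Warshall shortest paths."""
    n = len(D)
    G = np.full_like(D, np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:  # skip self at index 0
            G[i, j] = G[j, i] = D[i, j]
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():  # pragmatic fallback for disconnected graphs
        G[np.isinf(G)] = 2.0 * G[np.isfinite(G)].max()
    return G

def isomap(G, dim=2):
    """Classical MDS on squared geodesics: the Isomap embedding step."""
    n = len(G)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def h0_clusters(X, persistence):
    """Single-linkage merging up to a cutoff: 0-dim persistent homology.
    Components still separate at `persistence` are the stable clusters."""
    n = len(X)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    for d, i, j in sorted((D[i, j], i, j)
                          for i in range(n) for j in range(i + 1, n)):
        if d < persistence:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

def embed_vocab(tokens, vocab, k=2, dim=2):
    """Full sketch: counts -> distances -> geodesics -> embedding."""
    C = cooccurrence(tokens, vocab)
    D = 1.0 / (1.0 + C)  # illustrative count-to-distance transform
    np.fill_diagonal(D, 0.0)
    return isomap(geodesics(D, k=k), dim=dim)
```

On a toy stream where "a"/"b" and "c"/"d" alternate in separate halves, the embedding pulls each pair together and `h0_clusters` can then recover the two groups, mirroring the claimed manifold-then-stable-clusters structure at miniature scale.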
Novel Contributions
- Geodesic Topological Tokenizer based on persistent homology and Isomap
- BigramHash for injecting learned bigram priors
- Belief Recurrent Network (BRN) with persistent belief state
- Reported tokenizer compression of 3.236 characters per token