val_bpb
0.4924
Architecture
Transformer
Optimizer
—
Artifact Size
13,125,365 bytes
Training Techniques
Quantization
int8
bits: 8
scope: model artifact
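As a rough illustration of what 8-bit quantization of a model artifact involves, the sketch below applies symmetric per-tensor int8 quantization to a flat list of weights. The card does not state the scheme actually used (per-tensor or per-channel, symmetric or asymmetric), so the function names `quantize_int8` and `dequantize_int8` and the per-tensor choice are assumptions made for illustration only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative assumption,
    not the submission's confirmed scheme). Returns (int values, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]  # toy values, not taken from the model
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is why BPB is typically re-measured after quantization.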
Architecture
weight tying
The baseline adaptation ties the input and output embeddings (weight tying), as in the OAI-baseline-style model.
parameters: null
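Weight tying means the same matrix serves as both the input embedding table and the output projection. A minimal pure-Python sketch follows; the class name `TiedEmbedding` and the toy matrix are hypothetical and not taken from the submission's code.

```python
class TiedEmbedding:
    """Toy tied embedding: one table is used for lookup and for logits.
    Hypothetical illustration, not the submission's implementation."""

    def __init__(self, table):
        self.table = table  # single shared matrix, shape [vocab][dim]

    def embed(self, token_id):
        # Input side: look up the row for a token id.
        return self.table[token_id]

    def logits(self, hidden):
        # Output side: dot the hidden state with every row of the same table.
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.table]


model = TiedEmbedding([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scores = model.logits(model.embed(2))
```

Because only one matrix is stored, tying reduces the parameter count, which matters when the artifact size is constrained.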
BigramHash
The ConvexTok-2048 deterministic tokenizer uses a byte-boundary tokenization DAG with LP/token-rank/price/byte-length features.
parameters: {"vocab_size":2048}
Gated Attention
The graphified adaptation injects token-graph structure into the baseline hidden stream via causal edge tokens and additional channels.
parameters: null
RoPE
Toric phase features are injected into the graphified token stream.
parameters: null
Test-Time Training
score-first TTT
parameters: {"non_committing":true}
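In score-first test-time training, each chunk is scored with the model as it stands and only afterwards used for adaptation, so no token is evaluated by parameters that have already seen it. The sketch below shows that ordering with a toy Laplace-smoothed unigram model; the function name `score_first_ttt` and the count-based model are stand-ins chosen for illustration, and the sketch does not model the `non_committing` flag, whose exact semantics the card does not state.

```python
import math


def score_first_ttt(chunks, vocab_size=256):
    """Return total bits over all chunks, scoring each chunk BEFORE adapting.
    The unigram count model is a toy stand-in for the real network."""
    counts = [1] * vocab_size  # Laplace-smoothed unigram counts
    total_bits = 0.0
    for chunk in chunks:
        total = sum(counts)
        # 1) Score the chunk with the current, not-yet-adapted model.
        for tok in chunk:
            total_bits += -math.log2(counts[tok] / total)
        # 2) Only then adapt on the chunk that was just scored.
        for tok in chunk:
            counts[tok] += 1
    return total_bits


bits_one = score_first_ttt([[0, 0]], vocab_size=4)
bits_two = score_first_ttt([[0, 0], [0, 0]], vocab_size=4)
```

In the two-chunk call, the second chunk is cheaper to encode than the first because the model adapted on the first chunk after scoring it.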
Other
other
ConvexTok-2048 deterministic tokenizer with a byte-boundary tokenization DAG and token-rank/price/byte-length features.
parameters: {"vocab_size":2048}
other
TokenGT-style FineWeb graphification with token nodes, causal edge tokens, distance/endpoint/identifier channels, and toric phase features.
parameters: null
other
OAI-FineWeb-only flattening of graph-output states back to the tokenizer sequence for BPB scoring.
parameters: null
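Assuming BPB here denotes bits per byte in the usual sense, scoring on the flattened tokenizer sequence reduces to summing per-token negative log-likelihoods, converting nats to bits, and dividing by the number of raw bytes covered. The helper name `bits_per_byte` and its argument names are hypothetical.

```python
import math


def bits_per_byte(token_nll_nats, token_byte_lengths):
    """Bits per byte from per-token NLLs (in nats) and per-token byte lengths.
    Hypothetical helper; assumes one NLL and one byte length per token."""
    total_bytes = sum(token_byte_lengths)
    if total_bytes == 0:
        raise ValueError("no bytes to score")
    # Divide by ln(2) to convert nats to bits, then normalize by bytes.
    return sum(token_nll_nats) / (math.log(2) * total_bytes)


# Toy input: two tokens costing 2 bits each, covering 4 bytes each.
example_bpb = bits_per_byte([2 * math.log(2), 2 * math.log(2)], [4, 4])
```

Normalizing by bytes rather than tokens keeps the metric comparable across tokenizers with different vocabulary sizes.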
other
Embedding-space GFlowNet and Forest-of-Thought heads used during training to encourage reasoning forests and memory retrieval.
parameters: null
other
Low-weight advanced sidecar losses including GraphCG, analogical memory, TokenGT graph supervision, toric geometry, vector-bundle 1D-cone/sheaf, Toric BGG category-O, Koszul persistence, combinatorial toric commutative algebra, and derived signatures.
parameters: null
Novel Contributions
- ConvexTok-2048 deterministic tokenizer
- Graphified OAI-baseline adaptation for FineWeb
- TokenGT-style graphification with toric phase features
- Embedding-space GFlowNet and Forest-of-Thought training heads
- Self-contained non-record packaging with tokenizer and minimal ToricGT dependency modules
- Best-of-10 sweep selecting the lowest int8 BPB run
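The best-of-10 selection in the last item can be sketched as taking the minimum over per-run int8 BPB values. The record layout, the field names `run_id` and `int8_bpb`, and the numbers below are placeholders for illustration, not results from this submission.

```python
def select_best_run(runs):
    """Pick the run with the lowest post-quantization BPB.
    `runs` is a list of dicts with hypothetical keys 'run_id' and 'int8_bpb'."""
    if not runs:
        raise ValueError("sweep produced no runs")
    return min(runs, key=lambda r: r["int8_bpb"])


# Placeholder sweep records (made-up values).
sweep = [
    {"run_id": 0, "int8_bpb": 0.9},
    {"run_id": 1, "int8_bpb": 0.7},
    {"run_id": 2, "int8_bpb": 0.8},
]
best = select_best_run(sweep)
```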