PR #2168

open

ToricGT ConvexTok-2048 graphified FoT non-record submission

by amelie-iska

val_bpb

0.4924

Architecture

Transformer

Optimizer

—

Artifact Size

13,125,365 bytes

Training Techniques

Quantization

int8

bits: 8

scope: model artifact
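
The entry above only states that the model artifact is stored at 8 bits. As a minimal sketch of what that can look like, the following shows symmetric per-tensor int8 quantization in plain Python. The per-tensor scale, the [-127, 127] clamp, and the function names `quantize_int8` and `dequantize_int8` are assumptions for illustration, not details taken from the submission.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization (assumed scheme): returns (codes, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Under this scheme each weight costs one byte plus a shared scale, and the round-trip error per weight is at most half a quantization step.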

Architecture

weight tying

The baseline adaptation uses tied embeddings (weight tying), as in the OAI-baseline-style model.

parameters: null

BigramHash

The ConvexTok-2048 deterministic tokenizer uses a byte-boundary tokenization DAG with LP, token-rank, price, and byte-length features.

parameters: {"vocab_size":2048}
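
A byte-boundary tokenization DAG can be read as follows: nodes are the byte offsets 0..n of the input, and each vocabulary token that matches `data[i:j]` is an edge from i to j. The sketch below picks a segmentation by minimum total price over that DAG. The selection rule (a shortest path with strict tie-breaking), the `prices` table, and the function name `tokenize_min_cost` are assumptions; the submission does not say how ConvexTok-2048 chooses its path or how the LP and rank features enter.

```python
def tokenize_min_cost(data: bytes, prices: dict) -> list:
    """Min-cost path over the byte-boundary DAG (hypothetical selection rule).

    prices maps token bytes -> non-negative price. Ties keep the first path
    found, so the result is deterministic for a fixed price table.
    """
    n = len(data)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: cheapest cost to reach byte offset j
    back = [None] * (n + 1)  # back[j]: start offset of the last token on that path
    best[0] = 0.0
    max_len = max(len(tok) for tok in prices)

    for i in range(n):
        if best[i] == inf:
            continue  # offset i is unreachable
        for j in range(i + 1, min(n, i + max_len) + 1):
            tok = data[i:j]
            if tok in prices and best[i] + prices[tok] < best[j]:
                best[j] = best[i] + prices[tok]
                back[j] = i

    if best[n] == inf:
        raise ValueError("input cannot be segmented with this vocabulary")

    # Walk the back-pointers from the end to recover the token sequence.
    tokens, j = [], n
    while j > 0:
        i = back[j]
        tokens.append(data[i:j])
        j = i
    tokens.reverse()
    return tokens


toy_prices = {b"a": 1.0, b"b": 1.0, b"c": 1.0, b"ab": 1.5, b"bc": 1.2}
segmentation = tokenize_min_cost(b"abc", toy_prices)
```

With the toy prices shown, the candidate paths cost 3.0 (`a`, `b`, `c`), 2.5 (`ab`, `c`), and 2.2 (`a`, `bc`), so the last one is selected.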

Gated Attention

Graphified adaptation injects token graph structure into the baseline hidden stream with causal edge tokens and additional channels.

parameters: null

RoPE

Toric phase features are injected into the graphified token stream.

parameters: null
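
The submission does not describe how the toric phase features are computed. One plausible reading, sketched here under that caveat, treats each position as a point on a torus (a product of circles) and emits a cosine and sine pair per circle. The `periods` argument and the function name `toric_phase_features` are hypothetical.

```python
import math


def toric_phase_features(position: int, periods) -> list:
    """Map a position to (cos, sin) pairs, one pair per circle of the torus.

    Assumed construction: the angle on circle k is 2*pi*(position mod period_k)/period_k.
    """
    features = []
    for period in periods:
        theta = 2.0 * math.pi * (position % period) / period
        features.extend((math.cos(theta), math.sin(theta)))
    return features


phase = toric_phase_features(5, periods=(4, 6))
```

Each pair has unit norm, and the feature vector repeats whenever the position advances by a common multiple of the periods.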

Test-Time Training

score-first TTT

parameters: {"non_committing":true}
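
Score-first test-time training means each chunk is scored with parameters that have not yet seen it, and adaptation happens only afterwards. The sketch below shows that ordering with a toy adaptive byte model standing in for the network. The class `UnigramByteModel`, the function `score_first_ttt`, and the chunking are illustrative assumptions, and the `non_committing` flag is not modeled here.

```python
import math


class UnigramByteModel:
    """Toy stand-in for the network: Laplace-smoothed byte counts."""

    def __init__(self):
        self.counts = [1] * 256
        self.total = 256

    def nll_bits(self, chunk: bytes) -> float:
        """Negative log-likelihood of the chunk in bits under the current counts."""
        return sum(-math.log2(self.counts[b] / self.total) for b in chunk)

    def update(self, chunk: bytes) -> None:
        """Adapt to the chunk (the 'training' step in this toy setting)."""
        for b in chunk:
            self.counts[b] += 1
            self.total += 1


def score_first_ttt(model, chunks) -> float:
    """Score every chunk before adapting on it, then return the total bits."""
    total_bits = 0.0
    for chunk in chunks:
        total_bits += model.nll_bits(chunk)  # score first: chunk is still unseen
        model.update(chunk)                  # adapt only after scoring
    return total_bits


bits = score_first_ttt(UnigramByteModel(), [b"aaaa", b"aaaa"])
```

Because scoring always precedes the update, no chunk contributes to its own score, while later chunks still benefit from the adaptation on earlier ones.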

Other

other

ConvexTok-2048 deterministic tokenizer with byte-boundary tokenization DAG and token-rank/price/byte-length features.

parameters: {"vocab_size":2048}

other

TokenGT-style FineWeb graphification with token nodes, causal edge tokens, distance/endpoint/identifier channels, and toric phase features.

parameters: null

other

OAI-FineWeb-only flattening of graph-output states back to the tokenizer sequence for BPB scoring.

parameters: null
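
Once the graph outputs are flattened back to the tokenizer sequence, bits per byte (BPB) follows from the per-token losses and the byte length of each token. The sketch below uses the standard definition: total negative log-likelihood converted from nats to bits, divided by the total byte count. The function name `bits_per_byte` and its argument layout are assumptions, and this is not claimed to be the submission's scoring code.

```python
import math


def bits_per_byte(token_nll_nats, token_byte_lengths) -> float:
    """BPB = (sum of token NLLs in nats) / (ln 2 * total bytes covered)."""
    total_bytes = sum(token_byte_lengths)
    if total_bytes == 0:
        raise ValueError("cannot compute BPB over zero bytes")
    return sum(token_nll_nats) / (math.log(2) * total_bytes)


# Two tokens, each costing 4 bits and covering 4 bytes: 8 bits over 8 bytes.
example_bpb = bits_per_byte([4 * math.log(2)] * 2, [4, 4])
```

Dividing by bytes rather than tokens keeps the score comparable across tokenizers with different vocabulary sizes.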

other

Embedding-space GFlowNet and Forest-of-Thought heads used during training to encourage reasoning forests and memory retrieval.

parameters: null

other

Low-weight advanced sidecar losses including GraphCG, analogical memory, TokenGT graph supervision, toric geometry, vector-bundle 1D-cone/sheaf, Toric BGG category-O, Koszul persistence, combinatorial toric commutative algebra, and derived signatures.

parameters: null

Novel Contributions

ConvexTok-2048 deterministic tokenizer
Graphified OAI-baseline adaptation for FineWeb
TokenGT-style graphification with toric phase features
Embedding-space GFlowNet and Forest-of-Thought training heads
Self-contained non-record packaging with tokenizer and minimal ToricGT dependency modules
Best-of-10 sweep selecting the lowest int8 BPB run