val_bpb: 1.0611
Architecture: Transformer
Optimizer: Muon
Artifact Size: —
Training Techniques

Architecture
- weight tying: Replaces the static tied embedding with a seed table plus hypernetworks that generate input embeddings and output projection weights from token seeds and causal context (see the sketch after this list). parameters: {"d_seed":64,"d_hidden":256,"ctx_window":3}
- other: Use-theoretic embedding layer with a causal, context-conditioned input hypernetwork and a regenerated output projection. parameters: {"vocab_size":8192,"d_model":512}
- BigramHash: Referenced as part of the existing stack and prior-work comparison. parameters: null
- SmearGate: Referenced as part of the existing PR #1855 stack. parameters: null
- XSA: Referenced as part of the existing PR #1855 stack. parameters: null
- Gated Attention: Referenced as part of the existing PR #1855 stack. parameters: null
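A minimal sketch of such a module, assuming a two-layer MLP for each hypernetwork and the parameters listed above (vocab_size=8192, d_model=512, d_seed=64, d_hidden=256, ctx_window=3). The class and method names are hypothetical, and the exact hypernetwork shapes are not given in the submission:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypernetEmbedding(nn.Module):
    """Sketch: V x d_seed seed table plus hypernets in place of a V x d_model tied embedding."""

    def __init__(self, vocab_size=8192, d_model=512, d_seed=64, d_hidden=256, ctx_window=3):
        super().__init__()
        self.ctx_window = ctx_window
        # Small V x d_seed seed table replaces the full V x d_model embedding matrix.
        self.seeds = nn.Embedding(vocab_size, d_seed)
        # Input hypernet: the current token's seed plus (ctx_window - 1) preceding
        # seeds -> input embedding, making the embedding causal and context-conditioned.
        self.in_hypernet = nn.Sequential(
            nn.Linear(ctx_window * d_seed, d_hidden), nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )
        # Output hypernet: regenerates one row of the output projection per seed
        # instead of storing a static V x d_model matrix.
        self.out_hypernet = nn.Sequential(
            nn.Linear(d_seed, d_hidden), nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def embed(self, tokens):  # tokens: (B, T) int64
        s = self.seeds(tokens)  # (B, T, d_seed)
        # Left-shifted copies give each position its k-th predecessor's seed
        # (zero-padded at sequence start), keeping the context strictly causal.
        shifted = [F.pad(s, (0, 0, k, 0))[:, : s.size(1)] for k in range(self.ctx_window)]
        return self.in_hypernet(torch.cat(shifted, dim=-1))  # (B, T, d_model)

    def logits(self, h):  # h: (B, T, d_model) hidden states
        w = self.out_hypernet(self.seeds.weight)  # (V, d_model), regenerated at use-time
        return h @ w.t()  # (B, T, V)
```

With these assumed shapes the module has roughly 0.85M parameters, in the same range as (but not exactly matching) the reported 835,584, so the submission's hypernet layer sizes likely differ slightly from this sketch.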
Quantization
- GPTQ: bits: null; scope: hypernet weights and tensors with registered Hessians
- fp16: bits: 16; scope: fallback for tensors without registered Hessians
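A sketch of the dispatch this implies: tensors with a registered calibration Hessian go through GPTQ, everything else stays fp16. The `gptq_quantize` helper and the `hessians` mapping are hypothetical stand-ins for a real GPTQ implementation, and the bit width is left as a parameter since the summary does not specify it:

```python
import torch

def quantize_checkpoint(state_dict, hessians, gptq_quantize, bits):
    """Dispatch sketch: GPTQ where a calibration Hessian is registered, fp16 otherwise."""
    quantized = {}
    for name, weight in state_dict.items():
        hessian = hessians.get(name)  # per-tensor calibration Hessian, if any
        if hessian is not None and weight.dim() == 2:
            # GPTQ solves for low-bit weights that minimize layer output error
            # under the calibration Hessian; gptq_quantize stands in for an
            # implementation from an existing GPTQ library.
            quantized[name] = gptq_quantize(weight, hessian, bits=bits)
        else:
            # Fallback noted above: tensors without a registered Hessian stay fp16.
            quantized[name] = weight.half()
    return quantized
```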
Test-Time Training
- LoRA TTT: parameters: null
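The summary gives no parameters for the LoRA TTT component; a generic sketch of LoRA-based test-time training, with illustrative rank, alpha, learning-rate, and step values, would look like:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update: y = Wx + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the adapter is trained at test time
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

def test_time_adapt(model, context_loss, steps=4, lr=1e-3):
    """Take a few gradient steps on the test-time context, updating only LoRA parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        loss = context_loss(model)  # e.g. next-token loss on the test context
        opt.zero_grad()
        loss.backward()
        opt.step()
```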
Novel Contributions
- Replaces the static V×d tied embedding with a smaller V×d_seed seed table plus hypernetworks.
- Introduces a causal, context-conditioned input embedding generated at use-time.
- Regenerates output projection weights from the same seed table instead of storing a full static embedding matrix.
- Claims a 5.02× parameter reduction for the embedding component (4,194,304 → 835,584; see the arithmetic check after this list).
- Finds and fixes three integration bugs via static analysis before GPU execution.
- Pre-registers matched-budget and matched-architecture ablation conditions for evaluation.
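Arithmetic check on the claimed reduction, using the stated parameters: the baseline tied embedding is vocab_size × d_model = 8192 × 512 = 4,194,304 weights, and 4,194,304 / 835,584 ≈ 5.02, consistent with the claimed factor. The seed table alone accounts for 8192 × 64 = 524,288 of the replacement's 835,584 parameters, leaving 311,296 for the hypernetworks (assuming they make up the remainder; the summary does not break this down).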