PR #1201 (closed)

records: add 2026-03-27 TrigramHash run, harden quantization safety, and clean docs

by Mister2005
val_bpb: 1.6371
Architecture: Transformer
Optimizer: Adam
Artifact Size: 5909270 bytes

Training Techniques

Architecture: TrigramHash
  Adds trigram hash embeddings wired into the GPT forward and logits paths.
  parameters: {"vocab_size":4096,"dimension":128}
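A minimal sketch of the mechanism the TrigramHash entry describes, assuming each position hashes its last three token ids into one of `vocab_size` buckets and looks up an extra embedding; the hash function and the names `trigram_bucket`/`trigram_embeddings` are hypothetical, not from the PR.

```python
import numpy as np

TRIGRAM_VOCAB = 4096   # "vocab_size" from the record's parameters
TRIGRAM_DIM = 128      # "dimension" from the record's parameters

rng = np.random.default_rng(0)
trigram_table = rng.standard_normal((TRIGRAM_VOCAB, TRIGRAM_DIM)) * 0.02

def trigram_bucket(t0: int, t1: int, t2: int) -> int:
    # Simple multiplicative hash; the PR's actual hash is not shown.
    h = (t0 * 1000003 + t1) * 1000003 + t2
    return h % TRIGRAM_VOCAB

def trigram_embeddings(tokens: list) -> np.ndarray:
    # Positions 0 and 1 have no full trigram; leave a zero vector there.
    out = np.zeros((len(tokens), TRIGRAM_DIM))
    for i in range(2, len(tokens)):
        out[i] = trigram_table[trigram_bucket(tokens[i - 2], tokens[i - 1], tokens[i])]
    return out

emb = trigram_embeddings([5, 17, 42, 42, 17])
print(emb.shape)  # (5, 128)
```

In the PR these vectors are described as feeding both the forward path (added alongside token embeddings) and the logits path.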
Value Residual
  Uses value residual blending with v0 carry-over in causal self-attention.
  parameters: null
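The value-residual idea can be sketched as each attention layer mixing its own value projection with the first layer's values v0 before the attention-weighted sum; the mixing weight `lam` and its value are assumptions for illustration.

```python
import numpy as np

def blend_values(v: np.ndarray, v0: np.ndarray, lam: float = 0.5) -> np.ndarray:
    # v, v0: (seq_len, head_dim) values for the current layer and layer 0.
    # lam = 1.0 recovers plain attention (no carry-over).
    return lam * v + (1.0 - lam) * v0

v0 = np.ones((4, 8))        # values computed by layer 0, carried over
v_layer = np.zeros((4, 8))  # values computed by the current layer
mixed = blend_values(v_layer, v0, lam=0.25)
print(mixed[0, 0])  # 0.75
```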
Quantization: QAT
  bits: 6
  scope: bank slices (Q/K/V/O and MLP up/down)
Quantization: mixed int6/int7/int5
  bits: mixed (6/7/5 per tier)
  scope: bank tensors
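A common QAT building block is symmetric fake quantization: scale, round to a b-bit integer range, and rescale, so the forward pass sees quantized weights while float master weights keep training. This is a sketch of that generic scheme, not the PR's exact rounding or scale logic.

```python
import numpy as np

def fake_quant(w: np.ndarray, bits: int) -> np.ndarray:
    # Symmetric per-tensor fake quantization to a signed b-bit range.
    qmax = 2 ** (bits - 1) - 1
    amax = np.abs(w).max()
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

w = np.linspace(-1.0, 1.0, 9)
w6 = fake_quant(w, bits=6)  # int6, as in the QAT entry
w5 = fake_quant(w, bits=5)  # int5, one tier of the mixed scheme
print(np.abs(w - w6).max() <= np.abs(w - w5).max())  # fewer bits, larger error
```

The mixed int6/int7/int5 entry suggests different tensors (or tiers) get different bit widths, which this function supports by varying `bits` per tensor.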
Test-Time Training: Adam TTT
  parameters: {"enabled":false}
Evaluation: sliding window eval
  parameters: {"stride":64}
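Sliding-window evaluation with a stride typically re-runs the model on overlapping windows and scores only the last `stride` positions of each window, so every token is predicted with long left context. A sketch of the index bookkeeping, assuming a window size of 128 for illustration:

```python
def sliding_windows(n_tokens: int, window: int, stride: int):
    # Yields (start, end, first_scored) triples over a token stream:
    # the model sees tokens[start:end], losses count from first_scored.
    spans = []
    pos = 0
    while pos < n_tokens:
        start = max(0, pos + stride - window)
        end = min(pos + stride, n_tokens)
        spans.append((start, end, pos))
        pos = end
    return spans

spans = sliding_windows(n_tokens=200, window=128, stride=64)
print(spans)  # [(0, 64, 0), (0, 128, 64), (64, 192, 128), (128, 200, 192)]
```

Each token is scored exactly once: the scored ranges `[first_scored, end)` tile the stream with no overlap.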
Compression: lzma
  level: 9
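The final artifact compression maps directly onto the standard library's `lzma` module at preset 9, its highest preset; the payload below is a stand-in for the exported model bytes.

```python
import lzma

payload = bytes(1000) + b"model-weights"  # stand-in for the exported artifact
compressed = lzma.compress(payload, preset=9)  # preset 9 = maximum compression
restored = lzma.decompress(compressed)
print(restored == payload, len(compressed) < len(payload))  # True True
```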
LR Schedule: warmdown
  parameters: {"warmdown_iters":4000}
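A "warmdown" schedule usually means holding the learning rate constant and then ramping it linearly to zero over the final `warmdown_iters` steps. A sketch under that assumption; `base_lr` and `total_iters` are hypothetical, only `warmdown_iters=4000` comes from the record.

```python
def warmdown_lr(it: int, total_iters: int, warmdown_iters: int, base_lr: float) -> float:
    # Constant LR, then a linear ramp to zero over the last warmdown_iters steps.
    start = total_iters - warmdown_iters
    if it < start:
        return base_lr
    frac = (total_iters - it) / warmdown_iters  # 1 -> 0 across the warmdown
    return base_lr * frac

total, wd = 10000, 4000  # warmdown_iters: 4000 from the record
print(warmdown_lr(0, total, wd, 1e-3))     # 0.001
print(warmdown_lr(8000, total, wd, 1e-3))  # 0.0005
```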

Novel Contributions

  • TrigramHash embeddings integrated into the GPT forward and logits paths
  • Value Residual mechanism in causal self-attention
  • Bank-level QAT with late enabling and torch.compile recompile path
  • GradQuant tiered mixed quantization with rebank/unbank export path
  • Multi-token prediction heads trained but excluded from export size
  • Legal score-first Adam-based TTT support
  • Temperature calibration using training tokens only
  • Extended warmdown and lzma preset 9 for final artifact compression
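The temperature-calibration contribution can be illustrated by fitting a single scalar T that minimizes NLL on held logits; the grid search and the toy logits below are illustrative assumptions, not the PR's procedure, and "training tokens only" means the held logits would come from training data, never validation data.

```python
import math

def avg_nll(logits_rows, targets, T):
    # Mean negative log-likelihood of targets under softmax(logits / T).
    total = 0.0
    for row, t in zip(logits_rows, targets):
        scaled = [z / T for z in row]
        m = max(scaled)
        log_z = m + math.log(sum(math.exp(z - m) for z in scaled))
        total += log_z - scaled[t]
    return total / len(targets)

def calibrate_temperature(logits_rows, targets):
    # Coarse grid search over T; a 1-D optimizer would also work.
    grid = [0.1 * i for i in range(5, 51)]  # T in [0.5, 5.0]
    return min(grid, key=lambda T: avg_nll(logits_rows, targets, T))

# Overconfident toy model: three correct, one confidently wrong prediction.
rows = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
tgts = [0, 0, 0, 0]
T_star = calibrate_temperature(rows, tgts)
print(T_star > 1.0)  # True: softening the logits lowers NLL here
```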