PR #193

open

Add CTM tail-QAT proxy non-record snapshot

by KHUCHAN
val_bpb: 1.2917
Architecture: GPT
Optimizer:
Artifact Size: 16,009,531 bytes

Training Techniques

Architecture
  • Tied embeddings in the GPT baseline (parameters: null)
  • 4 KV heads in the GPT baseline (parameters: {"kv_heads": 4})
  • Small causal CTM workspace bridge (parameters: {"slots": 4, "dimensions": 64})
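Since train_gpt.py is not inlined in this description, here is only a hedged sketch of what a 4-slot × 64-dimension causal workspace bridge could look like in plain NumPy: a small persistent slot bank that each token reads from via attention and writes into with an EMA update, strictly left to right. The function name, shapes, and the EMA write rule are assumptions, not the code in the snapshot.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def workspace_bridge(tokens, slots, alpha=0.1):
    """Causal read/write between a token stream and a tiny slot workspace.

    tokens: (T, D) token activations; slots: (S, D) workspace state,
    here S=4, D=64 to match the parameters above. Hypothetical sketch:
    the update rule used in train_gpt.py may differ.
    """
    T, D = tokens.shape
    reads = np.zeros_like(tokens)
    slots = slots.copy()
    for t in range(T):                            # strictly left-to-right
        x = tokens[t]
        attn = softmax(slots @ x / np.sqrt(D))    # (S,) slot weights
        reads[t] = attn @ slots                   # token reads the workspace
        # token writes itself into the slots, weighted by the same attention
        slots = (1 - alpha * attn)[:, None] * slots + alpha * attn[:, None] * x
    return reads, slots
```

Because the loop only ever consumes tokens up to position t, a change to a later token cannot affect an earlier read, which is the causality property a bridge inside a GPT needs.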
Other
  • Workspace writes routed with novelty plus salience scoring (parameters: {"CTM_NOVELTY_GAIN": 1, "CTM_SALIENCE_GAIN": 0.5})
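The two gain parameters suggest a linear combination of a novelty term and a salience term. A hedged sketch of one plausible scoring rule (the actual routing in train_gpt.py may differ; the cosine-based novelty and the external salience scalar are assumptions):

```python
import numpy as np

CTM_NOVELTY_GAIN = 1.0    # values from the parameters above
CTM_SALIENCE_GAIN = 0.5

def write_score(x, slots, salience):
    """Score a candidate workspace write (hypothetical sketch).

    novelty = 1 - max cosine similarity of x to the current slot
    contents, so redundant writes score low; salience is assumed to be
    supplied by the model (e.g. a learned scalar per token).
    """
    xs = x / (np.linalg.norm(x) + 1e-8)
    ss = slots / (np.linalg.norm(slots, axis=1, keepdims=True) + 1e-8)
    novelty = 1.0 - float(np.max(ss @ xs))
    return CTM_NOVELTY_GAIN * novelty + CTM_SALIENCE_GAIN * salience
```

Under this scoring, a write that merely duplicates existing workspace content is filtered out unless it is independently salient.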
  • Prediction-error-gated skip connections (parameters: {"SKIP_GATE_MODE": "error"})
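One reading of SKIP_GATE_MODE = "error" is that the skip branch is scaled by a gate derived from the local prediction error, so the shortcut contributes more when the model is currently wrong. A minimal sketch under that assumption (the error proxy and gate shape are guesses, not the snapshot's code):

```python
import numpy as np

def error_gated_skip(x, skip, pred, target):
    """Prediction-error-gated skip connection (hypothetical sketch).

    The gate grows with the mean absolute prediction error and is
    squashed into [0, 1), so zero error passes x through unchanged.
    """
    err = np.abs(pred - target).mean()   # scalar error proxy
    gate = err / (1.0 + err)             # squash to [0, 1)
    return x + gate * skip
```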
Quantization
  • QAT (bits: 8; scope: export-matched int8 path)
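"Export-matched" presumably means the QAT tail runs the forward pass through the same int8 round-trip the exporter applies, so training sees exactly the values the final artifact will contain. A sketch assuming symmetric per-tensor quantization with a max-abs scale (in real QAT the round would also use a straight-through estimator so gradients flow through it):

```python
import numpy as np

def fake_quant_int8(w):
    """Symmetric per-tensor int8 fake quantization (hypothetical sketch).

    Rounds w onto the int8 grid implied by its max-abs scale, then
    dequantizes, mimicking an int8 export round-trip.
    """
    scale = np.abs(w).max() / 127.0
    if scale == 0.0:
        return w.copy()
    q = np.clip(np.round(w / scale), -127, 127)   # int8 grid
    return q * scale                              # dequantized view
```

A useful sanity property of an export-matched path is idempotence: quantizing an already-quantized tensor must change nothing, since the exporter will apply the same round-trip once more.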
Compression
  • zlib (level: null)
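Taking level: null to mean zlib's default compression level (an assumption; the snapshot does not pin a level), the artifact packing step would amount to:

```python
import zlib

# Stand-in bytes; the real input would be the serialized checkpoint.
payload = b"checkpoint bytes would go here" * 100
packed = zlib.compress(payload)   # default level, matching level: null
restored = zlib.decompress(packed)
```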
Sequence Length
  • train_length: 1024
  • eval_length: null

Novel Contributions

  • Standalone non-record snapshot of a CTM-based proxy run
  • Causal CTM workspace bridge with 4 slots × 64 dimensions
  • Novelty-plus-salience workspace write routing
  • Prediction-error-gated skip connections
  • Export-matched tail QAT aligned with the final int8 artifact path
  • Packaging of train_gpt.py, train.log, README.md, and submission.json as a reproducible in-progress snapshot