PR #193

open

Add CTM tail-QAT proxy non-record snapshot

by KHUCHAN
val_bpb: 1.2917
Architecture: GPT
Optimizer:
Artifact Size: 16,009,531 bytes

Training Techniques

Architecture
  • Tied embeddings in the GPT baseline (parameters: null)
  • 4 KV heads in the GPT baseline (parameters: {"kv_heads": 4})
  • Small causal CTM workspace bridge (parameters: {"slots": 4, "dimensions": 64})
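Since train_gpt.py is not inlined in this description, here is only a hedged sketch of what a 4-slot × 64-dimension causal workspace bridge could look like in plain NumPy: a small persistent slot bank that each token reads from via attention and writes into with an EMA update, strictly left to right. The function name, shapes, and the EMA write rule are assumptions, not the code in the snapshot.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def workspace_bridge(tokens, slots, alpha=0.1):
    """Causal read/write between a token stream and a tiny slot workspace.

    tokens: (T, D) token activations; slots: (S, D) workspace state,
    here S=4, D=64 to match the parameters above. Hypothetical sketch:
    the update rule used in train_gpt.py may differ.
    """
    T, D = tokens.shape
    reads = np.zeros_like(tokens)
    slots = slots.copy()
    for t in range(T):                            # strictly left-to-right
        x = tokens[t]
        attn = softmax(slots @ x / np.sqrt(D))    # (S,) slot weights
        reads[t] = attn @ slots                   # token reads the workspace
        # token writes itself into the slots, weighted by the same attention
        slots = (1 - alpha * attn)[:, None] * slots + alpha * attn[:, None] * x
    return reads, slots
```

Because the loop only ever consumes tokens up to position t, a change to a later token cannot affect an earlier read, which is the causality property a bridge inside a GPT needs.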
Other
  • Workspace writes routed with novelty plus salience scoring (parameters: {"CTM_NOVELTY_GAIN": 1, "CTM_SALIENCE_GAIN": 0.5})
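The two gain parameters suggest a linear combination of a novelty term and a salience term. A hedged sketch of one plausible scoring rule (the actual routing in train_gpt.py may differ; the cosine-based novelty and the external salience scalar are assumptions):

```python
import numpy as np

CTM_NOVELTY_GAIN = 1.0    # values from the parameters above
CTM_SALIENCE_GAIN = 0.5

def write_score(x, slots, salience):
    """Score a candidate workspace write (hypothetical sketch).

    novelty = 1 - max cosine similarity of x to the current slot
    contents, so redundant writes score low; salience is assumed to be
    supplied by the model (e.g. a learned scalar per token).
    """
    xs = x / (np.linalg.norm(x) + 1e-8)
    ss = slots / (np.linalg.norm(slots, axis=1, keepdims=True) + 1e-8)
    novelty = 1.0 - float(np.max(ss @ xs))
    return CTM_NOVELTY_GAIN * novelty + CTM_SALIENCE_GAIN * salience
```

Under this scoring, a write that merely duplicates existing workspace content is filtered out unless it is independently salient.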
  • Prediction-error-gated skip connections (parameters: {"SKIP_GATE_MODE": "error"})
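One reading of SKIP_GATE_MODE = "error" is that the skip branch is scaled by a gate derived from the local prediction error, so the shortcut contributes more when the model is currently wrong. A minimal sketch under that assumption (the error proxy and gate shape are guesses, not the snapshot's code):

```python
import numpy as np

def error_gated_skip(x, skip, pred, target):
    """Prediction-error-gated skip connection (hypothetical sketch).

    The gate grows with the mean absolute prediction error and is
    squashed into [0, 1), so zero error passes x through unchanged.
    """
    err = np.abs(pred - target).mean()   # scalar error proxy
    gate = err / (1.0 + err)             # squash to [0, 1)
    return x + gate * skip
```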
Quantization
  • QAT (bits: 8; scope: export-matched int8 path)
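"Export-matched" presumably means the QAT tail runs the forward pass through the same int8 round-trip the exporter applies, so training sees exactly the values the final artifact will contain. A sketch assuming symmetric per-tensor quantization with a max-abs scale (in real QAT the round would also use a straight-through estimator so gradients flow through it):

```python
import numpy as np

def fake_quant_int8(w):
    """Symmetric per-tensor int8 fake quantization (hypothetical sketch).

    Rounds w onto the int8 grid implied by its max-abs scale, then
    dequantizes, mimicking an int8 export round-trip.
    """
    scale = np.abs(w).max() / 127.0
    if scale == 0.0:
        return w.copy()
    q = np.clip(np.round(w / scale), -127, 127)   # int8 grid
    return q * scale                              # dequantized view
```

A useful sanity property of an export-matched path is idempotence: quantizing an already-quantized tensor must change nothing, since the exporter will apply the same round-trip once more.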
Compression
  • zlib (level: null)
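Taking level: null to mean zlib's default compression level (an assumption; the snapshot does not pin a level), the artifact packing step would amount to:

```python
import zlib

# Stand-in bytes; the real input would be the serialized checkpoint.
payload = b"checkpoint bytes would go here" * 100
packed = zlib.compress(payload)   # default level, matching level: null
restored = zlib.decompress(packed)
```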
Sequence Length
  • train_length: 1024
  • eval_length: null

Novel Contributions

  • Standalone non-record snapshot of a CTM-based proxy run
  • Causal CTM workspace bridge with 4 slots × 64 dimensions
  • Novelty-plus-salience workspace write routing
  • Prediction-error-gated skip connections
  • Export-matched tail QAT aligned with the final int8 artifact path
  • Packaging of train_gpt.py, train.log, README.md, and submission.json as a reproducible in-progress snapshot