PR #1816

open

Draft non-record: qk525 nopqttt clip1500 BQ11 legal score-first TTT, pending 8×H100 multi-seed validation

by JiaJunDeng5930
val_bpb
1.3915
Architecture
Transformer
Optimizer
Artifact Size
15,317,495 bytes

Training Techniques

Quantization
GPTQ
bits: 6
scope: weights and embeddings
int8
bits: 8
scope: embeddings
Weight Averaging
EMA
parameters: null
Test-Time Training
score-first TTT
parameters: {"learning_rate":0.005,"epochs":2}
Evaluation
stride-based eval
parameters: {"stride":512}
Sequence Length
sequence_length
train_length: 32768
eval_length: null
Regularization
logit softcap
parameters: {"sigma":15}
Compression
brotli
level: 11
Architecture
weight tying
Tied embeddings, inferred from the submission naming and configuration context rather than stated explicitly.
parameters: null
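
The "score-first TTT" entry above can be illustrated with a toy loop. This is a minimal sketch, not the submission's code: it assumes "score-first" means each chunk is scored with the current model before the model adapts on that chunk, so no token is ever predicted by weights that have already seen it. The add-one smoothed unigram counter stands in for the real network, and `score_first_ttt`, `chunk_size`, and `vocab_size` are hypothetical names; only `epochs` mirrors a listed parameter.

```python
import math
from collections import Counter

def score_first_ttt(tokens, chunk_size, vocab_size, epochs=2):
    """Total negative log-likelihood (nats) under a score-then-adapt loop."""
    counts = Counter()  # toy stand-in for model weights (assumption)
    total = 0           # total count mass the toy model has absorbed
    nll = 0.0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        # 1) Score the chunk with the model as it stands; nothing is
        #    updated while the chunk is being scored.
        for t in chunk:
            p = (counts[t] + 1) / (total + vocab_size)
            nll -= math.log(p)
        # 2) Only after scoring, adapt on the chunk. Each pass adds the
        #    chunk's counts again, a crude stand-in for gradient epochs.
        for _ in range(epochs):
            counts.update(chunk)
            total += len(chunk)
    return nll
```

Under this ordering the first chunk is always scored by the unadapted model, whatever it contains, and later chunks benefit only from chunks that were already scored.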

Novel Contributions

  • No pre-quant TTT
  • Clip1500 BQ11 configuration
  • Legal score-first TTT
  • 8×H100 dry-run validation path
  • GPTQ-based int6/int8 compression setup
  • Stride-based evaluation with 512-token stride
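
The 512-token stride in the last bullet can be pictured as a window planner. The sketch below is one possible layout, not the submission's evaluator: each step scores the next `stride` tokens and gives them as much left context as fits in a `window`-token input, so every token is scored exactly once. The function name and the `window` value are hypothetical (the card lists `eval_length: null`); only the default stride of 512 comes from the listing.

```python
def stride_windows(n_tokens, window, stride=512):
    """Yield (ctx_start, score_start, end) spans for strided evaluation.

    The model would read tokens[ctx_start:end], but only the loss on
    tokens[score_start:end] is counted, so each token is scored once.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    score_start = 0
    while score_start < n_tokens:
        end = min(score_start + stride, n_tokens)
        # Reach back for context, never past the start of the data and
        # never beyond what fits in one window.
        ctx_start = max(0, end - window)
        yield ctx_start, score_start, end
        score_start = end
```

For example, `list(stride_windows(3000, 2048))` plans six spans; the last one scores tokens 2560 to 2999 with context starting at token 952.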