PR #1816
Open, Draft
non-record: qk525 nopqttt clip1500 BQ11 legal score-first TTT, pending 8×H100 multi-seed validation
by JiaJunDeng5930
val_bpb
1.3915
Architecture
Transformer
Optimizer
—
Artifact Size
15,317,495 bytes
Training Techniques
Quantization
GPTQ
bits: 6
scope: weights and embeddings
int8
bits: 8
scope: embeddings
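As a rough illustration of what the 6-bit and 8-bit settings above mean for storage, here is a minimal sketch of symmetric round-to-nearest quantization. It is not GPTQ (which also compensates for rounding error using second-order statistics), and the function names and per-list scaling are assumptions made for illustration, not taken from the PR.

```python
def quantize_symmetric(values, bits=6):
    """Round-to-nearest symmetric quantization (hypothetical helper, not GPTQ)."""
    qmax = 2 ** (bits - 1) - 1          # 31 for 6 bits, 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    # Clamp after rounding so every code fits in the signed range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [-1.0, -0.2, 0.0, 0.37, 1.0]
codes, scale = quantize_symmetric(weights, bits=6)
restored = dequantize(codes, scale)
```

Each restored value differs from the original by at most half a quantization step, which is the error GPTQ-style methods try to reduce further.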
Weight Averaging
EMA
parameters: null
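For readers unfamiliar with EMA weight averaging, a minimal sketch of the update rule follows. The listing gives no parameters, so the decay value here is a hypothetical placeholder and the flat-list weight representation is an assumption for illustration.

```python
def ema_update(avg, weights, decay=0.999):
    """One EMA step: blend the running average toward the current weights.

    `decay` is a hypothetical value; the submission does not state one.
    """
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg, weights)]


avg = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    avg = ema_update(avg, step_weights, decay=0.9)
```

The averaged weights, rather than the final raw weights, would typically be the ones exported and quantized.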
Test-Time Training
score-first TTT
parameters: {"learning_rate":0.005,"epochs":2}
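The ordering that "score-first" refers to can be sketched as below: each chunk is scored with weights that have not yet been trained on it, and only afterwards is the model adapted on that chunk. This is a schematic under assumptions; `score` and `adapt` are hypothetical callables standing in for the real loss evaluation and optimizer step (where the listed learning rate of 0.005 would apply), and chunking is not specified in the listing.

```python
def score_first_ttt(chunks, score, adapt, epochs=2):
    """Accumulate loss over chunks, scoring each chunk before adapting on it.

    score(chunk) -> float : loss of the current model on the chunk (hypothetical)
    adapt(chunk) -> None  : one training pass over the chunk (hypothetical)
    """
    total = 0.0
    for chunk in chunks:
        # Score first: the model has only seen earlier chunks at this point.
        total += score(chunk)
        # Then adapt on the chunk that was just scored.
        for _ in range(epochs):
            adapt(chunk)
    return total
```

Because every token is scored before any gradient step uses it, the reported loss never benefits from training on the tokens being evaluated.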
Evaluation
stride-based eval
parameters: {"stride":512}
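A minimal sketch of how a 512-token stride partitions an evaluation stream: windows overlap so later tokens get more context, and each token is scored exactly once. The window length is an assumption (the listing gives eval_length as null), and the function name is hypothetical.

```python
def stride_windows(n_tokens, window, stride=512):
    """Return (begin, end, score_from) triples for sliding-window evaluation.

    The model reads tokens [begin, end) but only tokens [score_from, end)
    count toward the loss, so overlapping context is never scored twice.
    """
    spans = []
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        spans.append((begin, end, prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return spans


# Hypothetical sizes: a 2048-token stream with 1024-token windows.
spans = stride_windows(2048, window=1024, stride=512)
# spans == [(0, 1024, 0), (512, 1536, 1024), (1024, 2048, 1536)]
```

A smaller stride gives each scored token more preceding context at the cost of more forward passes.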
Sequence Length
sequence_length
train_length: 32768
eval_length: null
Regularization
logit softcap
parameters: {"sigma":15}
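A common form of logit softcapping squashes logits smoothly into a bounded range with tanh. The sketch below assumes that form and assumes the listed value of 15 is the cap; the listing does not confirm either, and the function name is hypothetical.

```python
import math


def softcap(logits, cap=15.0):
    """Squash logits into [-cap, cap] via cap * tanh(x / cap) (assumed form)."""
    return [cap * math.tanh(x / cap) for x in logits]


capped = softcap([-100.0, -1.0, 0.0, 1.0, 100.0])
```

Small logits pass through nearly unchanged, while extreme values saturate near the cap, which limits how confident any single prediction can become.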
Compression
brotli
level: 11
Architecture
weight tying
Tied embeddings, inferred from the submission name and configuration context rather than stated explicitly.
parameters: null
Novel Contributions
- No pre-quant TTT
- Clip1500 BQ11 configuration
- Legal score-first TTT
- 8×H100 dry-run validation path
- GPTQ-based int6/int8 compression setup
- Stride-based evaluation with 512-token stride