PR #762

closed

Record: LeakyReLU(0.5)² + Legal Per-Document LoRA TTT + GPTQ-lite (mean val_bpb=0.7139, 3 seeds)

by robinojw
val_bpb: 0.7139
Artifact Size: 15.8MB

Training Techniques

Test-Time Training: LoRA TTT
  parameters: {"rank":16,"epochs":5,"min_doc_len":256,"score_before_train":true,"per_document_accumulators":true}
Quantization: GPTQ-lite
  bits: 6
  scope: all
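The record describes GPTQ-lite as "multi-percentile int6 quantization with minimum-MSE clipping per row": for each weight row, several candidate clip thresholds (taken as percentiles of the absolute values) are tried, and the one with the lowest reconstruction error wins. A minimal sketch of that idea, assuming symmetric quantization; the exact percentile set and procedure in the PR are assumptions:

```python
import numpy as np

def quantize_row_int6(w, percentiles=(99.0, 99.5, 99.9, 100.0)):
    """Symmetric int6 quantization of one weight row: try several
    percentile-based clip thresholds on |w| and keep whichever gives
    the lowest reconstruction MSE. Sketch only; the PR's actual
    GPTQ-lite details are not shown in this record."""
    qmax = 31  # signed 6-bit symmetric range [-31, 31]
    best = None
    for p in percentiles:
        clip = float(np.percentile(np.abs(w), p))
        if clip == 0.0:
            continue  # skip degenerate thresholds on an all-zero row
        scale = clip / qmax
        q = np.clip(np.round(w / scale), -qmax, qmax)
        mse = float(np.mean((w - q * scale) ** 2))
        if best is None or mse < best[0]:
            best = (mse, q.astype(np.int8), scale)
    return best[1], best[2]  # int6 codes (stored in int8) and the row scale
```

Storing one scale per row plus 6-bit codes is what keeps the artifact small relative to the float weights.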
Architecture: LeakyReLU
  Configurable LeakyReLU negative slope, defaulting to 0.5
  parameters: {"slope":0.5}
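The title's LeakyReLU(0.5)² denotes a squared LeakyReLU activation with negative slope 0.5. A minimal sketch; the environment-variable name below is an assumption, since the record only says the slope is configurable via an environment variable:

```python
import os
import numpy as np

def squared_leaky_relu(x, slope=None):
    """LeakyReLU(x)**2 with a configurable negative slope.
    LEAKYRELU_SLOPE is a hypothetical env-var name; the record
    only states the slope is env-configurable, defaulting to 0.5."""
    if slope is None:
        slope = float(os.environ.get("LEAKYRELU_SLOPE", "0.5"))
    y = np.where(x >= 0, x, slope * x)  # LeakyReLU with the given slope
    return y * y                        # square, per LeakyReLU(0.5)^2
```

For example, `squared_leaky_relu(np.array([2.0, -2.0]))` gives `[4.0, 1.0]` at the default slope of 0.5.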
Other
  Per-document TTT scoring fix that scores each token before the LoRA adapter trains on it within each epoch, with accumulators reset at epoch boundaries
  parameters: {"legal_scoring":true,"multi_epoch_caveat":true}
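The legal-scoring fix is an ordering constraint: within each epoch, every chunk of a document is scored before the LoRA adapter takes a gradient step on it, and the loss accumulators reset at epoch boundaries. A toy sketch of that control flow, with a stub model standing in for the real LM and adapter (whose APIs are assumptions here):

```python
import math

class ToyLoRAModel:
    """Stand-in for a LoRA-adapted LM; the real model, adapter,
    and training APIs in the PR are assumptions."""
    def __init__(self):
        self.seen = set()

    def score(self, chunk):
        # Pretend NLL per chunk: lower once the adapter has trained on it.
        return 0.5 if chunk in self.seen else 1.0

    def train_step(self, chunk):
        self.seen.add(chunk)

def ttt_score_document(model, chunks, epochs=5):
    """'Legal' per-document TTT ordering: each chunk is scored before
    the adapter trains on it within an epoch, and accumulators reset
    at every epoch boundary. Returns per-epoch mean bits."""
    per_epoch = []
    for _ in range(epochs):
        nll, n = 0.0, 0                 # reset at the epoch boundary
        for chunk in chunks:
            nll += model.score(chunk)   # score first (legal ordering)...
            n += 1
            model.train_step(chunk)     # ...then let the adapter train on it
        per_epoch.append(nll / n / math.log(2))
    return per_epoch
```

Only the first epoch scores strictly unseen data; from epoch two onward the adapter has already trained on every chunk it re-scores, which is the multi-epoch caveat flagged in the parameters.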

Novel Contributions

  • Legal per-document TTT scoring that scores tokens before training within each epoch
  • GPTQ-lite multi-percentile int6 quantization with minimum-MSE clipping per row
  • Extended TTT budget with LoRA rank 16, 5 epochs, and shorter minimum document length
  • Configurable LeakyReLU slope via environment variable
  • Reported both a multi-epoch high-performing configuration and a clearly legal single-epoch baseline