PR #1372

open

V20: Cascaded 2-Phase L-BFGS Causal SLOT (1.00497 BPB, 3-seed)

by Bortlesboat
val_bpb
1.0050
Architecture
Transformer
Optimizer
L-BFGS
Artifact Size
15,854,022 bytes

Training Techniques

Architecture
BigramHash
Bigram hash embedding component in the backbone stack.
parameters: {"dimensions":3072,"window":112}
Quantization
GPTQ
bits: 6
scope: full Hessian model
Compression
lzma
level: null
brotli
level: null
Test-Time Training
L-BFGS Causal SLOT
parameters: {"history_size":20,"causal_mask":true}
Cascaded 2-Phase L-BFGS
parameters: {"phase1_iters":5,"phase1_history":10,"phase2_iters":18,"phase2_history":20,"history_reset_between_phases":true}
Discriminative per-block pre-quant TTT
parameters: {"graduated_lr":"0.3x->1.0x","layer_groups":10}
Sequence Length
sequence_length
train_length: 128
eval_length: null
Other
other
Causal optimization mask restricted to already-scored tokens.
parameters: {"opt_mask_range":"[focal_start, s)"}
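The `opt_mask_range` of `[focal_start, s)` can be read as follows: before scoring token `s`, the test-time loss is taken only over positions that have already been scored, so no not-yet-scored token leaks into the adaptation. A minimal sketch of that mask, assuming this reading (the function name and exact semantics are mine, not code from the PR):

```python
import numpy as np

def causal_opt_mask(seq_len, focal_start, s):
    """Boolean mask over positions [focal_start, s): tokens already scored
    and therefore safe to optimize on before scoring token s."""
    pos = np.arange(seq_len)
    return (pos >= focal_start) & (pos < s)

# Example: with focal_start=2 and the next token to score at s=5,
# only positions 2, 3, 4 contribute to the test-time loss.
mask = causal_opt_mask(seq_len=8, focal_start=2, s=5)
```

Advancing `s` by one after each scored token keeps the mask strictly causal with respect to the evaluation order.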

Novel Contributions

  • Cascaded 2-phase L-BFGS with a coarse phase followed by a refinement phase
  • Resetting L-BFGS history between phases while warm-starting the delta tensor
  • Causal SLOT evaluation that only optimizes on already-scored tokens
  • Combining L-BFGS Causal SLOT with discriminative per-block pre-quant TTT
  • Reported faster eval with lower L-BFGS work than the single-phase baseline
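The cascade above can be sketched in miniature: two L-BFGS phases where the curvature history is discarded between phases but the optimized delta tensor carries over. This is a self-contained toy, assuming a standard two-loop-recursion L-BFGS with unit steps and a simple quadratic standing in for the SLOT test-time loss; the phase sizes mirror the PR's parameters (5 iters / history 10, then 18 iters / history 20), but nothing else here is the PR's actual code:

```python
import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    """Two-loop recursion: approximate -H^{-1} g from stored (s, y) pairs."""
    q = g.copy()
    alphas = []  # newest pair first
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if s_hist:  # scale by gamma from the most recent pair
        s, y = s_hist[-1], y_hist[-1]
        q *= (s @ y) / (y @ y)
    for (s, y), a in zip(zip(s_hist, y_hist), reversed(alphas)):
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return -q

def run_phase(x, loss_grad, iters, history_size):
    """One L-BFGS phase; the curvature history starts empty (history reset),
    while x itself may warm-start from a previous phase."""
    s_hist, y_hist = [], []
    g_prev = None
    for _ in range(iters):
        _, g = loss_grad(x)
        if g_prev is not None:
            s, y = x - x_prev, g - g_prev
            if y @ s > 1e-10:  # keep only curvature-positive pairs
                s_hist.append(s); y_hist.append(y)
                if len(s_hist) > history_size:
                    s_hist.pop(0); y_hist.pop(0)
        x_prev, g_prev = x.copy(), g.copy()
        x = x + lbfgs_direction(g, s_hist, y_hist)
    return x

# Hypothetical stand-in objective: a well-conditioned quadratic playing the
# role of the test-time loss over a per-sequence delta tensor.
rng = np.random.default_rng(0)
n = 20
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(0.1, 1.0, n)) @ Q.T  # SPD, eigenvalues in (0, 1]
b = rng.standard_normal(n)

def loss_grad(x):
    return 0.5 * x @ A @ x - b @ x, A @ x - b

delta0 = np.zeros(n)
loss_init = loss_grad(delta0)[0]
# Phase 1: coarse, short history.
delta1 = run_phase(delta0, loss_grad, iters=5, history_size=10)
loss_p1 = loss_grad(delta1)[0]
# Phase 2: history reset inside run_phase; delta1 warm-starts the refinement.
delta2 = run_phase(delta1, loss_grad, iters=18, history_size=20)
loss_final = loss_grad(delta2)[0]
```

The point of the reset-plus-warm-start pattern is that phase 2 inherits a good starting point without inheriting stale curvature estimates from the coarse phase.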