PR #1124

open

Record: 1.1194 BPB — v9 Batched Muon + Full GPTQ Random Calib + JEPA Research

val_bpb

1.1194

Architecture

Transformer

Optimizer

Muon

Artifact Size

15.90 MB

Training Techniques

Optimizer

Muon

weight_decay: null

momentum: null

other_params: {"batched_newton_schulz_orthogonalization":true,"torch_bmm":true,"shape_matched_batches":4,"weight_matrices_grouped":66}
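The `other_params` entry above describes grouping weight matrices by shape and orthogonalizing each group in one batched call. A minimal sketch of that idea follows. It is not the PR's code: NumPy's batched `@` stands in for `torch.bmm`, the quintic coefficients are the ones commonly used in public Muon implementations, and the names `newton_schulz_batched` and `orthogonalize_grouped` are hypothetical.

```python
import numpy as np
from collections import defaultdict

def newton_schulz_batched(G, steps=5, eps=1e-7):
    """Approximately orthogonalize every matrix in a (B, m, n) stack."""
    # Assumed coefficients (common in public Muon code, not taken from this PR).
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.astype(np.float64)
    # Work on the wide orientation so X @ X^T is the smaller Gram matrix.
    transposed = X.shape[1] > X.shape[2]
    if transposed:
        X = X.transpose(0, 2, 1)
    # Normalize each matrix by its Frobenius norm.
    X = X / (np.linalg.norm(X, axis=(1, 2), keepdims=True) + eps)
    for _ in range(steps):
        A = X @ X.transpose(0, 2, 1)   # batched matmul over the leading axis
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    if transposed:
        X = X.transpose(0, 2, 1)
    return X

def orthogonalize_grouped(grads):
    """Group 2-D arrays by shape, then run one batched call per group."""
    groups = defaultdict(list)
    for i, g in enumerate(grads):
        groups[g.shape].append(i)
    out = [None] * len(grads)
    for idxs in groups.values():
        stacked = np.stack([grads[i] for i in idxs])
        for i, r in zip(idxs, newton_schulz_batched(stacked)):
            out[i] = r
    return out
```

With 66 matrices falling into 4 shape-matched batches, as the card reports, a loop like this would issue 4 batched iterations per step instead of 66 separate ones.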

Quantization

GPTQ

bits: null

scope: full model

Other

GPTQ calibration on randomly sampled tokens, so Hessians are collected without access to the training data

parameters: null
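Random-token calibration can be sketched as follows, assuming the usual GPTQ proxy Hessian (2 X^T X averaged over the inputs X seen by a linear layer). The sampler, the `layer_inputs_fn` callback, and both function names are hypothetical and are not taken from the PR.

```python
import numpy as np

def random_token_batches(vocab_size, n_batches, batch_size, seq_len, seed=0):
    """Yield batches of uniformly random token ids (no dataset needed)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        yield rng.integers(0, vocab_size, size=(batch_size, seq_len))

def accumulate_hessian(layer_inputs_fn, batches, in_features):
    """Average 2 * X^T X over all calibration rows fed to one linear layer.

    layer_inputs_fn(tokens) is assumed to return the activations entering
    the layer, with shape (batch, seq, in_features).
    """
    H = np.zeros((in_features, in_features))
    n_rows = 0
    for tokens in batches:
        X = layer_inputs_fn(tokens).reshape(-1, in_features)
        H += 2.0 * (X.T @ X)
        n_rows += X.shape[0]
    return H / max(n_rows, 1)
```

In a real model, `layer_inputs_fn` would run the forward pass and capture the input of the layer being quantized. A toy stand-in such as an embedding lookup (`lambda t: emb[t]`) is enough to exercise the accumulation.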

Architecture

XSA

Uses XSA across all layers

parameters: {"last_n":11}

Evaluation

sliding window eval

parameters: {"stride":64}
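A sliding-window evaluation with stride 64 can be sketched as below. The window length `seq_len`, the helper names, and the nats-to-bits conversion are assumptions for illustration. The card only states the stride and that the metric is BPB, read here as bits per byte.

```python
import math

def sliding_windows(n_tokens, seq_len, stride=64):
    """Return (start, end, score_from) triples covering n_tokens.

    The model sees tokens[start:end] but only positions score_from..end-1
    count toward the loss, so every token is scored exactly once.
    """
    windows = []
    scored_upto = 0
    start = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        if end == n_tokens:
            break
        start += stride
    return windows

def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)
```

After the first window, each window scores only its last `stride` tokens, so every scored token has at least `seq_len - stride` tokens of context. The cost is roughly `seq_len / stride` forward passes per token.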

Regularization

label smoothing

parameters: {"value":0}

Test-Time Training

score-first TTT

parameters: null