PR #1243

open

JEPArdy! Non-Record Submission - JEPA + Leader-Stack - val_bpb 1.1230

by simon-marcus
val_bpb: 1.1230
Architecture: Transformer
Optimizer:
Artifact Size: 16 MB

Training Techniques

Architecture: LeakyReLU
  Uses LeakyReLU(0.5)^2 (squared LeakyReLU with slope 0.5) in the model stack.
  parameters: {"slope": 0.5}
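A minimal scalar sketch of this activation as described; only the slope value 0.5 comes from the submission:

```python
def leaky_relu_squared(x, slope=0.5):
    # LeakyReLU(x; slope) = x if x >= 0 else slope * x, then squared.
    # Squaring makes the output non-negative even for negative inputs.
    y = x if x >= 0.0 else slope * x
    return y * y
```

In a real model stack this would be applied elementwise to a tensor rather than a scalar.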
Weight Averaging: EMA
  parameters: null
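A minimal sketch of full-model EMA over a flat list of parameter values; the decay constant is an assumption (the PR does not state one), and real implementations track tensors rather than floats:

```python
class EMA:
    """Exponential moving average of model parameters, used for
    export/eval stability as noted in the contributions list."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)  # averaged copy of the parameters

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * p
                       for s, p in zip(self.shadow, params)]
```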
Regularization: weight decay
  parameters: null
Quantization: int6
  bits: 6
  scope: attn, mlp, embed, other floating tensors
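The PR states only the bit width (6) and the tensor scope; the concrete scheme below is an assumption. A symmetric per-tensor int6 quantizer, with int6 covering [-32, 31], could look roughly like this:

```python
def quantize_int6(weights):
    """Symmetric per-tensor quantization to 6-bit integers plus one
    float scale (assumed scheme; only bits=6 is from the submission)."""
    amax = max(abs(w) for w in weights)
    scale = amax / 31.0 if amax > 0 else 1.0
    q = [max(-32, min(31, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights for use at load time.
    return [v * scale for v in q]
```

Round-trip error is bounded by half a quantization step, which is what makes post-training quantization viable for artifact-size reduction.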
Evaluation: sliding window eval
  parameters: {"TTT_ENABLED": 0}
Test-Time Training: TTT (disabled for this submission)
  parameters: {"enabled": 0}
Sequence Length
  train_length: 1024
  eval_length: 1024
Other: JEPA auxiliary loss
  Used during training with a tuned loss weight of 0.10.
  parameters: {"jepa_loss_weight": 0.1}
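The weighting amounts to a simple linear combination of losses; only the 0.1 weight is from the submission, and the names below are illustrative:

```python
def combined_loss(lm_loss, jepa_loss, jepa_loss_weight=0.1):
    # Total training objective: language-model loss plus the weighted
    # JEPA auxiliary term (weight 0.10 per the submission).
    return lm_loss + jepa_loss_weight * jepa_loss
```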
Compression: custom
  level: null

Novel Contributions

  • JEPA auxiliary loss integrated into a leader-family stack and validated by ablation.
  • Selection of JEPA_LOSS_WEIGHT=0.10 based on longer-horizon validation rather than short screening runs.
  • Storage-only export pass that removes duplicate top-level JEPA alias weights.
  • Post-training int6 quantization of selected floating tensors for artifact-size reduction.
  • Use of full-model EMA for export/eval stability.
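The storage-only export pass in the third bullet could look roughly like this; treating duplicates as object-identity aliases of the same tensor is an assumption, and the key names are illustrative:

```python
def strip_alias_weights(state_dict):
    """Drop top-level entries whose tensor object is already exported
    under another key (e.g. duplicate JEPA alias weights). Storage-only:
    the model itself is unchanged."""
    seen = set()
    out = {}
    for name, tensor in state_dict.items():
        if id(tensor) in seen:
            continue  # alias of a tensor already kept; skip to save space
        seen.add(id(tensor))
        out[name] = tensor
    return out
```

On load, the exporter's counterpart would re-create the alias keys by pointing them at the surviving tensor.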