PR #1243

open

JEPArdy! Non-Record Submission - JEPA + Leader-Stack - val_bpb 1.1230

by simon-marcus
val_bpb: 1.1230
Architecture: Transformer
Optimizer:
Artifact Size: 16 MB

Training Techniques

Architecture: LeakyReLU
  Uses LeakyReLU(0.5)^2 (squared LeakyReLU with slope 0.5) in the model stack.
  parameters: {"slope": 0.5}
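A minimal scalar sketch of this activation as described; only the slope value 0.5 comes from the submission:

```python
def leaky_relu_squared(x, slope=0.5):
    # LeakyReLU(x; slope) = x if x >= 0 else slope * x, then squared.
    # Squaring makes the output non-negative even for negative inputs.
    y = x if x >= 0.0 else slope * x
    return y * y
```

In a real model stack this would be applied elementwise to a tensor rather than a scalar.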
Weight Averaging: EMA
  parameters: null
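A minimal sketch of full-model EMA over a flat list of parameter values; the decay constant is an assumption (the PR does not state one), and real implementations track tensors rather than floats:

```python
class EMA:
    """Exponential moving average of model parameters, used for
    export/eval stability as noted in the contributions list."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)  # averaged copy of the parameters

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * p
                       for s, p in zip(self.shadow, params)]
```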
Regularization: weight decay
  parameters: null
Quantization: int6
  bits: 6
  scope: attn, mlp, embed, other floating tensors
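The PR states only the bit width (6) and the tensor scope; the concrete scheme below is an assumption. A symmetric per-tensor int6 quantizer, with int6 covering [-32, 31], could look roughly like this:

```python
def quantize_int6(weights):
    """Symmetric per-tensor quantization to 6-bit integers plus one
    float scale (assumed scheme; only bits=6 is from the submission)."""
    amax = max(abs(w) for w in weights)
    scale = amax / 31.0 if amax > 0 else 1.0
    q = [max(-32, min(31, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights for use at load time.
    return [v * scale for v in q]
```

Round-trip error is bounded by half a quantization step, which is what makes post-training quantization viable for artifact-size reduction.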
Evaluation: sliding window eval
  parameters: {"TTT_ENABLED": 0}
Test-Time Training: TTT (disabled for this submission)
  parameters: {"enabled": 0}
Sequence Length
  train_length: 1024
  eval_length: 1024
Other: JEPA auxiliary loss
  Used during training with a tuned loss weight of 0.10.
  parameters: {"jepa_loss_weight": 0.1}
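The weighting amounts to a simple linear combination of losses; only the 0.1 weight is from the submission, and the names below are illustrative:

```python
def combined_loss(lm_loss, jepa_loss, jepa_loss_weight=0.1):
    # Total training objective: language-model loss plus the weighted
    # JEPA auxiliary term (weight 0.10 per the submission).
    return lm_loss + jepa_loss_weight * jepa_loss
```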
Compression: custom
  level: null

Novel Contributions

  • JEPA auxiliary loss integrated into a leader-family stack and validated by ablation.
  • Selection of JEPA_LOSS_WEIGHT=0.10 based on longer-horizon validation rather than short screening runs.
  • Storage-only export pass that removes duplicate top-level JEPA alias weights.
  • Post-training int6 quantization of selected floating tensors for artifact-size reduction.
  • Use of full-model EMA for export/eval stability.
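The storage-only export pass in the third bullet could look roughly like this; treating duplicates as object-identity aliases of the same tensor is an assumption, and the key names are illustrative:

```python
def strip_alias_weights(state_dict):
    """Drop top-level entries whose tensor object is already exported
    under another key (e.g. duplicate JEPA alias weights). Storage-only:
    the model itself is unchanged."""
    seen = set()
    out = {}
    for name, tensor in state_dict.items():
        if id(tensor) in seen:
            continue  # alias of a tensor already kept; skip to save space
        seen.add(id(tensor))
        out[name] = tensor
    return out
```

On load, the exporter's counterpart would re-create the alias keys by pointing them at the surviving tensor.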