PR #1243
openJEPArdy! Non-Record Submission - JEPA + Leader-Stack - val_bpb 1.1230
by simon-marcus
val_bpb
1.1230
Architecture
Transformer
Optimizer
—
Artifact Size
16MB
Training Techniques
Architecture
LeakyReLU
Applies a squared LeakyReLU activation, LeakyReLU(x)^2 with negative slope 0.5, in the model stack.
parameters: {"slope":0.5}
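A minimal scalar sketch of this activation (the model applies the same function elementwise to its hidden states):

```python
def squared_leaky_relu(x: float, slope: float = 0.5) -> float:
    """LeakyReLU with negative slope 0.5, then squared: f(x) = LeakyReLU(x)^2.

    Scalar illustration of the activation described above; the actual model
    applies it elementwise to tensors.
    """
    y = x if x >= 0 else slope * x
    return y * y
```

Note that squaring makes every output non-negative, but the branches stay distinct: f(2) = 4 while f(-2) = (0.5 * -2)^2 = 1.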
Weight Averaging
EMA
parameters: null
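The card lists EMA with no parameters, so the decay below is a placeholder assumption. A minimal parameter-EMA sketch:

```python
class ParamEMA:
    """Exponential moving average over named parameters.

    `decay=0.999` is an assumption (the submission states none); parameters
    are plain floats here for illustration rather than framework tensors.
    """
    def __init__(self, params: dict, decay: float = 0.999):
        self.decay = decay
        self.shadow = dict(params)  # EMA copy used for export/eval

    def update(self, params: dict) -> None:
        d = self.decay
        for name, value in params.items():
            self.shadow[name] = d * self.shadow[name] + (1.0 - d) * value
```

At export time the shadow copy, not the raw training weights, would be written out, which matches the "full-model EMA for export/eval stability" bullet below.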
Regularization
weight decay
parameters: null
Quantization
int6
bits: 6
scope: attn, mlp, embed, other floating tensors
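One common way to realize int6 over those tensors is symmetric per-tensor quantization. The scheme below is an assumption for illustration; the card states only the bit width and scope:

```python
def quantize_int6(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to 6 bits (hypothetical scheme).

    Clamps codes to the symmetric range [-31, 31], leaving one code of the
    signed 6-bit range unused, which is a common convention.
    """
    qmax = 2 ** (6 - 1) - 1  # 31
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Reconstruct approximate floats from 6-bit codes."""
    return [code * scale for code in q]
```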
Evaluation
sliding window eval
parameters: {"TTT_ENABLED":0}
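The card names sliding-window eval (with TTT disabled) but gives no window or stride. A hypothetical sketch of laying out such windows so each token is scored exactly once; the 1024/512 defaults are illustrative, not from the card:

```python
def sliding_window_spans(n_tokens: int, window: int = 1024, stride: int = 512):
    """Plan sliding-window eval spans.

    Each tuple is (begin, end, score_start): the model sees tokens
    [begin, end) as context, but loss is computed only on [score_start, end),
    so every token is scored exactly once across windows.
    """
    spans = []
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        spans.append((begin, end, prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return spans
```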
Test-Time Training
TTT
parameters: {"enabled":0}
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
Other
other
A JEPA auxiliary loss is added during training with a tuned loss weight of 0.10.
parameters: {"jepa_loss_weight":0.1}
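The 0.10 weight presumably enters the objective as a simple weighted sum; this is a hypothetical sketch of that combination, not the PR's actual training loop:

```python
def combined_loss(lm_loss: float, jepa_loss: float,
                  jepa_loss_weight: float = 0.1) -> float:
    """Total training objective: main LM loss plus the weighted JEPA
    auxiliary loss. The additive form is an assumption; only the 0.10
    weight comes from the card.
    """
    return lm_loss + jepa_loss_weight * jepa_loss
```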
Compression
custom
level: null
Novel Contributions
- JEPA auxiliary loss integrated into a leader-family stack and validated by ablation.
- Selection of JEPA_LOSS_WEIGHT=0.10 based on longer-horizon validation rather than short screening runs.
- Storage-only export pass that removes duplicate top-level JEPA alias weights.
- Post-training int6 quantization of selected floating tensors for artifact-size reduction.
- Use of full-model EMA for export/eval stability.