PR #1922
openJEPA Implementation Path: Add Non-Record 10-Minute SP8192 BPE Submission with Self-Contained Data Setup
by divagr18
val_bpb: 1.1875
Architecture: Transformer
Optimizer: —
Artifact Size: 15,281,273 bytes
Training Techniques
Architecture
- weight tying: tied input and output embeddings (parameters: null)
- GQA: grouped-query attention with fewer KV heads than query heads (parameters: {"num_heads": 8, "num_kv_heads": 4}); see the sketch below
Sequence Length
- sequence_length: train_length: 1024, eval_length: null
Evaluation
- sliding window eval (parameters: {"stride_frac": 0.5}); see the sketch below
Quantization
- GPTQ: bits: 8, scope: all
Compression
- zstd (level: null); see the roundtrip sketch below
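GPTQ proper does Hessian-aware rounding, which is beyond a short sketch; as a simpler stand-in, the following illustrates only the int8-pack + zstd roundtrip that produces a sub-16 MB artifact. Function names and the absmax per-channel scheme are illustrative assumptions, not the PR's method:

```python
import io
import numpy as np
import zstandard as zstd

def pack_weights(weights: dict) -> bytes:
    """Quantize each tensor to int8 with a per-row absmax scale, then zstd-compress."""
    payload = {}
    for name, w in weights.items():
        scale = np.abs(w).max(axis=-1, keepdims=True) / 127.0 + 1e-12
        payload[name + ".q"] = np.round(w / scale).astype(np.int8)
        payload[name + ".s"] = scale.astype(np.float32)
    buf = io.BytesIO()
    np.savez(buf, **payload)
    return zstd.ZstdCompressor(level=19).compress(buf.getvalue())

def unpack_weights(blob: bytes) -> dict:
    """Invert pack_weights: decompress, then dequantize back to float32."""
    raw = zstd.ZstdDecompressor().decompress(blob)
    npz = np.load(io.BytesIO(raw))
    names = {k[:-2] for k in npz.files if k.endswith(".q")}
    return {n: npz[n + ".q"].astype(np.float32) * npz[n + ".s"] for n in names}
```

The "roundtrip-compatible" claim in the contributions list presumably means exactly this property: `unpack_weights(pack_weights(w))` reproduces the quantized model bit-for-bit on the evaluator's side.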
Weight Averaging
- EMA (parameters: null); see the sketch below
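A minimal EMA weight-averaging sketch. The card reports no decay value (parameters: null), so 0.999 is a placeholder:

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model: torch.nn.Module, model: torch.nn.Module, decay: float = 0.999):
    """Exponential moving average of parameters: ema <- decay*ema + (1-decay)*param."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

# Usage: keep a frozen copy and update it after each optimizer step.
# ema_model = copy.deepcopy(model).eval()
# ...
# optimizer.step(); ema_update(ema_model, model)
```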
Other
- JEPA-style latent prediction over repeated mid-depth states with gated residual integration and EMA-teacher distillation scheduling (parameters: {"jepa_enabled": true, "distill_enabled": true, "apply_every": null, "delayed_activation": true}); see the sketch below
Novel Contributions
- JEPA-style latent predictive objective over repeated mid-depth states
- EMA-teacher distillation with delayed activation
- Gated residual integration of predicted latent states
- Self-contained record-local SP8192 dataset setup script (see the sketch after this list)
- Roundtrip-compatible int8 + zstd submission artifact under 16 MB
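Going by the title, SP8192 presumably means a SentencePiece BPE model with an 8,192-token vocabulary, so a record-local setup script would roughly do the following. Paths and flags are illustrative, not the PR's actual script:

```python
import sentencepiece as spm

# Train a BPE tokenizer with an 8,192-token vocab on the local corpus.
spm.SentencePieceTrainer.train(
    input="data/train.txt",      # assumed local corpus path
    model_prefix="sp8192",
    vocab_size=8192,
    model_type="bpe",
)

# Load it back and tokenize text for training.
sp = spm.SentencePieceProcessor(model_file="sp8192.model")
ids = sp.encode("example text", out_type=int)
```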