PR #1443

open

Non-record: ByteJEPA — True Byte-Level JEPA (val_bpb 1.3496)

by hardik-bhadani-git
val_bpb: 1.3496
Architecture: Transformer
Optimizer: (not specified)
Artifact Size: 15,132,832 bytes

Training Techniques

Architecture
byte-level input
Uses raw UTF-8 bytes with no tokenizer and vocab_size=256.
parameters: {"vocab_size":256}
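With no tokenizer, the frontend is just the identity map over UTF-8 bytes. A minimal sketch of what that looks like (hypothetical helper names, not the PR's code):

```python
# Illustrative byte-level "tokenizer": raw UTF-8 bytes, fixed vocab of 256 IDs.
VOCAB_SIZE = 256

def encode(text: str) -> list[int]:
    # Every byte of the UTF-8 encoding is its own token ID in [0, 255].
    return list(text.encode("utf-8"))

def decode(ids: list[int]) -> str:
    # Invalid byte sequences are replaced rather than raising.
    return bytes(ids).decode("utf-8", errors="replace")
```

Multi-byte characters simply become several IDs, so sequence length grows relative to a subword tokenizer, but the embedding table shrinks to 256 rows.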
predictor MLP
Adds a 2-layer MLP predictor to map hidden states to target hidden states.
parameters: {"layers":2}
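A 2-layer predictor of this kind maps a context hidden state to a predicted target hidden state. A plain-Python sketch with list-based matrices standing in for real framework tensors:

```python
# Illustrative 2-layer MLP predictor: context hidden state -> predicted
# target hidden state. W1/W2 are row-major weight matrices, b1/b2 biases.
def mlp_predictor(h, W1, b1, W2, b2):
    # Layer 1: linear + ReLU.
    z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
         for row, b in zip(W1, b1)]
    # Layer 2: linear back to the hidden dimension.
    return [sum(w * x for w, x in zip(row, z)) + b
            for row, b in zip(W2, b2)]
```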
Other
other
True JEPA training objective that predicts future hidden states instead of next-byte/token prediction.
parameters: null
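During the JEPA stage, the objective is conceptually a regression loss between the predictor's output and the target encoder's hidden states (which would be gradient-detached in practice), rather than a softmax over next bytes. A simplified sketch:

```python
# Illustrative JEPA-style objective: mean squared error between predicted
# hidden states and target hidden states (targets are stop-gradient in
# a real training loop).
def jepa_loss(pred_states, target_states):
    total, n = 0.0, 0
    for p, t in zip(pred_states, target_states):
        for pi, ti in zip(p, t):
            total += (pi - ti) ** 2
            n += 1
    return total / n
```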
other
Three-stage training schedule: pure JEPA pretraining, linear bridge from JEPA to cross-entropy, then pure cross-entropy fine-tuning.
parameters: {"stages":3,"fractions":[0.1,0.7,0.2]}
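Under the stated fractions [0.1, 0.7, 0.2], the first 10% of steps use the JEPA loss alone, the middle 70% linearly cross-fade from JEPA to cross-entropy, and the final 20% use cross-entropy alone. A sketch of that weighting (hypothetical function, not the PR's code):

```python
def loss_weights(progress, fractions=(0.1, 0.7, 0.2)):
    # progress in [0, 1) -> (jepa_weight, ce_weight) for the three stages:
    # pure JEPA, linear JEPA->CE bridge, pure CE fine-tuning.
    f1, f2, _ = fractions
    if progress < f1:
        return 1.0, 0.0
    if progress < f1 + f2:
        t = (progress - f1) / f2  # ramps 0 -> 1 across the bridge
        return 1.0 - t, t
    return 0.0, 1.0
```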
Regularization
SIGReg
parameters: null
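As described in the LeJEPA literature, SIGReg regularizes the embedding distribution toward an isotropic Gaussian via random one-dimensional projections, which is what lets the recipe drop the EMA target encoder. The actual test statistic is more involved; the sketch below substitutes a simple moment-matching penalty on each projection and is only illustrative:

```python
import random

def sigreg_sketch(embeddings, num_dirs=8, seed=0):
    # Illustrative stand-in for SIGReg: project embeddings onto random unit
    # directions and penalize each 1D projection's deviation from a standard
    # Gaussian via its first two moments (the real statistic is richer).
    rng = random.Random(seed)
    dim = len(embeddings[0])
    penalty = 0.0
    for _ in range(num_dirs):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        proj = [sum(vi * ei for vi, ei in zip(v, e)) for e in embeddings]
        mean = sum(proj) / len(proj)
        var = sum((p - mean) ** 2 for p in proj) / len(proj)
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / num_dirs
```

A collapsed batch (all embeddings identical) has zero projected variance, so the penalty stays bounded away from zero, which is the anti-collapse property the contribution relies on.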
Weight Averaging
SWA
parameters: null
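SWA keeps a running average of checkpoint weights sampled along the training trajectory. An incremental-mean sketch, with flat weight lists standing in for real parameter tensors:

```python
def swa_update(avg, new, n_averaged):
    # Incremental running mean: avg already folds in n_averaged checkpoints;
    # fold in one more without storing the full history.
    return [a + (w - a) / (n_averaged + 1) for a, w in zip(avg, new)]
```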
Compression
zlib
level: null
Quantization
int8
bits: 8
scope: all
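The sub-16MB artifact combines symmetric per-tensor int8 quantization with zlib compression. A minimal sketch of the packing step (hypothetical layout, not the PR's actual serialization format):

```python
import struct
import zlib

def quantize_int8(weights):
    # Symmetric per-tensor quantization: one float scale + int8 values.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero tensor
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, q

def pack(scale, q):
    # Serialize the scale and the int8 payload, then zlib-compress the blob.
    raw = struct.pack("<f", scale) + bytes(b & 0xFF for b in q)
    return zlib.compress(raw, 9)
```

Quantization cuts the payload to one byte per weight, and zlib then exploits whatever redundancy remains in the int8 stream; dequantization is just `q * scale`.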

Novel Contributions

  • True byte-level JEPA with no tokenizer
  • Representation-prediction training objective instead of token prediction
  • Three-stage JEPA-to-CE training pipeline
  • SIGReg anti-collapse regularizer replacing EMA target encoder
  • Post-quantized int8+zlib submission under the 16MB limit