PR #1443
open · Non-record: ByteJEPA — True Byte-Level JEPA (val_bpb 1.3496)
by hardik-bhadani-git
val_bpb
1.3496
Architecture
Transformer
Optimizer
—
Artifact Size
15,132,832 bytes
Training Techniques
Architecture
byte-level input
Uses raw UTF-8 bytes with no tokenizer and vocab_size=256.
parameters: {"vocab_size":256}
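The byte-level input above can be sketched in a few lines: with no tokenizer, the token ids are simply the raw UTF-8 bytes of the text, so vocab_size=256 covers every possible input. The helper names below are illustrative, not from the submission.

```python
# Minimal sketch of tokenizer-free byte-level input: ids are raw UTF-8 bytes.
def encode(text: str) -> list[int]:
    """Map text to token ids in 0..255 by taking its raw UTF-8 bytes."""
    return list(text.encode("utf-8"))

def decode(ids: list[int]) -> str:
    """Invert encode; malformed byte sequences are replaced rather than raising."""
    return bytes(ids).decode("utf-8", errors="replace")

ids = encode("héllo")
assert max(ids) < 256            # vocab_size=256 covers every byte value
assert decode(ids) == "héllo"    # lossless round-trip for valid UTF-8
```

Multi-byte characters simply become several ids, which is why byte-level models trade a tiny vocabulary for longer sequences.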
predictor MLP
Adds a 2-layer MLP predictor to map hidden states to target hidden states.
parameters: {"layers":2}
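A 2-layer predictor of this kind can be sketched as below; the hidden width, GELU activation, and initialization are assumptions, since the listing only fixes the layer count.

```python
# Hedged sketch of the 2-layer MLP predictor that maps context hidden states
# to predicted target hidden states. Width and activation are assumptions.
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class PredictorMLP:
    def __init__(self, d_model: int, d_hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, (d_model, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.w2 = rng.normal(0.0, 0.02, (d_hidden, d_model))
        self.b2 = np.zeros(d_model)

    def __call__(self, h):
        # h: (batch, seq, d_model) hidden states -> predicted target states
        return gelu(h @ self.w1 + self.b1) @ self.w2 + self.b2

pred = PredictorMLP(d_model=64, d_hidden=256)
assert pred(np.zeros((2, 8, 64))).shape == (2, 8, 64)
```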
Other
other
True JEPA training objective that predicts future hidden states instead of next-byte/token prediction.
parameters: null
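The representation-prediction objective can be sketched as a regression from predicted states onto the hidden states at future positions; the one-position shift and MSE loss below are assumptions, standing in for whatever distance and horizon the submission actually uses.

```python
# Hedged sketch of the JEPA objective: instead of next-byte cross-entropy,
# regress predicted states onto (detached) hidden states at future positions.
import numpy as np

def jepa_loss(pred_hidden: np.ndarray, target_hidden: np.ndarray) -> float:
    """MSE between predictions at position t and targets at position t+1.

    Both arrays: (batch, seq, d_model). target_hidden stands in for a
    stop-gradient copy of the encoder's hidden states.
    """
    pred = pred_hidden[:, :-1]      # predictions for positions 0..T-2
    target = target_hidden[:, 1:]   # targets are the next positions 1..T-1
    return float(np.mean((pred - target) ** 2))

h = np.ones((2, 4, 8))
assert jepa_loss(h, h) == 0.0  # identical states give zero loss
```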
other
Three-stage training schedule: pure JEPA pretraining, linear bridge from JEPA to cross-entropy, then pure cross-entropy fine-tuning.
parameters: {"stages":3,"fractions":[0.1,0.7,0.2]}
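The three-stage schedule with fractions [0.1, 0.7, 0.2] can be expressed as a single mixing-weight function over training progress; combining the two losses as `(1 - alpha) * jepa + alpha * ce` is an assumption about how the bridge is implemented.

```python
# Sketch of the three-stage schedule: returns the cross-entropy weight alpha
# at training progress p in [0, 1], assuming loss = (1-alpha)*jepa + alpha*ce.
def ce_weight(p: float, fractions=(0.1, 0.7, 0.2)) -> float:
    jepa_frac, bridge_frac, _ = fractions
    if p < jepa_frac:                    # stage 1 (first 10%): pure JEPA
        return 0.0
    if p < jepa_frac + bridge_frac:      # stage 2 (next 70%): linear bridge
        return (p - jepa_frac) / bridge_frac
    return 1.0                           # stage 3 (last 20%): pure CE

assert ce_weight(0.05) == 0.0
assert abs(ce_weight(0.45) - 0.5) < 1e-9  # midpoint of the bridge
assert ce_weight(0.9) == 1.0
```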
Regularization
SIGReg
Anti-collapse regularizer on the hidden states, replacing an EMA target encoder.
parameters: null
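The intent of an anti-collapse term like SIGReg can be illustrated with a simplified stand-in: push embeddings toward an isotropic Gaussian by penalizing the first two moments of random 1-D projections. This moment-matching version is a heavily simplified sketch and does not reproduce the actual SIGReg statistic.

```python
# Simplified SIGReg-style anti-collapse sketch (NOT the actual statistic):
# penalize random 1-D projections whose mean/variance deviate from N(0, 1).
import numpy as np

def sigreg_penalty(z: np.ndarray, num_proj: int = 16, seed: int = 0) -> float:
    """z: (n, d) embeddings. Zero only when every random projection has
    mean 0 and variance 1; collapsed (constant) embeddings are penalized."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(z.shape[1], num_proj))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)  # unit directions
    proj = z @ dirs                                      # (n, num_proj)
    mean_term = np.mean(proj.mean(axis=0) ** 2)
    var_term = np.mean((proj.var(axis=0) - 1.0) ** 2)
    return float(mean_term + var_term)

collapsed = np.zeros((32, 8))            # fully collapsed embeddings
assert sigreg_penalty(collapsed) > 0.0   # collapse incurs a penalty
```

Because the penalty is computed from the student's own hidden states, no EMA target encoder is needed to keep the representations from collapsing.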
Weight Averaging
SWA
Stochastic weight averaging of model checkpoints.
parameters: null
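SWA itself is just a running average of weight snapshots; which checkpoints are averaged here (and how many) is not specified, so the sketch below shows only the incremental-mean mechanics.

```python
# Minimal sketch of SWA: maintain a running average of weight snapshots.
import numpy as np

class SWA:
    def __init__(self):
        self.avg = None
        self.count = 0

    def update(self, weights: np.ndarray):
        self.count += 1
        if self.avg is None:
            self.avg = weights.astype(float).copy()
        else:
            self.avg += (weights - self.avg) / self.count  # incremental mean

swa = SWA()
for w in (np.full(3, 1.0), np.full(3, 3.0)):
    swa.update(w)
assert np.allclose(swa.avg, 2.0)  # average of the two snapshots
```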
Compression
zlib
level: null
Quantization
int8
bits: 8
scope: all
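The packaging pipeline (int8 quantization of all weights, then zlib) can be sketched as below; the symmetric per-tensor scaling and the compression level are assumptions, since the listing leaves `level` unspecified.

```python
# Hedged sketch of the submission packaging: symmetric per-tensor int8
# quantization followed by zlib compression. Scale handling is an assumption.
import zlib
import numpy as np

def pack(weights: np.ndarray, level: int = 9) -> tuple[bytes, float]:
    scale = float(np.max(np.abs(weights))) / 127.0 or 1.0  # avoid scale 0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return zlib.compress(q.tobytes(), level), scale

def unpack(blob: bytes, scale: float) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8)
    return q.astype(np.float32) * scale

w = np.linspace(-1.0, 1.0, 1024, dtype=np.float32)
blob, scale = pack(w)
# Round-trip error is bounded by one quantization step.
assert np.max(np.abs(w - unpack(blob, scale))) <= scale
```

Storing int8 values instead of float32 cuts the raw size by 4x before zlib even runs, which is how the artifact fits under the 16MB limit.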
Novel Contributions
- True byte-level JEPA with no tokenizer
- Representation-prediction training objective instead of token prediction
- Three-stage JEPA-to-CE training pipeline
- SIGReg anti-collapse regularizer replacing EMA target encoder
- Post-quantized int8+zlib submission under the 16MB limit