PR #1443

open

Non-record: ByteJEPA — True Byte-Level JEPA (val_bpb 1.3496)

by hardik-bhadani-git
val_bpb: 1.3496
Architecture: Transformer
Optimizer: (not specified)
Artifact Size: 15,132,832 bytes

Training Techniques

Architecture
byte-level input
Uses raw UTF-8 bytes with no tokenizer and vocab_size=256.
parameters: {"vocab_size":256}
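With no tokenizer, the frontend is just the identity map over UTF-8 bytes. A minimal sketch of what that looks like (hypothetical helper names, not the PR's code):

```python
# Illustrative byte-level "tokenizer": raw UTF-8 bytes, fixed vocab of 256 IDs.
VOCAB_SIZE = 256

def encode(text: str) -> list[int]:
    # Every byte of the UTF-8 encoding is its own token ID in [0, 255].
    return list(text.encode("utf-8"))

def decode(ids: list[int]) -> str:
    # Invalid byte sequences are replaced rather than raising.
    return bytes(ids).decode("utf-8", errors="replace")
```

Multi-byte characters simply become several IDs, so sequence length grows relative to a subword tokenizer, but the embedding table shrinks to 256 rows.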
predictor MLP
Adds a 2-layer MLP predictor to map hidden states to target hidden states.
parameters: {"layers":2}
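A 2-layer predictor of this kind maps a context hidden state to a predicted target hidden state. A plain-Python sketch with list-based matrices standing in for real framework tensors:

```python
# Illustrative 2-layer MLP predictor: context hidden state -> predicted
# target hidden state. W1/W2 are row-major weight matrices, b1/b2 biases.
def mlp_predictor(h, W1, b1, W2, b2):
    # Layer 1: linear + ReLU.
    z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
         for row, b in zip(W1, b1)]
    # Layer 2: linear back to the hidden dimension.
    return [sum(w * x for w, x in zip(row, z)) + b
            for row, b in zip(W2, b2)]
```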
Other
other
True JEPA training objective that predicts future hidden states instead of next-byte/token prediction.
parameters: null
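During the JEPA stage, the objective is conceptually a regression loss between the predictor's output and the target encoder's hidden states (which would be gradient-detached in practice), rather than a softmax over next bytes. A simplified sketch:

```python
# Illustrative JEPA-style objective: mean squared error between predicted
# hidden states and target hidden states (targets are stop-gradient in
# a real training loop).
def jepa_loss(pred_states, target_states):
    total, n = 0.0, 0
    for p, t in zip(pred_states, target_states):
        for pi, ti in zip(p, t):
            total += (pi - ti) ** 2
            n += 1
    return total / n
```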
other
Three-stage training schedule: pure JEPA pretraining, linear bridge from JEPA to cross-entropy, then pure cross-entropy fine-tuning.
parameters: {"stages":3,"fractions":[0.1,0.7,0.2]}
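Under the stated fractions [0.1, 0.7, 0.2], the first 10% of steps use the JEPA loss alone, the middle 70% linearly cross-fade from JEPA to cross-entropy, and the final 20% use cross-entropy alone. A sketch of that weighting (hypothetical function, not the PR's code):

```python
def loss_weights(progress, fractions=(0.1, 0.7, 0.2)):
    # progress in [0, 1) -> (jepa_weight, ce_weight) for the three stages:
    # pure JEPA, linear JEPA->CE bridge, pure CE fine-tuning.
    f1, f2, _ = fractions
    if progress < f1:
        return 1.0, 0.0
    if progress < f1 + f2:
        t = (progress - f1) / f2  # ramps 0 -> 1 across the bridge
        return 1.0 - t, t
    return 0.0, 1.0
```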
Regularization
SIGReg
parameters: null
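As described in the LeJEPA literature, SIGReg regularizes the embedding distribution toward an isotropic Gaussian via random one-dimensional projections, which is what lets the recipe drop the EMA target encoder. The actual test statistic is more involved; the sketch below substitutes a simple moment-matching penalty on each projection and is only illustrative:

```python
import random

def sigreg_sketch(embeddings, num_dirs=8, seed=0):
    # Illustrative stand-in for SIGReg: project embeddings onto random unit
    # directions and penalize each 1D projection's deviation from a standard
    # Gaussian via its first two moments (the real statistic is richer).
    rng = random.Random(seed)
    dim = len(embeddings[0])
    penalty = 0.0
    for _ in range(num_dirs):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        proj = [sum(vi * ei for vi, ei in zip(v, e)) for e in embeddings]
        mean = sum(proj) / len(proj)
        var = sum((p - mean) ** 2 for p in proj) / len(proj)
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / num_dirs
```

A collapsed batch (all embeddings identical) has zero projected variance, so the penalty stays bounded away from zero, which is the anti-collapse property the contribution relies on.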
Weight Averaging
SWA
parameters: null
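SWA keeps a running average of checkpoint weights sampled along the training trajectory. An incremental-mean sketch, with flat weight lists standing in for real parameter tensors:

```python
def swa_update(avg, new, n_averaged):
    # Incremental running mean: avg already folds in n_averaged checkpoints;
    # fold in one more without storing the full history.
    return [a + (w - a) / (n_averaged + 1) for a, w in zip(avg, new)]
```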
Compression
zlib
level: null
Quantization
int8
bits: 8
scope: all
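The sub-16MB artifact combines symmetric per-tensor int8 quantization with zlib compression. A minimal sketch of the packing step (hypothetical layout, not the PR's actual serialization format):

```python
import struct
import zlib

def quantize_int8(weights):
    # Symmetric per-tensor quantization: one float scale + int8 values.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero tensor
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, q

def pack(scale, q):
    # Serialize the scale and the int8 payload, then zlib-compress the blob.
    raw = struct.pack("<f", scale) + bytes(b & 0xFF for b in q)
    return zlib.compress(raw, 9)
```

Quantization cuts the payload to one byte per weight, and zlib then exploits whatever redundancy remains in the int8 stream; dequantization is just `q * scale`.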

Novel Contributions

  • True byte-level JEPA with no tokenizer
  • Representation-prediction training objective instead of token prediction
  • Three-stage JEPA-to-CE training pipeline
  • SIGReg anti-collapse regularizer replacing EMA target encoder
  • Post-quantized int8+zlib submission under the 16MB limit