PR #1152 (open)
Non-record: Connectome-JEPA — Sparse I/O Bottleneck (val_bpb 1.7942 ± 0.003, 1.97MB artifact, first JEPA submission)
by ericdatum
val_bpb: 1.7942
Architecture: Transformer
Optimizer: —
Artifact Size: 1.97MB
Training Techniques
Architecture
weight tying
Input embedding and output projection share weights.
parameters: null
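Weight tying can be sketched in a few lines: a single matrix serves both as the embedding lookup and, transposed, as the output projection. A minimal numpy illustration (vocabulary and dimension sizes are arbitrary, not the PR's):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 50, 16
E = rng.normal(size=(vocab, dim))   # the single shared weight matrix

def embed(token_ids):
    return E[token_ids]             # input embedding: row lookup in E

def unembed(hidden):
    return hidden @ E.T             # output projection: reuse E transposed

h = embed(np.array([3, 7]))         # (2, dim)
logits = unembed(h)                 # (2, vocab)
```

Tying halves the parameter count of the embedding/projection pair, which matters at this artifact size.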
RoPE
Uses rotary positional embeddings in causal cross-position attention.
parameters: null
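RoPE rotates pairs of query/key features by position-dependent angles, so relative position falls out of the dot product. A minimal numpy sketch of the rotation itself (paired first-half/second-half layout and base 10000 are conventional assumptions, not confirmed by the PR):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotate feature pairs (x[:, i], x[:, i + d/2]) by position-dependent angles."""
    T, d = x.shape
    half = d // 2
    pos = np.arange(T)[:, None]                   # (T, 1)
    freqs = base ** (-np.arange(half) / half)     # (half,) geometric frequency ladder
    ang = pos * freqs                             # (T, half) rotation angles
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * np.cos(ang) - x2 * np.sin(ang),
                           x1 * np.sin(ang) + x2 * np.cos(ang)], axis=-1)

x = np.random.default_rng(0).normal(size=(5, 8))
y = rope(x)
```

Because each pair is a pure rotation, vector norms are preserved and position 0 is left unchanged.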
U-Net skip connections
Gated sensory re-injection every 2 sparse hops to preserve gradient flow through the sparse chain.
parameters: {"skip_every":2}
Gated Attention
Gated skip/re-injection mechanism between sparse hop blocks.
parameters: {"init":0.1}
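The two entries above describe one mechanism: every `skip_every` sparse hops, the original sensory input is added back in, scaled by a learned gate initialized at 0.1. A minimal numpy sketch, assuming a sigmoid-squashed scalar gate and tanh hops (both assumptions; the PR does not show the exact form):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

skip_every = 2        # from the PR's {"skip_every": 2}
gate = 0.1            # scalar gate, init 0.1 per {"init": 0.1}; learnable in training

def run_hops(x_sensory, hop_mats):
    h = x_sensory.copy()
    for i, W in enumerate(hop_mats, start=1):
        h = np.tanh(h @ W)                       # one sparse connectome hop
        if i % skip_every == 0:                  # gated sensory re-injection
            h = h + sigmoid(gate) * x_sensory
    return h

rng = np.random.default_rng(0)
n = 300
hops = [rng.normal(size=(n, n)) * 0.05 for _ in range(6)]
out = run_hops(rng.normal(size=(1, n)), hops)
```

The re-injection gives gradients a short path back to the input even when the six-hop sparse chain attenuates them.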
XSA
Cross-position attention layers interleaved with sparse connectome hops.
parameters: {"layers":3,"heads":4,"dim":128}
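The cross-position attention interleaved between hops is, at its core, standard causal attention: each position attends only to itself and earlier positions. A single-head numpy sketch (the PR uses 3 layers, 4 heads, dim 128; sizes here are toy values):

```python
import numpy as np

def causal_attention(x, Wq, Wk, Wv):
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(d)
    mask = np.tril(np.ones((T, T), dtype=bool))   # causal: no attending to the future
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)            # softmax over allowed positions
    return w @ v

rng = np.random.default_rng(0)
T, d = 6, 16
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_attention(x, Wq, Wk, Wv)
```

Interleaving this with connectome hops lets information mix across sequence positions, which the per-position sparse hops alone cannot do.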
other
Sparse connectome-derived masked linear network based on the C. elegans wiring diagram, with a sensory/interneuron/motor bottleneck.
parameters: {"neurons":300,"sensory":88,"motor":123,"hops":6,"density":0.0565}
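The connectome layer is a linear map whose weights are elementwise-masked by a fixed binary adjacency matrix, so only wired neuron pairs carry signal. A minimal numpy sketch using a random mask as a stand-in for the actual C. elegans adjacency (which the PR derives from the wiring diagram):

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, density = 300, 0.0565                     # from the PR's parameters
M = rng.random((n_neurons, n_neurons)) < density     # random stand-in for the connectome adjacency
W = rng.normal(size=(n_neurons, n_neurons)) * 0.1    # trainable weights

def sparse_hop(h):
    return np.tanh(h @ (W * M))                      # absent wires contribute nothing

out = sparse_hop(rng.normal(size=(1, n_neurons)))
```

At 5.65% density only ~5k of the 90k possible connections are active per hop, which is where most of the parameter savings come from.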
Quantization
int8
bits: 8
scope: model
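Model-scope int8 quantization can be sketched as symmetric round-to-nearest with a single scale for all weights (the symmetric scheme is an assumption; the PR only states bits and scope):

```python
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0      # one scale shared model-wide (scope: model)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

Per-model (rather than per-tensor) scaling is the cheapest option in artifact bytes, at some cost in reconstruction error for tensors with small dynamic range.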
Compression
zlib
level: null
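The quantized bytes are then zlib-compressed for the artifact. A round-trip sketch (the PR's compression level is unspecified; `zlib.compress` defaults to level 6):

```python
import zlib
import numpy as np

q = np.random.default_rng(0).integers(-127, 128, size=10_000, dtype=np.int8)
blob = zlib.compress(q.tobytes())                 # level unspecified in the PR; defaults to 6
restored = np.frombuffer(zlib.decompress(blob), dtype=np.int8)
```

Compression ratio depends heavily on the entropy of the quantized weights; near-uniform int8 values compress little.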
Regularization
logit softcap
parameters: {"cap":30}
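Logit softcapping with cap 30 is the standard tanh squash: logits are smoothly bounded to (-30, 30) while staying near-identity for small values.

```python
import numpy as np

def logit_softcap(logits, cap=30.0):
    return cap * np.tanh(logits / cap)   # smooth clamp to (-cap, cap)

capped = logit_softcap(np.array([-100.0, -1.0, 0.0, 1.0, 100.0]))
```

Unlike a hard clip, the gradient never vanishes exactly, so extreme logits still receive a (small) training signal.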
Other
other
JEPA latent prediction objective with SigREG collapse prevention used alongside cross-entropy.
parameters: {"jepa_weight":0.5,"sigreg_weight":0.0001}
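The training objective combines three terms with the stated weights. A hedged numpy sketch: the JEPA term is written as squared error on predicted latents, and the collapse-prevention term as a variance penalty standing in for SigREG; neither exact formulation is shown in the PR.

```python
import numpy as np

jepa_weight, sigreg_weight = 0.5, 1e-4           # from the PR's parameters

def combined_loss(ce_loss, pred_latents, target_latents):
    # JEPA term: predict target latents from context (squared error is an assumed form)
    jepa = np.mean((pred_latents - target_latents) ** 2)
    # Collapse prevention: penalize vanishing per-dimension variance of predictions
    # (a stand-in for SigREG; the PR's exact regularizer is not reproduced here)
    sigreg = np.mean(1.0 / (np.var(pred_latents, axis=0) + 1e-6))
    return ce_loss + jepa_weight * jepa + sigreg_weight * sigreg

rng = np.random.default_rng(0)
loss = combined_loss(2.0, rng.normal(size=(8, 16)), rng.normal(size=(8, 16)))
```

The tiny sigreg weight (1e-4) suggests the term only needs to be active near collapse, not to shape the loss landscape otherwise.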
Novel Contributions
- First JEPA-based submission in Parameter Golf
- First biologically inspired sparse architecture submission
- Sparse I/O bottleneck language model using a C. elegans connectome-derived mask
- Gated sensory re-injection skip connections to prevent gradient death in deep sparse hops
- Interleaved sparse connectome hops with multi-head causal cross-position attention
- JEPA + SigREG training objective adapted to text
- Ablation study showing a random sparse topology slightly outperforms the biological topology for language modeling