PR #1152 (open)
Non-record: Connectome-JEPA — Sparse I/O Bottleneck (val_bpb 1.7942 ± 0.003, 1.97MB artifact, first JEPA submission)
by ericdatum
val_bpb: 1.7942
Architecture: Transformer
Optimizer: —
Artifact Size: 1.97MB
Training Techniques
Architecture
weight tying
Input embedding and output projection share weights.
parameters: null
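Weight tying can be sketched in a few lines: a single matrix serves both as the embedding lookup and, transposed, as the output projection. A minimal numpy illustration (vocabulary and dimension sizes are arbitrary, not the PR's):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 50, 16
E = rng.normal(size=(vocab, dim))   # the single shared weight matrix

def embed(token_ids):
    return E[token_ids]             # input embedding: row lookup in E

def unembed(hidden):
    return hidden @ E.T             # output projection: reuse E transposed

h = embed(np.array([3, 7]))         # (2, dim)
logits = unembed(h)                 # (2, vocab)
```

Tying halves the parameter count of the embedding/projection pair, which matters at this artifact size.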
RoPE
Uses rotary positional embeddings in causal cross-position attention.
parameters: null
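RoPE rotates pairs of query/key features by position-dependent angles, so relative position falls out of the dot product. A minimal numpy sketch of the rotation itself (paired first-half/second-half layout and base 10000 are conventional assumptions, not confirmed by the PR):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotate feature pairs (x[:, i], x[:, i + d/2]) by position-dependent angles."""
    T, d = x.shape
    half = d // 2
    pos = np.arange(T)[:, None]                   # (T, 1)
    freqs = base ** (-np.arange(half) / half)     # (half,) geometric frequency ladder
    ang = pos * freqs                             # (T, half) rotation angles
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * np.cos(ang) - x2 * np.sin(ang),
                           x1 * np.sin(ang) + x2 * np.cos(ang)], axis=-1)

x = np.random.default_rng(0).normal(size=(5, 8))
y = rope(x)
```

Because each pair is a pure rotation, vector norms are preserved and position 0 is left unchanged.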
U-Net skip connections
Gated sensory re-injection every 2 sparse hops to preserve gradient flow through the sparse chain.
parameters: {"skip_every":2}
Gated Attention
Gated skip/re-injection mechanism between sparse hop blocks.
parameters: {"init":0.1}
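The two entries above describe one mechanism: every `skip_every` sparse hops, the original sensory input is added back in, scaled by a learned gate initialized at 0.1. A minimal numpy sketch, assuming a sigmoid-squashed scalar gate and tanh hops (both assumptions; the PR does not show the exact form):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

skip_every = 2        # from the PR's {"skip_every": 2}
gate = 0.1            # scalar gate, init 0.1 per {"init": 0.1}; learnable in training

def run_hops(x_sensory, hop_mats):
    h = x_sensory.copy()
    for i, W in enumerate(hop_mats, start=1):
        h = np.tanh(h @ W)                       # one sparse connectome hop
        if i % skip_every == 0:                  # gated sensory re-injection
            h = h + sigmoid(gate) * x_sensory
    return h

rng = np.random.default_rng(0)
n = 300
hops = [rng.normal(size=(n, n)) * 0.05 for _ in range(6)]
out = run_hops(rng.normal(size=(1, n)), hops)
```

The re-injection gives gradients a short path back to the input even when the six-hop sparse chain attenuates them.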
XSA
Cross-position attention layers interleaved with sparse connectome hops.
parameters: {"layers":3,"heads":4,"dim":128}
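The cross-position attention interleaved between hops is, at its core, standard causal attention: each position attends only to itself and earlier positions. A single-head numpy sketch (the PR uses 3 layers, 4 heads, dim 128; sizes here are toy values):

```python
import numpy as np

def causal_attention(x, Wq, Wk, Wv):
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(d)
    mask = np.tril(np.ones((T, T), dtype=bool))   # causal: no attending to the future
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)            # softmax over allowed positions
    return w @ v

rng = np.random.default_rng(0)
T, d = 6, 16
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_attention(x, Wq, Wk, Wv)
```

Interleaving this with connectome hops lets information mix across sequence positions, which the per-position sparse hops alone cannot do.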
other
Sparse connectome-derived masked linear network based on the C. elegans wiring diagram, with a sensory/interneuron/motor bottleneck.
parameters: {"neurons":300,"sensory":88,"motor":123,"hops":6,"density":0.0565}
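The connectome layer is a linear map whose weights are elementwise-masked by a fixed binary adjacency matrix, so only wired neuron pairs carry signal. A minimal numpy sketch using a random mask as a stand-in for the actual C. elegans adjacency (which the PR derives from the wiring diagram):

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, density = 300, 0.0565                     # from the PR's parameters
M = rng.random((n_neurons, n_neurons)) < density     # random stand-in for the connectome adjacency
W = rng.normal(size=(n_neurons, n_neurons)) * 0.1    # trainable weights

def sparse_hop(h):
    return np.tanh(h @ (W * M))                      # absent wires contribute nothing

out = sparse_hop(rng.normal(size=(1, n_neurons)))
```

At 5.65% density only ~5k of the 90k possible connections are active per hop, which is where most of the parameter savings come from.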
Quantization
int8
bits: 8
scope: model
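Model-scope int8 quantization can be sketched as symmetric round-to-nearest with a single scale for all weights (the symmetric scheme is an assumption; the PR only states bits and scope):

```python
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0      # one scale shared model-wide (scope: model)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

Per-model (rather than per-tensor) scaling is the cheapest option in artifact bytes, at some cost in reconstruction error for tensors with small dynamic range.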
Compression
zlib
level: null
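The quantized bytes are then zlib-compressed for the artifact. A round-trip sketch (the PR's compression level is unspecified; `zlib.compress` defaults to level 6):

```python
import zlib
import numpy as np

q = np.random.default_rng(0).integers(-127, 128, size=10_000, dtype=np.int8)
blob = zlib.compress(q.tobytes())                 # level unspecified in the PR; defaults to 6
restored = np.frombuffer(zlib.decompress(blob), dtype=np.int8)
```

Compression ratio depends heavily on the entropy of the quantized weights; near-uniform int8 values compress little.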
Regularization
logit softcap
parameters: {"cap":30}
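Logit softcapping with cap 30 is the standard tanh squash: logits are smoothly bounded to (-30, 30) while staying near-identity for small values.

```python
import numpy as np

def logit_softcap(logits, cap=30.0):
    return cap * np.tanh(logits / cap)   # smooth clamp to (-cap, cap)

capped = logit_softcap(np.array([-100.0, -1.0, 0.0, 1.0, 100.0]))
```

Unlike a hard clip, the gradient never vanishes exactly, so extreme logits still receive a (small) training signal.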
Other
other
JEPA latent prediction objective with SigREG collapse prevention used alongside cross-entropy.
parameters: {"jepa_weight":0.5,"sigreg_weight":0.0001}
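The training objective combines three terms with the stated weights. A hedged numpy sketch: the JEPA term is written as squared error on predicted latents, and the collapse-prevention term as a variance penalty standing in for SigREG; neither exact formulation is shown in the PR.

```python
import numpy as np

jepa_weight, sigreg_weight = 0.5, 1e-4           # from the PR's parameters

def combined_loss(ce_loss, pred_latents, target_latents):
    # JEPA term: predict target latents from context (squared error is an assumed form)
    jepa = np.mean((pred_latents - target_latents) ** 2)
    # Collapse prevention: penalize vanishing per-dimension variance of predictions
    # (a stand-in for SigREG; the PR's exact regularizer is not reproduced here)
    sigreg = np.mean(1.0 / (np.var(pred_latents, axis=0) + 1e-6))
    return ce_loss + jepa_weight * jepa + sigreg_weight * sigreg

rng = np.random.default_rng(0)
loss = combined_loss(2.0, rng.normal(size=(8, 16)), rng.normal(size=(8, 16)))
```

The tiny sigreg weight (1e-4) suggests the term only needs to be active near collapse, not to shape the loss landscape otherwise.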
Novel Contributions
- First JEPA-based submission in Parameter Golf
- First biologically inspired sparse architecture submission
- Sparse I/O bottleneck language model using a C. elegans connectome-derived mask
- Gated sensory re-injection skip connections to prevent gradient death in deep sparse hops
- Interleaved sparse connectome hops with multi-head causal cross-position attention
- JEPA + SigREG training objective adapted to text
- Ablation study showing a random sparse topology slightly outperforms the biological topology for language modeling