PR #1367

open

Non-Record: 10L Spatio-Temporal SNN (BPB: 1.3319)

val_bpb
1.3319
Architecture
Transformer
Optimizer
Muon
Artifact Size
15.79 MB

Training Techniques

Architecture
Spiking-MLP
Replaces the standard Transformer MLP with a spiking MLP driven by a discrete leaky integrate-and-fire (LIF) simulation.
parameters: {"layers":10,"dim":768,"heads":12,"kv_heads":4,"T_sim":2}
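The entry above can be sketched as a rate-coded LIF hidden layer: integrate the input current for T_sim steps, emit a binary spike wherever the membrane potential crosses threshold, and project the resulting firing rates. A minimal NumPy sketch; the hard reset and leak constant `tau` are assumptions, since the PR only specifies `T_sim: 2`:

```python
import numpy as np

def spiking_mlp(x, w_in, w_out, T_sim=2, tau=2.0, v_th=1.0):
    """Hypothetical sketch of a spiking MLP layer: the hidden activation is
    replaced by a discrete leaky integrate-and-fire (LIF) simulation run for
    T_sim timesteps; the output projects the resulting firing rates."""
    h = x @ w_in                         # input current, shape (batch, hidden)
    v = np.zeros_like(h)                 # membrane potential
    spikes = np.zeros_like(h)            # accumulated spike counts
    for _ in range(T_sim):
        v = v + (h - v) / tau            # leaky integration toward the input
        s = (v >= v_th).astype(h.dtype)  # binary spike where threshold crossed
        v = v * (1.0 - s)                # hard reset for neurons that fired
        spikes += s
    rate = spikes / T_sim                # per-neuron firing rate in [0, 1]
    return rate @ w_out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
w_in = rng.normal(size=(16, 64)) * 0.5
w_out = rng.normal(size=(64, 16)) * 0.5
y = spiking_mlp(x, w_in, w_out, T_sim=2)
```

With `T_sim=2` the firing rate can only take values {0, 0.5, 1}, which is part of why the artifact quantizes and compresses so well.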
RBF
Parameter-free radial basis function temporal encoding converts token embeddings into spike trains.
parameters: null
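One plausible reading of parameter-free RBF temporal encoding: each of the T_sim timesteps owns an evenly spaced Gaussian center over the normalized embedding range, and a neuron spikes at the timesteps whose centers lie near its value. The centers, width, and 0.5 cutoff below are all assumptions derived only from the data range, which is what keeps the scheme parameter-free:

```python
import numpy as np

def rbf_spike_encode(emb, T_sim=2):
    """Hypothetical sketch of parameter-free RBF temporal encoding: timestep
    t owns an evenly spaced Gaussian center over the normalized embedding
    range; a neuron spikes at timestep t when its value falls close to that
    center. Centers and width come from the data, so nothing is learned."""
    lo, hi = emb.min(), emb.max()
    x = (emb - lo) / (hi - lo + 1e-8)           # normalize to [0, 1]
    centers = (np.arange(T_sim) + 0.5) / T_sim  # one RBF center per timestep
    sigma = 1.0 / (2 * T_sim)                   # width tied to center spacing
    # response[t] = exp(-(x - c_t)^2 / (2 sigma^2)); spike where it exceeds 0.5
    resp = np.exp(-(x[None, ...] - centers[:, None, None]) ** 2
                  / (2 * sigma ** 2))
    return (resp > 0.5).astype(np.float32)      # shape (T_sim, batch, dim)

emb = np.random.default_rng(1).normal(size=(4, 8))
spikes = rbf_spike_encode(emb, T_sim=2)
```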
GQA
Uses grouped query attention with 4 KV heads shared across the 12 query heads.
parameters: {"kv_heads":4}
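Grouped query attention can be sketched in a few lines: the 12 query heads are split into groups of 3, each group sharing one of the 4 KV heads, cutting the KV cache 3x. A minimal non-causal NumPy sketch (the real model would add causal masking and learned projections):

```python
import numpy as np

def gqa(q, k, v, n_heads=12, kv_heads=4):
    """Sketch of grouped query attention: n_heads query heads share kv_heads
    key/value heads, so each group of n_heads // kv_heads query heads
    attends with the same K and V."""
    B, T, D = q.shape
    hd = D // n_heads                 # per-head dimension
    group = n_heads // kv_heads       # query heads per KV head
    qh = q.reshape(B, T, n_heads, hd).transpose(0, 2, 1, 3)   # (B, H, T, hd)
    kh = k.reshape(B, T, kv_heads, hd).transpose(0, 2, 1, 3)  # (B, Hkv, T, hd)
    vh = v.reshape(B, T, kv_heads, hd).transpose(0, 2, 1, 3)
    kh = np.repeat(kh, group, axis=1)  # broadcast each KV head to its group
    vh = np.repeat(vh, group, axis=1)
    att = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(hd)
    att = np.exp(att - att.max(-1, keepdims=True))
    att = att / att.sum(-1, keepdims=True)     # softmax over keys
    out = att @ vh
    return out.transpose(0, 2, 1, 3).reshape(B, T, D)

rng = np.random.default_rng(3)
q = rng.normal(size=(2, 5, 48))   # 12 query heads of dim 4
k = rng.normal(size=(2, 5, 16))   # 4 KV heads of dim 4
v = rng.normal(size=(2, 5, 16))
out = gqa(q, k, v, n_heads=12, kv_heads=4)
```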
ArcTan surrogate gradients
Uses ArcTan surrogate derivatives in the backward pass so gradients can flow through the non-differentiable binary spike function.
parameters: null
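The ArcTan surrogate keeps the exact Heaviside spike in the forward pass but substitutes a smooth derivative in the backward pass, so BPTT through the spike train is possible. A sketch of the two passes (the width parameter `alpha` is an assumption):

```python
import numpy as np

def spike_forward(v, v_th=1.0):
    """Forward pass: exact binary Heaviside spike (non-differentiable)."""
    return (v >= v_th).astype(v.dtype)

def spike_backward(v, v_th=1.0, alpha=2.0):
    """Backward pass: ArcTan surrogate derivative. The Heaviside step is
    treated as if it were arctan(pi*alpha*(v - v_th)/2)/pi + 1/2, whose
    derivative (alpha/2) / (1 + u^2) is a smooth bump centered at the
    threshold, so gradients flow where the neuron is near firing."""
    u = np.pi * alpha * (v - v_th) / 2
    return (alpha / 2) / (1 + u ** 2)

v = np.array([0.5, 1.0, 1.5])
s = spike_forward(v)        # binary spikes
g = spike_backward(v)       # surrogate gradient, peaks at v == v_th
```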
homeostatic threshold adaptation
Dynamic per-neuron threshold adaptation keeps average firing rates near the 5% target.
parameters: {"target_firing_rate":0.05}
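A minimal sketch of the homeostatic rule implied by the entry: after each batch, nudge each neuron's firing threshold up when its observed firing rate exceeds the 5% target and down when it falls short. The update rate `eta` and the exact update form are assumptions; only `target_firing_rate: 0.05` comes from the PR:

```python
import numpy as np

def adapt_threshold(v_th, spikes, target_rate=0.05, eta=0.1):
    """Hypothetical homeostatic threshold adaptation: raise the threshold of
    overactive neurons and lower it for underactive ones, pulling average
    firing rates toward the target. spikes has shape (batch, time, neurons)."""
    rate = spikes.mean(axis=(0, 1))           # observed per-neuron firing rate
    return v_th + eta * (rate - target_rate)  # proportional homeostatic update

v_th = np.full(8, 1.0)
spikes = np.ones((4, 10, 8))          # every neuron fires at rate 1.0
v_th_new = adapt_threshold(v_th, spikes)  # thresholds rise: 1 + 0.1*(1-0.05)
```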
Quantization
int8
bits: 8
scope: model
Optimizer
Muon
weight_decay: null
momentum: null
other_params: null
Compression
zlib
level: null
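The int8 + zlib pipeline from the Quantization and Compression entries can be sketched as symmetric per-tensor quantization followed by compressing the int8 bytes. Per-tensor scaling and zlib level 9 are assumptions; the PR leaves the level unspecified:

```python
import numpy as np
import zlib

def quantize_int8(w):
    """Symmetric per-tensor int8 post-training quantization: scale so the
    max |w| maps to 127, round to int8, keep the scale for dequantization."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def pack(w, level=9):
    """Quantize, then zlib-compress the int8 bytes for the artifact file."""
    q, scale = quantize_int8(w)
    return zlib.compress(q.tobytes(), level), scale

w = np.random.default_rng(2).normal(size=(256, 256)).astype(np.float32)
blob, scale = pack(w)

# Round-trip: decompress, reinterpret as int8, dequantize with the scale.
q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(w.shape)
w_hat = q.astype(np.float32) * scale
```

Quantization bounds the per-weight error by half a step (`scale / 2`), and with only ~2-bit effective activations from `T_sim=2`, the compressed artifact lands well under the float32 size.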

Novel Contributions

  • Spatio-temporal spiking neural network for autoregressive language modeling
  • Spiking-MLP replacing the standard Transformer MLP
  • RBF temporal encoding into spike trains
  • ArcTan surrogate gradients for BPTT through binary spikes
  • Homeostatic threshold adaptation to stabilize firing rates
  • Int8 post-training quantization and zlib compression to fit the artifact limit