PR #1650
Non-record: Fibonacci Manifold Network + Hybrid Attention + Sparse Braid
by Jaredcastorena
val_bpb
1.4233
Architecture
Hybrid
Optimizer
Muon
Artifact Size
14,785,394 bytes
Training Techniques
Architecture
FMN
Fibonacci Manifold Network with a shared matrix constrained by W0^2 = W0 + I and reused across all depths.
parameters: {"layers":12,"shared_matrix":true}
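The constraint W0^2 = W0 + I means W0's eigenvalues must be roots of x^2 = x + 1, i.e. the golden ratio phi and its conjugate psi. The PR's construction isn't shown here, but a minimal NumPy sketch of such a matrix (random eigenbasis and phi/psi assignment are illustrative choices, not the author's):

```python
import numpy as np

# Roots of x^2 = x + 1: the golden ratio and its conjugate.
phi = (1 + np.sqrt(5)) / 2
psi = (1 - np.sqrt(5)) / 2

def random_fibonacci_matrix(n, seed=0):
    """Build a matrix on the manifold W0^2 = W0 + I by giving it only
    phi/psi eigenvalues in a random orthonormal eigenbasis."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthonormal basis
    eigs = np.where(rng.random(n) < 0.5, phi, psi)    # phi or psi per direction
    return q @ np.diag(eigs) @ q.T

W0 = random_fibonacci_matrix(8)
# The defining identity holds up to float error:
assert np.allclose(W0 @ W0, W0 + np.eye(8))
```

Because the constraint fixes the spectrum, the free parameters of such a shared matrix live in its eigenbasis, which is what makes reuse across all 12 depths cheap.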
XSA
Causal self-attention added only in the top layers of the hybrid model.
parameters: {"layers":3,"heads":4}
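The XSA layers are standard causal self-attention. A single-head NumPy sketch of the masking (head count, dimensions, and weight shapes here are illustrative, not taken from the PR):

```python
import numpy as np

def causal_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention: position t attends only to positions <= t."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # Lower-triangular mask blocks attention to future positions.
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The mask is what keeps the hybrid model autoregressive: perturbing a later token cannot change the output at earlier positions.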
SparseBraidRegister
Sparse recurrent entity memory with K slots, gated carry, strand mixing, and directional render gating.
parameters: {"K":7,"slot_dim":128,"window":64}
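The strand-mixing and render-gating details are not given in this summary, but the "K slots with gated carry" part can be illustrated. The update rule below is a hypothetical sketch, not the PR's implementation: a per-slot sigmoid gate blends a candidate write (from the hidden state) with the carried slot state.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def braid_step(slots, h, W_gate, W_cand):
    """One hypothetical gated slot update over K slots of width slot_dim.
    slots: (K, d) carried memory; h: (d,) current hidden state."""
    cand = np.tanh(h @ W_cand)          # candidate content, shape (d,)
    gate = sigmoid(slots @ W_gate @ h)  # scalar gate per slot, shape (K,)
    # Gated carry: each slot interpolates between its old value and the write.
    return gate[:, None] * cand[None, :] + (1 - gate[:, None]) * slots
```

With the listed parameters (K=7, slot_dim=128) each step touches only the 7 slots, which is what keeps the recurrent memory sparse relative to the sequence.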
weight tying
LM head shares weights with the token embedding matrix.
parameters: null
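Weight tying means the output head is the transpose of the embedding matrix, saving a vocab x d parameter block. A minimal sketch (vocab size and width here are placeholders):

```python
import numpy as np

vocab, d = 1000, 64
E = np.random.default_rng(0).standard_normal((vocab, d)) * 0.02  # token embeddings

def embed(ids):
    return E[ids]      # input lookup uses E

def lm_head(h):
    return h @ E.T     # output logits reuse the same matrix, transposed
```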
LeakyReLU
PsiNet uses LeakyReLU(0.5) inside the residual MLP.
parameters: {"negative_slope":0.5}
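With negative_slope=0.5 the activation is much closer to linear than the common 0.01 default; for reference:

```python
def leaky_relu(x, negative_slope=0.5):
    """LeakyReLU: identity for positive inputs, scaled-down for negative ones."""
    return x if x > 0 else negative_slope * x

assert leaky_relu(2.0) == 2.0
assert leaky_relu(-2.0) == -1.0
```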
Optimizer
Muon
weight_decay: null
momentum: null
other_params: {"used_for":"all other non-braid matrix parameters and braid matrices","newton_schulz_steps":5}
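Muon orthogonalizes each matrix update with a few Newton-Schulz iterations (5 here). As a sketch, the classic cubic iteration below drives singular values toward 1; actual Muon implementations use tuned higher-order coefficients, so this is an illustration of the idea rather than the optimizer's exact step:

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Push G toward the nearest orthogonal matrix via the cubic
    Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X."""
    X = G / np.linalg.norm(G)  # Frobenius normalization keeps singular values <= 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X
```

Each iteration maps a singular value s to 1.5s - 0.5s^3, so all singular values in (0, 1] move monotonically toward 1.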
Adam
weight_decay: null
momentum: null
other_params: {"used_for":"token embeddings and scalar parameters"}
FMNRiemannianAdam
weight_decay: null
momentum: null
other_params: {"used_for":"shared W0 manifold-constrained matrix"}
Quantization
int8
bits: 8
scope: model weights
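The exact quantization scheme is not spelled out here; a common choice consistent with the "near-lossless roundtrip" claim is symmetric per-tensor int8, sketched below as an assumption:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8: store int8 codes plus one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Roundtrip error is bounded by half a quantization step.
assert np.abs(w - w_hat).max() <= s / 2 + 1e-6
```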
Compression
zlib
level: null
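zlib on top of int8 weights is lossless and shrinks the artifact because quantized weights have low byte entropy. The compression level is not recorded in this entry; level=9 below is only for illustration:

```python
import zlib
import numpy as np

# Stand-in for an int8 weight tensor with a narrow value range.
q = np.random.default_rng(0).integers(-10, 10, size=100_000).astype(np.int8)
raw = q.tobytes()
packed = zlib.compress(raw, level=9)  # level is an assumption, not from the PR
restored = np.frombuffer(zlib.decompress(packed), dtype=np.int8)

assert np.array_equal(q, restored)    # roundtrip is exact
assert len(packed) < len(raw)         # low-entropy int8 data compresses
```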
Regularization
dropout
parameters: null
weight decay
parameters: null
LR Schedule
warmup
parameters: {"warmup_steps":20}
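Only the warmup length (20 steps) is recorded; assuming the usual linear ramp, the schedule multiplier looks like:

```python
def warmup_lr(step, base_lr, warmup_steps=20):
    """Linear warmup: scale the LR up over the first warmup_steps steps,
    then hold it at base_lr (post-warmup decay, if any, is not recorded)."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)
```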
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
Novel Contributions
- Fibonacci Manifold Network with a shared matrix constrained to W0^2 = W0 + I
- Exact phi/psi eigenspace projection via closed-form Richardson-style cancellation
- Hybrid design combining pure FMN layers with top-layer causal self-attention
- Sparse braid recurrent entity memory with gated slot updates and directional render gating
- Riemannian Adam-style optimization on the Fibonacci manifold
- Near-lossless int8 roundtrip with very small quantization penalty
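"Exact phi/psi eigenspace projection" likely refers to the standard closed-form spectral projectors available once W0^2 = W0 + I pins the spectrum to {phi, psi}: P_phi = (W0 - psi*I)/(phi - psi) and P_psi = (phi*I - W0)/(phi - psi). A NumPy check of those identities (the random test matrix is illustrative):

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2
psi = (1 - np.sqrt(5)) / 2

# Construct a point on the manifold W0^2 = W0 + I.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
eigs = np.where(rng.random(6) < 0.5, phi, psi)
W0 = Q @ np.diag(eigs) @ Q.T

I = np.eye(6)
P_phi = (W0 - psi * I) / (phi - psi)   # closed-form projector onto the phi eigenspace
P_psi = (phi * I - W0) / (phi - psi)   # closed-form projector onto the psi eigenspace

assert np.allclose(P_phi + P_psi, I)               # resolution of the identity
assert np.allclose(P_phi @ P_phi, P_phi)           # idempotent
assert np.allclose(W0, phi * P_phi + psi * P_psi)  # spectral decomposition
```

No eigendecomposition is needed at runtime: both projectors are affine in W0, which is presumably what makes the projection "exact" and cheap.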