PR #1308
open
Non-Record: Ouroboros — Crawler Architecture Research (1.1364 BPB)
by newjordan
val_bpb
1.1364
Architecture
Hybrid
Optimizer
AdamW
Artifact Size
15.0MB
Training Techniques
Quantization
GPTQ
bits: null
scope: shared weights / crawler
GPTQ
bits: 4
scope: QK
Architecture
weight tying
Shared-weight crawler architecture with stacked flat and crawler streams.
parameters: {"flat_layers":9,"crawler_layers":1,"crawler_loops":2}
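The parameters above imply 10 unique layers but an effective depth of 11, because the single crawler layer is applied twice with the same weights. A minimal sketch of that reuse follows. It assumes the flat stack runs first and the crawler loop second, which the summary does not confirm. ToyLayer and forward are hypothetical names, and the toy layer is a stand-in for a real transformer block.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class ToyLayer:
    """Hypothetical stand-in for a transformer block: a residual scale-and-shift."""
    scale: float
    shift: float = 0.0
    calls: int = 0  # counts how often these weights are applied

    def __call__(self, x: Vector) -> Vector:
        self.calls += 1
        return [v + self.scale * v + self.shift for v in x]


def forward(x: Vector, flat: List[ToyLayer], crawler: List[ToyLayer],
            crawler_loops: int) -> Vector:
    # Flat stream: every layer has its own weights and runs once.
    for layer in flat:
        x = layer(x)
    # Crawler stream: the same layer objects (shared weights) run once per loop.
    # Assumption: the crawler follows the flat stack; the PR may order them differently.
    for _ in range(crawler_loops):
        for layer in crawler:
            x = layer(x)
    return x


def effective_depth(flat_layers: int, crawler_layers: int, crawler_loops: int) -> int:
    """Number of layer applications per forward pass."""
    return flat_layers + crawler_layers * crawler_loops


flat = [ToyLayer(0.1) for _ in range(9)]   # flat_layers = 9
crawler = [ToyLayer(0.1)]                  # crawler_layers = 1
out = forward([1.0], flat, crawler, crawler_loops=2)
```

With these values, effective_depth(9, 1, 2) is 11 while only 10 layers hold parameters, which is the trade the shared-weight design makes.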
RoPE
Uses RoPE across streams, including loop-specific scaling.
parameters: {"scales":[9,1,1]}
LeakyReLU
Leaky ReLU activations in MLPs.
parameters: {"mlp_slope":0.5,"crawler_mlp_slope":0.5}
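A slope of 0.5 is much steeper than the common 0.01 default, so negative pre-activations are halved rather than nearly zeroed. As a small illustration of the standard Leaky ReLU definition (the function name is illustrative, not taken from the PR):

```python
def leaky_relu(x: float, slope: float = 0.5) -> float:
    """Standard Leaky ReLU: identity for x >= 0, slope * x otherwise."""
    return x if x >= 0.0 else slope * x


# Same slope is listed for the flat MLPs and the crawler MLP.
samples = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
```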
BigramHash
Uses a bigram vocabulary for token interaction features.
parameters: {"vocab_size":2048}
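One common way to build such a feature is to hash each (previous token, current token) pair into a fixed number of buckets and look up an embedding per bucket. The sketch below shows only the bucketing step with 2048 buckets. The mixing constant, the treatment of the first position, and the function names are assumptions for illustration, not the PR's actual hash.

```python
from typing import List

BIGRAM_VOCAB_SIZE = 2048  # from the listed parameters


def bigram_bucket(prev_token: int, token: int,
                  vocab_size: int = BIGRAM_VOCAB_SIZE) -> int:
    # Hypothetical multiplicative hash; any deterministic mixing function would do.
    return (prev_token * 1_000_003 + token) % vocab_size


def bigram_ids(tokens: List[int], vocab_size: int = BIGRAM_VOCAB_SIZE) -> List[int]:
    # Assumption: the first position has no predecessor, so token id 0 stands in for it.
    previous = [0] + tokens[:-1]
    return [bigram_bucket(p, t, vocab_size) for p, t in zip(previous, tokens)]


ids = bigram_ids([17, 50256, 42, 42])
```

Each id indexes a bucket in a 2048-row embedding table, so distinct bigrams can collide; that is the price of keeping the table small.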
XSA
Cross-stream attention/routing features are used in the crawler research program.
parameters: null
Compression
brotli
level: null
LR Schedule
warmdown
parameters: {"warmdown_iters":2000}
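A warmdown schedule holds the learning rate constant and then decays it over the final iterations of training. The sketch below assumes a linear decay to zero over the last 2000 steps. The decay shape, the total iteration count used in the example, and the function name are assumptions, since the summary gives only warmdown_iters.

```python
def warmdown_scale(step: int, total_iters: int, warmdown_iters: int = 2000) -> float:
    """Multiplier applied to the base learning rate at a given step."""
    if warmdown_iters <= 0:
        return 1.0
    remaining = total_iters - step
    if remaining >= warmdown_iters:
        return 1.0  # constant phase before the warmdown begins
    # Linear decay from 1.0 to 0.0 across the final warmdown_iters steps.
    return max(remaining, 0) / warmdown_iters


# Hypothetical run length of 10,000 steps, chosen only for illustration.
scales = [warmdown_scale(s, total_iters=10_000) for s in (0, 8_000, 9_000, 10_000)]
```

Under these assumptions the multiplier stays at 1.0 through step 8,000, reaches 0.5 at step 9,000, and hits 0.0 at the final step.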
Optimizer
AdamW
weight_decay: null
momentum: null
other_params: {"fused":true}
Sequence Length
sequence_length
train_length: 2048
eval_length: null
Regularization
weight decay
parameters: null
Novel Contributions
- Loop-aware GPTQ for shared-weight crawler architectures
- Five stacked signals on a 9-flat platform
- Bidirectional cross-stream recurrence / cross-injection research direction
- Width-vs-recursion analysis showing crawler gains come mainly from width reallocation
- Position-agnostic cross-stream routing concept for the Helix follow-up