PR #1535
Recursive Transformer - Non-Record Submission — 1.07424983 val_bpb (4h depth-recurrent hybrid transformer run)
by newjordan
val_bpb: 1.0742
Architecture: Hybrid
Optimizer: —
Artifact Size: 14,184,849 bytes
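val_bpb is bits per byte: the summed cross-entropy of the model over the validation set, converted from nats to bits and normalized by the raw byte count rather than the token count. Assuming the standard definition, the conversion is a one-liner:

```python
import math

def val_bpb(total_nll_nats, total_bytes):
    # Bits per byte: summed cross-entropy (in nats) over the eval set,
    # converted to bits (divide by ln 2) and normalized by raw bytes.
    return total_nll_nats / (math.log(2) * total_bytes)

# e.g. 744.5 nats of total loss over 1000 bytes of validation text:
bpb = val_bpb(744.5, 1000)
```

Normalizing by bytes (not tokens) makes scores comparable across different tokenizers.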
Training Techniques
Architecture
- depth recurrence: hybrid flat-plus-crawler transformer with a shared depth-recurrent bottleneck and recursive crawler blocks. (parameters: {"flat_layers":7,"crawler_layers":3,"loops":3})
- TAP: shared tap projection used in crawler/flat interaction experiments. (parameters: {"dim":32})
- ANCHOR: anchor state used to stabilize crawler interactions. (parameters: {"dim":32})
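The flat-plus-crawler layout can be sketched as follows. This is a toy illustration of the parameter-sharing pattern only: `block` stands in for a real transformer layer (attention/MLP elided), the dimension is arbitrary, and the TAP/ANCHOR projections are omitted. The key point is that the 3 crawler layers are reused across 3 loops, so effective depth is 7 + 3×3 = 16 layers from only 10 weight sets:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32

def block(x, W):
    # Toy stand-in for one transformer layer: residual + nonlinearity.
    return x + np.tanh(x @ W)

# 7 "flat" layers, each with its own weights...
flat = [rng.normal(0.0, 0.02, (DIM, DIM)) for _ in range(7)]
# ...and 3 "crawler" layers whose weights are REUSED every loop:
# the depth-recurrent bottleneck.
crawler = [rng.normal(0.0, 0.02, (DIM, DIM)) for _ in range(3)]

def forward(x, loops=3):
    for W in flat:
        x = block(x, W)
    for _ in range(loops):          # same crawler weights each pass
        for W in crawler:
            x = block(x, W)
    return x

y = forward(rng.normal(size=(4, DIM)))
```

Weight sharing is what keeps the artifact small: the looped crawler adds depth without adding parameters.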
Quantization
- GPTQ (bits: 6, scope: all)
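"Loop-aware" here plausibly means the shared crawler weights are quantized once and the same 6-bit tensor is reused on every loop, so quantization error compounds across passes. GPTQ itself compensates per-column error using second-order (Hessian) information; the sketch below shows only the 6-bit grid via plain round-to-nearest per-channel quantization, a simplified stand-in, not GPTQ proper:

```python
import numpy as np

def quantize_rtn(W, bits=6):
    # Simplified round-to-nearest per-output-channel quantization.
    # (GPTQ additionally compensates rounding error with Hessian info;
    # this stand-in only illustrates the symmetric 6-bit grid.)
    qmax = 2 ** (bits - 1) - 1                       # 6 bits -> -32..31
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(W / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

W = np.random.default_rng(1).normal(size=(8, 16))
q, scale = quantize_rtn(W)
err = np.abs(W - q * scale).max()   # bounded by half a quantization step
```

At 6 bits with scope "all", every weight tensor in the 14 MB artifact is stored on this grid.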
Regularization
- magnitude pruning (parameters: null)
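Magnitude pruning zeroes the smallest-magnitude weights up to a target sparsity; the submission lists no parameters, so the 50% figure below is illustrative:

```python
import numpy as np

def magnitude_prune(W, sparsity=0.5):
    # Zero the smallest-|w| entries so that `sparsity` of W is zero.
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) <= thresh, 0.0, W)

W = np.random.default_rng(2).normal(size=(16, 16))
P = magnitude_prune(W, 0.5)
```

Pruning also helps the size budget: runs of zeros compress well in the serialized artifact.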
Compression
- pyminify (level: null)
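pyminify shrinks the submitted Python source itself. As a rough illustration of one minifier pass, here is a stdlib-only stand-in that strips comments by filtering the token stream and regenerating the source (pyminify goes much further: renaming locals, removing docstrings, squeezing whitespace):

```python
import io
import tokenize

def strip_comments(source: str) -> str:
    # Drop COMMENT tokens, then rebuild the source in tokenize's
    # compatibility mode (2-tuples). Spacing gets rough but stays valid.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    kept = [(t.type, t.string) for t in tokens if t.type != tokenize.COMMENT]
    return tokenize.untokenize(kept)

src = "x = 1  # one\ny = x + 1  # two\n"
out = strip_comments(src)
```

Every byte of source counts toward the 14,184,849-byte artifact, hence the minification step.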
Evaluation
- sliding window eval (parameters: null)
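Sliding-window evaluation scores a sequence longer than the model's context by re-running overlapping windows and keeping only the trailing stride of each window, so every scored token (after the first window) has substantial left context. The window/stride values below are illustrative, not from the submission:

```python
def sliding_windows(n, window, stride):
    # Yield (ctx_start, score_start, end): positions [score_start, end)
    # are scored using context from ctx_start onward, so after the first
    # window every scored token sees >= window - stride prior tokens.
    end = min(window, n)
    yield 0, 0, end
    while end < n:
        ctx_start = end - (window - stride)
        new_end = min(ctx_start + window, n)
        yield ctx_start, end, new_end
        end = new_end

spans = list(sliding_windows(20, window=8, stride=4))
```

The scored spans partition the sequence, so each token's NLL is counted exactly once in val_bpb.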
Weight Averaging
- EMA (parameters: {"start_step":null})
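EMA keeps an exponential moving average of the weights for evaluation. The submission's start_step is null; the gating below shows how a start step would be used if set, with a placeholder default of 0:

```python
import numpy as np

def ema_update(ema, params, step, decay=0.999, start_step=0):
    # Before start_step, just mirror the raw weights; afterwards,
    # blend: ema <- decay * ema + (1 - decay) * params.
    if step < start_step:
        return [p.copy() for p in params]
    return [decay * e + (1 - decay) * p for e, p in zip(ema, params)]

w = [np.zeros(3)]
for step in range(100):
    w = ema_update(w, [np.ones(3)], step, decay=0.9)
```

Evaluating the averaged weights rather than the last iterate typically smooths out late-training noise.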
LR Schedule
- warmdown (parameters: null)
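"Warmdown" commonly denotes holding the learning rate constant and then ramping it linearly to zero over the final portion of training. The submission lists no parameters, so the 40% fraction here is an assumption:

```python
def warmdown_lr(step, total_steps, base_lr=1e-3, warmdown_frac=0.4):
    # Hold base_lr, then decay linearly to zero over the final
    # warmdown_frac of training (fraction and base_lr are illustrative).
    hold_until = int(total_steps * (1 - warmdown_frac))
    if step < hold_until:
        return base_lr
    return base_lr * (total_steps - step) / (total_steps - hold_until)
```

Unlike cosine decay, the schedule is flat for most of the run and only the tail is shaped.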
Test-Time Training
- score-first TTT (parameters: null)
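The submission does not define "score-first" TTT. One plausible reading: each evaluation chunk is scored with the current weights before the model adapts on it, so no token's score benefits from the model having already trained on that token, keeping the eval honest. A toy sketch under that assumption (`score` and `adapt` are hypothetical stand-ins):

```python
def score_first_ttt(score, adapt, state, chunks):
    # Hypothetical protocol: 1) score each chunk with the current state,
    # 2) only then take a test-time training step on that same chunk.
    total_nll, total_tokens = 0.0, 0
    for chunk in chunks:
        nll, n = score(chunk, state)   # score first...
        total_nll += nll
        total_tokens += n
        state = adapt(chunk, state)    # ...then adapt
    return total_nll / total_tokens, state

# Toy stand-ins: "state" is a scalar bias that adaptation pulls
# toward the chunk mean, reducing later chunks' loss.
chunks = [[1.0] * 4] * 3
score = lambda c, s: (sum(abs(x - s) for x in c), len(c))
adapt = lambda c, s: s + 0.5 * (sum(c) / len(c) - s)
avg_nll, final_state = score_first_ttt(score, adapt, 0.0, chunks)
```

In this reading, later chunks still benefit from adaptation on earlier ones, which is where the bpb gain would come from.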
Novel Contributions
- Hybrid flat-plus-crawler transformer architecture
- Shared depth-recurrent crawler bottleneck
- Loop-aware GPTQ for recursive/shared-weight models
- Three-loop crawler configuration as the stable legal runner
- Multi-crawler scaling law showing 3 crawler layers as the practical ceiling under the size cap
- Long-run 4-hour confirmation of the final 7F+3C architecture