PR #1535 (open)

Recursive Transformer - Non-Record Submission — 1.07424983 val_bpb (4h depth-recurrent hybrid transformer run)

by newjordan
val_bpb: 1.0742
Architecture: Hybrid
Optimizer:
Artifact Size: 14,184,849 bytes

Training Techniques

Architecture
  • depth recurrence: hybrid flat-plus-crawler transformer with a shared depth-recurrent bottleneck and recursive crawler blocks. parameters: {"flat_layers":7,"crawler_layers":3,"loops":3}
  • TAP: shared tap projection used in crawler/flat interaction experiments. parameters: {"dim":32}
  • ANCHOR: anchor state used to stabilize crawler interactions. parameters: {"dim":32}
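The wiring implied by the architecture parameters can be sketched as follows. This is a minimal toy: residual tanh maps stand in for real transformer blocks, and the width and block internals are illustrative; only the 7-flat / 3-crawler / 3-loop structure comes from the parameters above.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # toy model width

def make_block():
    # Stand-in for a transformer block: a residual nonlinear map.
    W = rng.standard_normal((D, D)) * 0.02
    return lambda x, W=W: x + np.tanh(x @ W)

flat_blocks = [make_block() for _ in range(7)]     # flat_layers = 7
crawler_blocks = [make_block() for _ in range(3)]  # crawler_layers = 3 (weights shared across loops)
LOOPS = 3                                          # loops = 3

def forward(x):
    for blk in flat_blocks:          # each flat block runs once
        x = blk(x)
    for _ in range(LOOPS):           # depth recurrence: reuse the same crawler weights
        for blk in crawler_blocks:
            x = blk(x)
    return x

x = rng.standard_normal((4, D))
y = forward(x)
# Effective depth is 7 + 3*3 = 16 block applications from only 10 blocks of weights.
```

The point of the recurrence is the last comment: extra effective depth at no extra parameter cost, which is what lets the artifact stay under the size cap.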
Quantization
  • GPTQ: bits: 6, scope: all
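For orientation, a sketch of the 6-bit grid the entry refers to. This is plain symmetric round-to-nearest per output channel, not GPTQ proper (which additionally applies Hessian-based error compensation); the "loop-aware" variant named in the contributions would presumably calibrate shared crawler weights on activations from all loop iterations, which is not shown here.

```python
import numpy as np

def quantize_6bit(W):
    # Symmetric per-output-channel round-to-nearest to a signed 6-bit grid.
    qmax = 2 ** (6 - 1) - 1                      # 31 for signed 6-bit
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                      # guard all-zero rows
    q = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)).astype(np.float32)
q, s = quantize_6bit(W)
W_hat = dequantize(q, s)
# Round-to-nearest bounds the per-weight error by half a quantization step.
```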
Regularization
  • magnitude pruning, parameters: null
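Magnitude pruning in its unstructured form is a one-liner over the weight tensor. The submission records parameters: null, so the sparsity ratio actually used is unknown; it is an argument in this sketch.

```python
import numpy as np

def magnitude_prune(W, sparsity):
    # Zero out the `sparsity` fraction of smallest-|w| entries.
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) <= thresh, 0.0, W)

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
P = magnitude_prune(W, 0.5)  # illustrative 50% sparsity
```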
Compression
  • pyminify, level: null
Evaluation
  • sliding window eval, parameters: null
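Sliding window evaluation scores a long sequence with a fixed-context model by advancing the window in strides and scoring only the newest positions, so most tokens keep near-maximal left context. A sketch, assuming a `token_nll(ctx)` interface returning per-token negative log-likelihoods (the actual eval harness and window/stride values are not given in the submission):

```python
def sliding_window_nll(token_nll, tokens, window=1024, stride=512):
    # Average per-token NLL over a long sequence using a sliding window.
    # Only the final `stride` positions of each window are scored.
    total, count, pos = 0.0, 0, 0
    while pos < len(tokens):
        start = max(0, pos + stride - window)
        ctx = tokens[start:pos + stride]
        nlls = token_nll(ctx)                 # assumed model API
        new = min(stride, len(tokens) - pos)  # tokens scored this step
        total += sum(nlls[-new:])
        count += new
        pos += stride
    return total / count

# Toy check with a dummy model that assigns 1 nat to every token:
avg = sliding_window_nll(lambda ctx: [1.0] * len(ctx),
                         list(range(10)), window=4, stride=2)
```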
Weight Averaging
  • EMA, parameters: {"start_step":null}
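The EMA update itself is a single blend per step. Since start_step is null in the submission, when averaging begins is not recorded; the decay value below is an illustrative default, not the one used.

```python
def ema_update(ema, params, decay=0.999):
    # One exponential-moving-average step over a list of parameter values.
    return [decay * e + (1 - decay) * p for e, p in zip(ema, params)]

# Toy usage with scalar "parameters":
ema = [0.0]
ema = ema_update(ema, [1.0])  # moves 0.1% of the way toward the new value
```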
LR Schedule
  • warmdown, parameters: null
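"Warmdown" typically means holding the learning rate constant and then decaying it linearly to zero over the final stretch of training. The submission gives parameters: null, so the base LR and warmdown length below are illustrative:

```python
def lr_at(step, total_steps, base_lr=1e-3, warmdown_steps=300):
    # Constant LR, then linear warmdown to zero over the last
    # `warmdown_steps` steps (all values here are illustrative).
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return base_lr
    return base_lr * (total_steps - step) / warmdown_steps
```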
Test-Time Training
  • score-first TTT, parameters: null

Novel Contributions

  • Hybrid flat-plus-crawler transformer architecture
  • Shared depth-recurrent crawler bottleneck
  • Loop-aware GPTQ for recursive/shared-weight models
  • Three-loop crawler configuration as the stable legal runner
  • Multi-crawler scaling law showing 3 crawler layers as the practical ceiling under the size cap
  • Long-run 4-hour confirmation of the final 7F+3C architecture