PR #495 (open)
Non-record: Add submission track_non_record_16mb/2026-03-23_DepthRecurrent_TTT
by SergiuDeveloper
val_bpb: 1.2092
Architecture: Transformer
Optimizer: —
Artifact Size: 15,544,590 bytes
Training Techniques
Architecture
depth recurrence
Runs each encoder and decoder block multiple times before propagating activations forward, increasing effective representational depth without adding parameters.
parameters: {"ENCODER_LOOPS":2,"DECODER_LOOPS":2}
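The loop structure can be sketched as follows. This is a minimal numpy illustration of the general technique, not the submission's code: `block` is a hypothetical stand-in for a full transformer block, and `loops=2` mirrors the `ENCODER_LOOPS`/`DECODER_LOOPS` setting.

```python
import numpy as np

def block(x, w):
    # Toy residual "transformer block" (hypothetical stand-in for
    # attention + FFN); the recurrence pattern is what matters here.
    return x + np.tanh(x @ w)

def depth_recurrent_forward(x, weights, loops=2):
    # Run each block `loops` times before moving on to the next one,
    # reusing the same weights: more effective depth, same parameter count.
    for w in weights:
        for _ in range(loops):
            x = block(x, w)
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                              # (tokens, d_model)
weights = [rng.normal(scale=0.1, size=(16, 16)) for _ in range(3)]
y = depth_recurrent_forward(x, weights, loops=2)          # LOOPS=2 analogue
```

With `loops=2` the 3-block stack behaves like a 6-layer forward pass while storing only 3 blocks' worth of weights, which is what makes the technique attractive under a 16 MB artifact cap.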
tied embeddings
The input embedding matrix and the output projection head share a single weight matrix, halving the embedding parameter count.
parameters: {"TIE_EMBEDDINGS":1}
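Weight tying reduces to using one matrix `E` for both the token lookup and the output logits. A minimal sketch of the idea (the matrix name and sizes are illustrative, not taken from the submission):

```python
import numpy as np

vocab, d_model = 100, 16
rng = np.random.default_rng(0)
E = rng.normal(scale=0.02, size=(vocab, d_model))   # single shared matrix

def embed(token_ids):
    # Input side: token id -> row of E.
    return E[token_ids]

def logits(hidden):
    # Output side: project hidden states against E^T, so the same
    # parameters score every vocabulary item. No separate output head.
    return hidden @ E.T

h = embed(np.array([1, 2, 3]))   # (3, d_model)
out = logits(h)                  # (3, vocab)
```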
Test-Time Training
LoRA TTT
parameters: {"rank":8,"chunk_size":256}
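A chunk-causal test-time LoRA loop can be sketched as below. This is a toy numpy version under stated assumptions: the base weight `W` stays frozen, only the low-rank factors `A`, `B` are updated, each chunk is scored before the adapter sees it (so predictions depend only on earlier chunks), and the self-supervised objective here is a simple reconstruction loss standing in for whatever loss the submission actually uses.

```python
import numpy as np

def lora_ttt(tokens, W, rank=8, chunk_size=256, lr=0.1):
    # W: frozen base weight (d, d). Per document, adapt only the
    # low-rank factors A (d, rank) and B (rank, d).
    d = W.shape[0]
    rng = np.random.default_rng(0)
    A = rng.normal(scale=0.01, size=(d, rank))
    B = np.zeros((rank, d))          # B = 0 -> adapter starts as a no-op
    outputs = []
    for start in range(0, len(tokens), chunk_size):
        x = tokens[start:start + chunk_size]
        outputs.append(x @ (W + A @ B))      # score chunk with current adapter
        # Toy self-supervised update (gradient of ||x(W + AB) - x||^2 w.r.t. B):
        # the adapter specializes to this document as chunks stream past.
        err = x @ (W + A @ B) - x
        B -= lr * A.T @ x.T @ err / len(x)
    return np.concatenate(outputs)
```

Because the update for chunk *k* runs only after chunk *k* has been scored, no token's prediction leaks information from later text, which is what "chunk-causal" requires; `rank=8` and `chunk_size=256` match the submission's parameters.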
Sequence Length
sequence_length
train_length: 1024
eval_length: null
Novel Contributions
- Use of depth recurrence to increase model representational depth without increasing parameter count
- Chunk-causal test-time LoRA adaptation allowing model specialization to each document's distribution during evaluation
- Operating under a strict 16MB artifact size cap with a 600s wallclock training time limit