PR #495

Status: open

Non-record: Add submission track_non_record_16mb/2026-03-23_DepthRecurrent_TTT

by SergiuDeveloper
val_bpb: 1.2092
Architecture: Transformer
Optimizer:
Artifact Size: 15,544,590 bytes

Training Techniques

Architecture
  • depth recurrence: runs each encoder and decoder block multiple times before propagating activations forward, increasing representational depth without adding parameters
    parameters: {"ENCODER_LOOPS":2,"DECODER_LOOPS":2}
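The looping idea can be sketched as follows; the residual block and all names here are illustrative stand-ins, not the submission's actual code:

```python
import numpy as np

def block(x, W):
    # Stand-in for a transformer block: a residual nonlinear map.
    return x + np.tanh(x @ W)

def depth_recurrent_stack(x, weights, loops=2):
    # Apply each block `loops` times before moving to the next one,
    # multiplying effective depth without adding any parameters.
    for W in weights:
        for _ in range(loops):
            x = block(x, W)
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
weights = [rng.normal(scale=0.1, size=(16, 16)) for _ in range(3)]
y = depth_recurrent_stack(x, weights, loops=2)  # 3 blocks, 6 effective layers
```

With ENCODER_LOOPS=2 and DECODER_LOOPS=2 as in the parameters above, each stack is traversed twice per forward pass at no extra storage cost.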
  • tied embeddings: the input and output embedding matrices share one set of weights
    parameters: {"TIE_EMBEDDINGS":1}
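Weight tying amounts to one shared matrix used for both the input lookup and the output projection; a minimal sketch, with shapes assumed for illustration:

```python
import numpy as np

vocab_size, d_model = 100, 32
rng = np.random.default_rng(1)
E = rng.normal(scale=0.02, size=(vocab_size, d_model))  # the single shared matrix

def embed(token_ids):
    return E[token_ids]        # input side: row lookup

def logits(hidden):
    return hidden @ E.T        # output side: same weights, transposed

h = embed(np.array([3, 7]))    # (2, d_model)
z = logits(h)                  # (2, vocab_size)
```

Tying removes one of the two largest matrices in a small language model, which matters under a 16MB artifact cap.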
Test-Time Training
  • LoRA TTT: chunk-causal low-rank adaptation applied during evaluation
    parameters: {"rank":8,"chunk_size":256}
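A chunk-causal adaptation loop can be sketched as below: each chunk is scored with the current adapter before a gradient step on that chunk updates the adapter for later chunks. The least-squares objective, initialization, and learning rate are assumptions for illustration, not the submission's actual setup:

```python
import numpy as np

def lora_ttt(xs, ys, W, rank=8, chunk_size=256, lr=1e-3):
    # W is the frozen base weight; the adapter is the low-rank product B @ A.
    d_out, d_in = W.shape
    rng = np.random.default_rng(0)
    A = rng.normal(scale=0.01, size=(rank, d_in))  # small random init
    B = np.zeros((d_out, rank))                    # zero init: start at base model
    losses = []
    for i in range(0, len(xs), chunk_size):
        x, y = xs[i:i + chunk_size], ys[i:i + chunk_size]
        pred = x @ (W + B @ A).T       # score the chunk BEFORE adapting (causal)
        err = pred - y
        losses.append(float((err ** 2).mean()))
        g = err.T @ x / len(x)         # gradient w.r.t. the effective weight
        dB, dA = g @ A.T, B.T @ g      # chain rule through B @ A
        B -= lr * dB
        A -= lr * dA
    return losses

rng = np.random.default_rng(2)
W_true = rng.normal(size=(8, 8))
xs = rng.normal(size=(1024, 8))
ys = xs @ W_true.T
losses = lora_ttt(xs, ys, W=np.zeros((8, 8)), rank=2, chunk_size=256)
```

Because each chunk is evaluated before the adapter sees it, adaptation on chunk i only ever influences predictions on later chunks, keeping evaluation causal.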
Sequence Length
  train_length: 1024
  eval_length: null

Novel Contributions

  • Use of depth recurrence to increase model representational depth without increasing parameter count
  • Chunk-causal test-time LoRA adaptation allowing model specialization to each document's distribution during evaluation
  • Operating under a strict 16MB artifact size cap with a 600s wallclock training time limit
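The size constraint translates into a concrete parameter budget; a quick check, assuming a binary 16MB cap (16 × 1024² bytes) and uncompressed weights:

```python
CAP = 16 * 1024 * 1024   # assumed binary 16 MB cap = 16,777,216 bytes
reported = 15_544_590    # this submission's artifact size in bytes
headroom = CAP - reported  # ~1.2 MB to spare
params_fp32 = CAP // 4   # about 4.19M parameters at 4 bytes each
params_fp16 = CAP // 2   # about 8.39M parameters at 2 bytes each
```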