PR #495

Status: open

Non-record: Add submission track_non_record_16mb/2026-03-23_DepthRecurrent_TTT

by SergiuDeveloper
val_bpb: 1.2092
Architecture: Transformer
Optimizer:
Artifact Size: 15,544,590 bytes

Training Techniques

Architecture
  • depth recurrence: runs each encoder and decoder block multiple times before propagating activations forward, increasing representational depth without adding parameters
    parameters: {"ENCODER_LOOPS":2,"DECODER_LOOPS":2}
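The looping idea can be sketched as follows; the residual block and all names here are illustrative stand-ins, not the submission's actual code:

```python
import numpy as np

def block(x, W):
    # Stand-in for a transformer block: a residual nonlinear map.
    return x + np.tanh(x @ W)

def depth_recurrent_stack(x, weights, loops=2):
    # Apply each block `loops` times before moving to the next one,
    # multiplying effective depth without adding any parameters.
    for W in weights:
        for _ in range(loops):
            x = block(x, W)
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
weights = [rng.normal(scale=0.1, size=(16, 16)) for _ in range(3)]
y = depth_recurrent_stack(x, weights, loops=2)  # 3 blocks, 6 effective layers
```

With ENCODER_LOOPS=2 and DECODER_LOOPS=2 as in the parameters above, each stack is traversed twice per forward pass at no extra storage cost.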
  • tied embeddings: the input and output embedding matrices share one set of weights
    parameters: {"TIE_EMBEDDINGS":1}
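Weight tying amounts to one shared matrix used for both the input lookup and the output projection; a minimal sketch, with shapes assumed for illustration:

```python
import numpy as np

vocab_size, d_model = 100, 32
rng = np.random.default_rng(1)
E = rng.normal(scale=0.02, size=(vocab_size, d_model))  # the single shared matrix

def embed(token_ids):
    return E[token_ids]        # input side: row lookup

def logits(hidden):
    return hidden @ E.T        # output side: same weights, transposed

h = embed(np.array([3, 7]))    # (2, d_model)
z = logits(h)                  # (2, vocab_size)
```

Tying removes one of the two largest matrices in a small language model, which matters under a 16MB artifact cap.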
Test-Time Training
  • LoRA TTT: chunk-causal low-rank adaptation applied during evaluation
    parameters: {"rank":8,"chunk_size":256}
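A chunk-causal adaptation loop can be sketched as below: each chunk is scored with the current adapter before a gradient step on that chunk updates the adapter for later chunks. The least-squares objective, initialization, and learning rate are assumptions for illustration, not the submission's actual setup:

```python
import numpy as np

def lora_ttt(xs, ys, W, rank=8, chunk_size=256, lr=1e-3):
    # W is the frozen base weight; the adapter is the low-rank product B @ A.
    d_out, d_in = W.shape
    rng = np.random.default_rng(0)
    A = rng.normal(scale=0.01, size=(rank, d_in))  # small random init
    B = np.zeros((d_out, rank))                    # zero init: start at base model
    losses = []
    for i in range(0, len(xs), chunk_size):
        x, y = xs[i:i + chunk_size], ys[i:i + chunk_size]
        pred = x @ (W + B @ A).T       # score the chunk BEFORE adapting (causal)
        err = pred - y
        losses.append(float((err ** 2).mean()))
        g = err.T @ x / len(x)         # gradient w.r.t. the effective weight
        dB, dA = g @ A.T, B.T @ g      # chain rule through B @ A
        B -= lr * dB
        A -= lr * dA
    return losses

rng = np.random.default_rng(2)
W_true = rng.normal(size=(8, 8))
xs = rng.normal(size=(1024, 8))
ys = xs @ W_true.T
losses = lora_ttt(xs, ys, W=np.zeros((8, 8)), rank=2, chunk_size=256)
```

Because each chunk is evaluated before the adapter sees it, adaptation on chunk i only ever influences predictions on later chunks, keeping evaluation causal.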
Sequence Length
  train_length: 1024
  eval_length: null

Novel Contributions

  • Use of depth recurrence to increase model representational depth without increasing parameter count
  • Chunk-causal test-time LoRA adaptation allowing model specialization to each document's distribution during evaluation
  • Operating under a strict 16MB artifact size cap with a 600s wallclock training time limit
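The size constraint translates into a concrete parameter budget; a quick check, assuming a binary 16MB cap (16 × 1024² bytes) and uncompressed weights:

```python
CAP = 16 * 1024 * 1024   # assumed binary 16 MB cap = 16,777,216 bytes
reported = 15_544_590    # this submission's artifact size in bytes
headroom = CAP - reported  # ~1.2 MB to spare
params_fp32 = CAP // 4   # about 4.19M parameters at 4 bytes each
params_fp16 = CAP // 2   # about 8.39M parameters at 2 bytes each
```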