val_bpb
1.3569
Architecture
Transformer
Optimizer
Muon
Artifact Size
15,658,145 bytes
Training Techniques
Architecture
depth recurrence
Mirrored hourglass recurrent circuit with reused middle blocks and mirrored entry/exit tails (block route: 0 1 2 | 3 4 5 6 7 | 3 4 5 6 7 | 2 1 0).
parameters: {"unique_blocks":8,"effective_depth":16,"route_repeats":2}
weight tying
Factored token embeddings with a tied input/output interface, saving artifact bytes while preserving lexical capacity.
parameters: {"model_dim":704,"factored_embed_dim":832}
attention modification
Only the first recurrent-core block keeps attention enabled; the remaining repeated middle blocks are MLP-only.
parameters: {"core_attention_block":3,"mlp_only_blocks":[4,5,6,7]}
other
LexLoRE (VocabMoE): token-conditioned low-rank residual lexical adapters placed at the input embedding and at the first loop entry.
parameters: {"layers":["input","loop_first"],"experts":16,"rank":2}
Quantization
int8 QAT
bits: 8
scope: all
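A minimal sketch of the int8 fake-quantization applied during training, assuming per-tensor symmetric scales and a straight-through estimator:

```python
import torch

def fake_quant_int8(w: torch.Tensor) -> torch.Tensor:
    """Quantize-dequantize to int8 in the forward pass (scope: all weights)."""
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127) * scale
    # Straight-through estimator: forward sees quantized weights, backward
    # passes gradients through as if quantization were the identity.
    return w + (q - w).detach()
```

Per the contributions list below, this quantized forward would run from step 0 and cover the embeddings as well, so the exported int8 artifact matches what was trained.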
Compression
lzma
level: null
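A minimal sketch of the export step, assuming the serialized int8 state dict is passed in as bytes; `level: null` is read here as the library's default preset:

```python
import lzma

def export_artifact(raw_bytes: bytes, path: str) -> int:
    """Compress the serialized weights and return the artifact size in bytes."""
    blob = lzma.compress(raw_bytes)  # default preset, per "level: null"
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)  # must stay under the 16,000,000-byte cap
```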
Optimizer
Muon
weight_decay: 0
momentum: null
other_params: null
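For reference, Muon's core update is momentum SGD whose 2D update matrix is orthogonalized by a quintic Newton-Schulz iteration. This sketch follows the public reference implementation; the momentum value is an assumption since none is recorded above, and weight decay is omitted per weight_decay=0:

```python
import torch

def newton_schulz5(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Quintic Newton-Schulz iteration pushing singular values toward 1."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)
    if G.size(0) > G.size(1):
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X

@torch.no_grad()
def muon_step(param: torch.Tensor, grad: torch.Tensor, buf: torch.Tensor,
              lr: float, momentum: float = 0.95) -> None:
    buf.mul_(momentum).add_(grad)   # momentum accumulation
    update = newton_schulz5(buf)    # orthogonalize the 2D update direction
    param.add_(update, alpha=-lr)   # weight_decay=0, so no decay term
```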
Sequence Length
sequence_length
train_length: 1024
eval_length: null
LR Schedule
warmdown
parameters: {"warmdown_steps":2200}
Regularization
weight decay
parameters: {"weight_decay":0}
Novel Contributions
- MirrorLoop HRC hourglass recurrent circuit with mirrored input/output tails
- LexLoRE token-conditioned low-rank lexical adapters implemented via VocabMoE flags
- Train-time quantized forward from step 0, including embeddings
- Factored tied embeddings to fit within the 16MB artifact budget
- Single attention-capable core-entry block with MLP-only recurrent blocks
- LQER low-rank export repair for quantization error (see the sketch after this list)
- Explicit artifact-size-aware design targeting the decimal 16,000,000 byte cap
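A hedged sketch of the LQER-style repair named above: fit a truncated SVD to each weight matrix's quantization error and ship the low-rank factors alongside the int8 tensor; the rank here is illustrative:

```python
import torch

def lqer_repair(w: torch.Tensor, q: torch.Tensor, rank: int = 2):
    """w: original fp weights; q: dequantized int8 weights (same shape)."""
    err = w - q                      # quantization error to be repaired
    U, S, Vh = torch.linalg.svd(err, full_matrices=False)
    A = U[:, :rank] * S[:rank]       # (out, rank)
    B = Vh[:rank]                    # (rank, in)
    return A, B                      # deployed weight: q + A @ B
```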