PR #2120

open

non-record - mockingbird - sota copy/10kvocab

val_bpb

1.0624

Architecture

Transformer

Optimizer

Not specified

Artifact Size

15.82 MB

Training Techniques

Quantization

mixed int6/int7

bits: 6

scope: weights and embeddings
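
For illustration only: the card lists mixed int6/int7 quantization of weights and embeddings but does not say how values are scaled or grouped. The sketch below assumes plain symmetric quantization with one scale per vector; the function names `quantize_symmetric` and `dequantize` are hypothetical and are not taken from the PR.

```python
def quantize_symmetric(values, bits=6):
    """Quantize floats to signed integer codes using `bits` bits.

    Assumption: symmetric range with a single scale per vector.
    The PR does not state its actual scheme.
    """
    qmax = 2 ** (bits - 1) - 1  # 31 for int6, 63 for int7
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]
```

Under this assumption, moving a tensor from int6 to int7 roughly halves the worst-case rounding error at the cost of one extra bit per value, which is the usual motivation for a mixed-width scheme.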

Architecture

weight tying

Not stated in the PR; weight tying is not mentioned.

parameters: null

depth recurrence

Looping architecture with recurrent block execution.

parameters: {"loop_start":3,"loop_end":5,"enable_looping_at":0.45}
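
One possible reading of these parameters, sketched under assumptions: `loop_start` and `loop_end` are taken as inclusive layer indices of the repeated block, and `enable_looping_at` as the fraction of training progress after which looping switches on. The layer count (`num_layers`) and repeat count (`num_loops`) are hypothetical; the card does not give them.

```python
def layer_schedule(num_layers, loop_start, loop_end, num_loops,
                   progress, enable_looping_at):
    """Return the order in which layer indices run in one forward pass.

    Assumptions (not confirmed by the PR): loop bounds are inclusive,
    and looping is active once progress >= enable_looping_at.
    """
    looping = progress >= enable_looping_at
    repeats = num_loops if looping else 1
    before = list(range(0, loop_start))
    block = list(range(loop_start, loop_end + 1))
    after = list(range(loop_end + 1, num_layers))
    # The looped block reuses the same layers, so depth grows
    # without adding parameters.
    return before + block * repeats + after
```

With `num_layers=8` and `num_loops=2`, the schedule before the 0.45 mark is the plain stack of layers 0 to 7; after it, layers 3 to 5 run twice in a row before layers 6 and 7.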

Test-Time Training

LoRA TTT

parameters: {"prefix_docs":2500,"phases":3,"chunk":48}
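
As a rough sketch of the general technique (not the PR's implementation): LoRA adds a trainable low-rank update `B @ A` to a frozen weight `W`, and test-time training updates only `A` and `B` on evaluation text. The helpers below are hypothetical, and reading `chunk` as a token-chunk length of 48 is an assumption; the card does not define `prefix_docs`, `phases`, or `chunk`.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute y = W x + (alpha / r) * B (A x).

    W: d_out x d_in frozen weight, A: r x d_in, B: d_out x r.
    Only A and B would be updated during test-time training.
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


def split_chunks(tokens, chunk=48):
    """Split a token sequence into consecutive chunks (assumed meaning of `chunk`)."""
    return [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]
```

With `B` initialized to zeros, the adapted layer reproduces the frozen layer exactly, so adaptation starts from the base model's predictions.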

Compression

pergroup

level: null

Sequence Length

sequence_length

train_length: null

eval_length: null