PR #1951

open

Sophonics LoRA Int5 DeepMLP non-record submission

by DesperateTomatoCultivatorView on GitHub
val_bpb
1.3522
Architecture
Transformer
Optimizer
Artifact Size
10,842,720 bytes

Training Techniques

Architecture
weight tying
Tied embeddings in the base GPT model.
parameters: null
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
Quantization
int5
bits: 5
scope: base model during repair
int8
bits: 8
scope: reference precision during repair and final artifact
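The int5/int8 scheme above can be sketched with a generic symmetric per-tensor quantizer. The exact scheme used by the submission (per-channel vs. per-tensor scales, rounding mode) is not specified in this card, so everything below is an illustrative assumption:

```python
import numpy as np

def quantize(w, bits):
    """Symmetric per-tensor quantization of a float array to `bits` bits.
    A generic sketch, not the submission's exact scheme."""
    qmax = 2 ** (bits - 1) - 1  # 15 for int5, 127 for int8
    amax = np.abs(w).max()
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale
```

With int5, codes lie in [-16, 15]; with int8, in [-128, 127], which is why int8 serves as the higher-precision reference during repair.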
Other
other
LoRA repair modules trained on the two deepest MLP blocks (blocks 7 and 8, per the target regex) and then merged back into the ordinary checkpoint weights.
parameters: {"rank":16,"alpha":16,"target_regex":"^blocks\\.[7-8]\\.mlp\\.(fc|proj)$","steps":600}
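The merge step and the module selection can be sketched as follows, using the common LoRA convention (A: rank x in, B: out x rank) and the `rank`/`alpha`/`target_regex` values from the parameters above; the function name and shapes are illustrative assumptions, not the submission's code:

```python
import re
import numpy as np

# Module selection: the card's target_regex picks fc/proj in blocks 7-8.
TARGET = re.compile(r"^blocks\.[7-8]\.mlp\.(fc|proj)$")

def merge_lora(w, lora_a, lora_b, rank=16, alpha=16):
    """Fold a trained LoRA pair back into the dense weight so the final
    checkpoint carries no extra modules (a sketch of the merge-back step)."""
    return w + (alpha / rank) * (lora_b @ lora_a)
```

With alpha equal to rank (16/16), the LoRA scaling factor is 1, so the update is folded in unscaled.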
Compression
zlib
level: null
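Packing the final artifact is a direct use of zlib on the serialized int8 checkpoint. The card leaves the compression level unspecified (`level: null`), so the level below is an assumption:

```python
import zlib

def pack_artifact(int8_bytes: bytes, level: int = 9) -> bytes:
    """zlib-compress the serialized int8 checkpoint bytes.
    level=9 is an assumed default; the card does not state the level used."""
    return zlib.compress(int8_bytes, level)
```

The compressed size of this blob is what must fit under the 16MB artifact cap (10,842,720 bytes here).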

Novel Contributions

  • Unlimited-compute non-record submission under the 16MB artifact cap
  • Train a compact tied-embedding GPT base model
  • Use an int5 repair substrate with rank-16 LoRA modules on the deepest MLP blocks
  • Merge repair weights back into standard checkpoint weights before final quantization
  • Quantize the merged checkpoint to int8 and zlib-compress it
  • Demonstrate that localized learned repair can recover most of the int5-to-int8 performance gap
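The pipeline in the bullets above can be sketched end to end on a toy weight matrix. Everything here is a stand-in: the dense layer replaces the GPT base, and a closed-form SVD of the quantization residual replaces the 600-step LoRA training run, keeping only the rank-16/alpha-16 shape and the int5 → repair → merge → int8 → zlib ordering from the card:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits):
    # Symmetric per-tensor quantization (illustrative assumption).
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax), scale

# 1. Toy stand-in for the trained tied-embedding GPT base: one dense weight.
w = rng.normal(size=(8, 8)).astype(np.float32)

# 2. int5 repair substrate: quantize the base and freeze it.
q5, s5 = quantize(w, 5)
w_int5 = (q5 * s5).astype(np.float32)

# 3. "LoRA repair": an SVD of the quantization residual stands in for the
#    actual training run; rank/alpha follow the card's parameters.
rank, alpha = 16, 16
u, s, vt = np.linalg.svd(w - w_int5)
r = min(rank, s.size)
lora_b, lora_a = u[:, :r] * s[:r], vt[:r]

# 4. Merge the repair back into ordinary checkpoint weights.
w_merged = w_int5 + (alpha / rank) * (lora_b @ lora_a)

# 5. Quantize the merged checkpoint to int8 and zlib-compress it.
q8, s8 = quantize(w_merged, 8)
artifact = zlib.compress(q8.astype(np.int8).tobytes())
w_final = (q8 * s8).astype(np.float32)
```

In this toy, the repaired int8 reconstruction is strictly closer to the original weights than the bare int5 quantization, mirroring the claim that localized learned repair recovers most of the int5-to-int8 gap.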