PR #1951

open

Sophonics LoRA Int5 DeepMLP non-record submission

by DesperateTomatoCultivatorView on GitHub
val_bpb
1.3522
Architecture
Transformer
Optimizer
Artifact Size
10,842,720 bytes

Training Techniques

Architecture
weight tying
Tied embeddings in the base GPT model.
parameters: null
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
Quantization
int5
bits: 5
scope: base model during repair
int8
bits: 8
scope: reference precision during repair and final artifact
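The int5/int8 scheme above can be sketched with a generic symmetric per-tensor quantizer. The exact scheme used by the submission (per-channel vs. per-tensor scales, rounding mode) is not specified in this card, so everything below is an illustrative assumption:

```python
import numpy as np

def quantize(w, bits):
    """Symmetric per-tensor quantization of a float array to `bits` bits.
    A generic sketch, not the submission's exact scheme."""
    qmax = 2 ** (bits - 1) - 1  # 15 for int5, 127 for int8
    amax = np.abs(w).max()
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale
```

With int5, codes lie in [-16, 15]; with int8, in [-128, 127], which is why int8 serves as the higher-precision reference during repair.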
Other
other
LoRA repair modules trained on the two deepest MLP blocks (blocks 7 and 8, per the target regex) and then merged back into the ordinary checkpoint weights.
parameters: {"rank":16,"alpha":16,"target_regex":"^blocks\\.[7-8]\\.mlp\\.(fc|proj)$","steps":600}
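The merge step and the module selection can be sketched as follows, using the common LoRA convention (A: rank x in, B: out x rank) and the `rank`/`alpha`/`target_regex` values from the parameters above; the function name and shapes are illustrative assumptions, not the submission's code:

```python
import re
import numpy as np

# Module selection: the card's target_regex picks fc/proj in blocks 7-8.
TARGET = re.compile(r"^blocks\.[7-8]\.mlp\.(fc|proj)$")

def merge_lora(w, lora_a, lora_b, rank=16, alpha=16):
    """Fold a trained LoRA pair back into the dense weight so the final
    checkpoint carries no extra modules (a sketch of the merge-back step)."""
    return w + (alpha / rank) * (lora_b @ lora_a)
```

With alpha equal to rank (16/16), the LoRA scaling factor is 1, so the update is folded in unscaled.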
Compression
zlib
level: null
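Packing the final artifact is a direct use of zlib on the serialized int8 checkpoint. The card leaves the compression level unspecified (`level: null`), so the level below is an assumption:

```python
import zlib

def pack_artifact(int8_bytes: bytes, level: int = 9) -> bytes:
    """zlib-compress the serialized int8 checkpoint bytes.
    level=9 is an assumed default; the card does not state the level used."""
    return zlib.compress(int8_bytes, level)
```

The compressed size of this blob is what must fit under the 16MB artifact cap (10,842,720 bytes here).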

Novel Contributions

  • Unlimited-compute non-record submission under the 16MB artifact cap
  • Train a compact tied-embedding GPT base model
  • Use an int5 repair substrate with rank-16 LoRA modules on the deepest MLP blocks
  • Merge repair weights back into standard checkpoint weights before final quantization
  • Quantize the merged checkpoint to int8 and zlib-compress it
  • Demonstrate that localized learned repair can recover most of the int5-to-int8 performance gap
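The pipeline in the bullets above can be sketched end to end on a toy weight matrix. Everything here is a stand-in: the dense layer replaces the GPT base, and a closed-form SVD of the quantization residual replaces the 600-step LoRA training run, keeping only the rank-16/alpha-16 shape and the int5 → repair → merge → int8 → zlib ordering from the card:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits):
    # Symmetric per-tensor quantization (illustrative assumption).
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax), scale

# 1. Toy stand-in for the trained tied-embedding GPT base: one dense weight.
w = rng.normal(size=(8, 8)).astype(np.float32)

# 2. int5 repair substrate: quantize the base and freeze it.
q5, s5 = quantize(w, 5)
w_int5 = (q5 * s5).astype(np.float32)

# 3. "LoRA repair": an SVD of the quantization residual stands in for the
#    actual training run; rank/alpha follow the card's parameters.
rank, alpha = 16, 16
u, s, vt = np.linalg.svd(w - w_int5)
r = min(rank, s.size)
lora_b, lora_a = u[:, :r] * s[:r], vt[:r]

# 4. Merge the repair back into ordinary checkpoint weights.
w_merged = w_int5 + (alpha / rank) * (lora_b @ lora_a)

# 5. Quantize the merged checkpoint to int8 and zlib-compress it.
q8, s8 = quantize(w_merged, 8)
artifact = zlib.compress(q8.astype(np.int8).tobytes())
w_final = (q8 * s8).astype(np.float32)
```

In this toy, the repaired int8 reconstruction is strictly closer to the original weights than the bare int5 quantization, mirroring the claim that localized learned repair recovers most of the int5-to-int8 gap.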