PR #1951
openSophonics LoRA Int5 DeepMLP non-record submission
by DesperateTomatoCultivator
val_bpb: 1.3522
Architecture: Transformer
Optimizer: —
Artifact Size: 10,842,720 bytes
Training Techniques
Architecture
- weight tying: tied embeddings in the base GPT model (parameters: null)
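For context, weight tying means the output projection reuses the token-embedding matrix; a minimal PyTorch sketch, with module names that are illustrative assumptions rather than the submission's actual code:

```python
import torch.nn as nn

class TinyGPT(nn.Module):
    """Minimal sketch of a tied-embedding GPT head (names are hypothetical)."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying: the output projection shares the embedding matrix,
        # so the artifact stores a single vocab_size x d_model tensor.
        self.lm_head.weight = self.tok_emb.weight
```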
Sequence Length
- sequence_length (train_length: 1024, eval_length: 1024)
Quantization
- int5 (bits: 5), scope: base model during repair
- int8 (bits: 8), scope: reference precision during repair and final artifact
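As a rough illustration of the two precisions, here is a symmetric per-tensor quantize-dequantize sketch; the submission does not specify the actual quantizer (grouping, scale/zero-point scheme), so this is an assumption:

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor quantize-dequantize to `bits` (e.g. 5 or 8)."""
    qmax = 2 ** (bits - 1) - 1              # 15 for int5, 127 for int8
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                        # dequantized weights, still float

# Hypothetical usage: the int5 repair substrate vs. the int8 reference.
# w5 = fake_quantize(layer.weight.data, bits=5)
# w8 = fake_quantize(layer.weight.data, bits=8)
```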
Other
- other: LoRA repair modules trained on the deepest two MLP blocks and then merged back into ordinary checkpoint weights (parameters: {"rank":16,"alpha":16,"target_regex":"^blocks\\.[7-8]\\.mlp\\.(fc|proj)$","steps":600})
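A sketch of the repair-and-merge idea under the listed parameters (rank 16, alpha 16, the fc/proj layers matched by the regex above, ~600 steps); everything beyond those numbers, including the wrapper class and training loop, is an assumption:

```python
import re
import torch
import torch.nn as nn

TARGET = re.compile(r"^blocks\.[7-8]\.mlp\.(fc|proj)$")
RANK, ALPHA = 16, 16

class LoRALinear(nn.Module):
    """Wraps a frozen (int5-dequantized) Linear with a trainable low-rank delta."""
    def __init__(self, base: nn.Linear, rank: int, alpha: int):
        super().__init__()
        self.base = base
        self.scaling = alpha / rank
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        base.weight.requires_grad_(False)

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

    def merged_weight(self) -> torch.Tensor:
        # Fold the learned delta back into an ordinary weight matrix,
        # so the final checkpoint carries no extra LoRA tensors.
        return self.base.weight + self.scaling * (self.B @ self.A)

# Hypothetical usage: wrap every module whose name matches TARGET,
# train the A/B pairs for ~600 steps against the int8 reference,
# then write merged_weight() back into the plain checkpoint before
# the final int8 quantization.
```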
Compression
- zlib (level: null)
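And a minimal sketch of the final packaging step, assuming the merged int8 checkpoint is serialized with torch.save and zlib-compressed at the default level (the level is listed as null above):

```python
import io
import zlib
import torch

def pack_artifact(state_dict: dict, path: str) -> int:
    """Serialize an already int8-quantized state dict and zlib-compress it."""
    buf = io.BytesIO()
    torch.save(state_dict, buf)
    blob = zlib.compress(buf.getvalue())    # default level, since level is unspecified
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)                        # must stay under the 16MB artifact cap
```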
Novel Contributions
- Unlimited-compute non-record submission under the 16MB artifact cap
- Train a compact tied-embedding GPT base model
- Use an int5 repair substrate with rank-16 LoRA modules on the deepest MLP blocks
- Merge repair weights back into standard checkpoint weights before final quantization
- Quantize the merged checkpoint to int8 and zlib-compress it
- Demonstrate that localized learned repair can recover most of the int5-to-int8 performance gap