PR #646

Status: open

SOTA Submission (1.1349 BPB) by weywey [10min_16mb track]

by UpsallaView on GitHub
val_bpb: 1.1349
Architecture: Transformer
Optimizer:
Artifact Size: 16,590,005 bytes
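
For context on the size budget: the reported artifact fits under a 16 MiB (16,777,216-byte) cap but would exceed a decimal 16,000,000-byte cap, so the binary reading of the track's "16mb" limit is taken here as an assumption.

```python
artifact_bytes = 16_590_005          # reported artifact size
cap_bytes = 16 * 1024 * 1024         # 16 MiB = 16,777,216; binary cap is an assumption
print(cap_bytes - artifact_bytes)    # 187,211 bytes of headroom under that reading
```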

Training Techniques

Quantization: int6 (see the sketch below)
  bits: 6
  scope: null
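
A minimal sketch of what int6 weight quantization could look like, assuming symmetric per-tensor scaling; with scope: null, the actual granularity (per-tensor vs. per-channel) is unspecified, and the function names here are illustrative, not from the submission. At 6 bits per weight, an artifact of ~16.6 MB would hold roughly 22M weights before scales and metadata.

```python
import torch

def quantize_int6(w: torch.Tensor):
    """Symmetric per-tensor quantization of float weights to 6-bit codes."""
    qmax = 2 ** (6 - 1) - 1              # 31: largest positive signed 6-bit code
    scale = w.abs().max().item() / qmax  # one float scale for the whole tensor
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale                      # int8 holds the 6-bit codes here; the real
                                         # artifact would pack them at 6 bits/weight

def dequantize_int6(q: torch.Tensor, scale: float) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(128, 128)
q, s = quantize_int6(w)
err = (w - dequantize_int6(q, s)).abs().max().item()
print(f"max quantization error: {err:.4f}")  # bounded by ~scale / 2
```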
Architecture: OLR-FW (see the config sketch below)
  Designed via the Weyhe Framework
  parameters: {"layers": 11, "R": 128}
LR Schedule: beta2 decay (see the sketch below)
  parameters: {"beta2": 0.95, "learning_rate": 0.001}
Test-Time Training: TTT (see the sketch below)
  parameters: null
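
With parameters: null, the TTT entry leaves the recipe open. A generic sketch of test-time training for language modeling, where a copy of the model takes a few next-token gradient steps on the evaluation text before scoring; step count, learning rate, optimizer, and the stand-in model are all assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def ttt_adapt(model: torch.nn.Module, tokens: torch.Tensor,
              steps: int = 4, lr: float = 1e-4) -> torch.nn.Module:
    """Return a copy of `model` fine-tuned on `tokens` ([1, seq_len]) at test time."""
    adapted = copy.deepcopy(model)   # keep the original weights untouched
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        logits = adapted(tokens[:, :-1])                 # predict each next token
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               tokens[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted

vocab = 256
lm = torch.nn.Sequential(torch.nn.Embedding(vocab, 64),  # stand-in byte-level LM
                         torch.nn.Linear(64, vocab))
tokens = torch.randint(0, vocab, (1, 128))
adapted = ttt_adapt(lm, tokens)      # adapt on the evaluation text, then score BPB
```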

Novel Contributions

  • Use of an 11-layer INT6-quantized model
  • Architecture designed with the Weyhe Framework (OLR-FW)
  • Test-time training (TTT) applied
  • Learning rate of 0.001 with beta2 = 0.95 under a beta2 decay schedule