val_bpb: 1.1349
Architecture: Transformer
Optimizer: —
Artifact Size: 16,590,005 bytes (~16.6 MB)
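
A note on the headline metric: bits per byte (bpb) is the model's mean cross-entropy expressed in bits and normalized per byte of validation text. A minimal sketch of the conversion from a nats-per-token loss, assuming byte-level tokenization (the tokenizer is not stated here, and `bits_per_byte` is a hypothetical helper):

```python
import math

def bits_per_byte(nat_loss_per_token: float, tokens: int, total_bytes: int) -> float:
    """Convert a mean cross-entropy in nats/token to bits per byte.

    Assumes the loss is averaged over `tokens` tokens that together
    encode `total_bytes` bytes of raw validation text.
    """
    total_nats = nat_loss_per_token * tokens
    total_bits = total_nats / math.log(2)  # nats -> bits
    return total_bits / total_bytes

# With byte-level tokens the ratio drops out and bpb = loss / ln(2),
# so a validation loss of ~0.7867 nats/byte matches the 1.1349 above.
print(bits_per_byte(0.7867, tokens=1, total_bytes=1))  # ~1.1349
```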

Training Techniques

Quantization: int6 (bits: 6, scope: null)
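
The entry does not say how the int6 quantization is applied (scope: null). Below is a minimal sketch of symmetric per-tensor 6-bit weight quantization in PyTorch, purely illustrative; `quantize_int6` and `dequantize_int6` are hypothetical helpers, not this submission's code:

```python
import torch

def quantize_int6(w: torch.Tensor) -> tuple[torch.Tensor, float]:
    """Symmetric per-tensor quantization to 6 bits (integer range [-32, 31])."""
    qmax = 2 ** (6 - 1) - 1                        # 31 for signed int6
    scale = w.abs().max().item() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale                 # stored in int8; 6 bits used

def dequantize_int6(q: torch.Tensor, scale: float) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(128, 128)
q, s = quantize_int6(w)
print((w - dequantize_int6(q, s)).abs().max())     # error on the order of scale/2
```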

Architecture: OLR-FW, architected via the Weyhe Framework (layers: 11, R: 128)
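
Neither OLR-FW nor the Weyhe Framework is documented in this entry, so only the two reported hyperparameters can be shown. A hypothetical configuration stub, assuming `R` is a rank/width parameter (that reading is a guess):

```python
from dataclasses import dataclass

@dataclass
class OLRFWConfig:
    """Reported OLR-FW hyperparameters; field semantics are assumptions."""
    layers: int = 11   # reported layer count
    R: int = 128       # reported; treated here as a rank/width parameter

print(OLRFWConfig())   # OLRFWConfig(layers=11, R=128)
```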

LR Schedule: beta2 decay (beta2: 0.95, learning_rate: 0.001)
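
Only the endpoint values are reported. One common reading of "beta2 decay" is annealing Adam's beta2 toward a target during training; a sketch under that assumption (the 0.999 starting value and the linear schedule are guesses, not reported):

```python
import torch

model = torch.nn.Linear(128, 128)  # stand-in model
opt = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.95))

def decayed_beta2(step: int, total_steps: int,
                  start: float = 0.999, end: float = 0.95) -> float:
    """Linearly anneal beta2 from `start` to the reported 0.95."""
    t = min(step / total_steps, 1.0)
    return start + t * (end - start)

for step in range(1000):
    b2 = decayed_beta2(step, total_steps=1000)
    for group in opt.param_groups:
        group["betas"] = (group["betas"][0], b2)  # takes effect on next opt.step()
    # ... forward pass, loss.backward(), opt.step(), opt.zero_grad() ...
```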

Test-Time Training: TTT (parameters: null)
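
No TTT parameters are reported. As a generic illustration only: test-time training adapts a copy of the model to each input with a few self-supervised gradient steps before predicting. The objective below (next-token prediction on the prompt itself), the step count, and the learning rate are all assumptions:

```python
import copy
import torch
import torch.nn.functional as F

def test_time_train(model: torch.nn.Module, input_ids: torch.Tensor,
                    steps: int = 3, lr: float = 1e-4) -> torch.nn.Module:
    """Adapt a copy of `model` to one input before prediction.

    Assumes `model(input_ids)` returns logits of shape [batch, seq, vocab];
    the submission's actual TTT objective and settings are not reported.
    """
    adapted = copy.deepcopy(model)            # never mutate the base weights
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        logits = adapted(input_ids[:, :-1])   # predict each next token
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted
```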

Novel Contributions

- Use of an 11-layer INT6-quantized model
- Architecture designed with the Weyhe Framework (OLR-FW)
- Test-time training (TTT) applied at inference
- Learning rate 0.001 with beta2 = 0.95