PR #643

open

Non-record: Mac mini M4 16GB, no H100s, still golfing (val_bpb=1.5672)

by frido22
val_bpb: 1.5672
Architecture: (not specified)
Optimizer: (not specified)
Artifact Size: 15,962,372 bytes

Training Techniques

Quantization: int8
  bits: 8
  scope: all
Weight Averaging: EMA
  scope: projection matrices
  timing: late EMA after first quant-aware roundtrip
Compression: zlib
  level: null
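
To make the quantization and compression settings above concrete, here is a minimal sketch of an int8 roundtrip followed by zlib compression. It is not the PR's code: the symmetric per-tensor scaling, the function names, and reading `level: null` as zlib's default level are all assumptions made for illustration.

```python
import zlib
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (assumed scheme, not from the PR)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to floats."""
    return [v * scale for v in q]


def roundtrip(weights):
    """Quantize then dequantize, so the floats sit exactly on the int8 grid."""
    q, scale = quantize_int8(weights)
    return dequantize_int8(q, scale)


def compressed_size(q):
    """Byte count of the int8 codes after zlib at its default level."""
    raw = array("b", q).tobytes()
    return len(zlib.compress(raw))


if __name__ == "__main__":
    q, scale = quantize_int8([0.5, -1.0, 0.25, 0.0])
    print(q, scale, compressed_size(q))
```

The compressed byte count, not the raw parameter count, is what the Artifact Size figure measures, so weights that land on fewer distinct int8 codes tend to compress better.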

Novel Contributions

  • Use of late EMA over only the projection matrices that remain int8-quantized after the first quant-aware roundtrip
  • Reapplication of the exact final quantization roundtrip before saving, to improve the final compressed-artifact score
  • Submission as a hardware-specific non-record entry on Apple Silicon Mac mini M4 16GB without 8xH100 GPUs
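
The first two contributions can be sketched roughly as follows. This is a hedged illustration, not the submission's implementation: the symmetric per-tensor int8 scheme, the `ema_start` step, the decay value, and the convention that projection matrices have names ending in `proj` are all hypothetical choices made so the example runs on its own.

```python
def quant_roundtrip(weights):
    """Symmetric per-tensor int8 quantize, then dequantize (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    scale = max_abs / 127.0
    return [max(-127, min(127, round(w / scale))) * scale for w in weights]


def ema_update(ema, weights, decay):
    """One exponential-moving-average step over a flat weight list."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


def train_with_late_ema(params, steps, step_fn, ema_start, decay=0.9):
    """Run step_fn for `steps` steps; start EMA only after the first roundtrip."""
    ema = {}
    for step in range(steps):
        params = step_fn(params, step)
        if step == ema_start:
            # First quant-aware roundtrip over all tensors (scope: all).
            params = {k: quant_roundtrip(v) for k, v in params.items()}
            # Late EMA tracks only projection matrices (hypothetical naming rule).
            ema = {k: list(v) for k, v in params.items() if k.endswith("proj")}
        elif step > ema_start:
            ema = {k: ema_update(ema[k], params[k], decay) for k in ema}
    # Swap averaged projections in, then reapply the final roundtrip before
    # saving so the stored weights sit exactly on the int8 grid.
    merged = {k: ema.get(k, v) for k, v in params.items()}
    return {k: quant_roundtrip(v) for k, v in merged.items()}


if __name__ == "__main__":
    toy = {"attn.proj": [0.5, -1.0, 0.25], "embed": [0.1, 0.2]}
    # Hypothetical stand-in for an optimizer step: shrink every weight slightly.
    shrink = lambda p, s: {k: [w * 0.99 for w in v] for k, v in p.items()}
    print(train_with_late_ema(toy, steps=5, step_fn=shrink, ema_start=2))
```

Averaging moves weights off the int8 grid, which is why the final roundtrip is reapplied after the EMA swap in this sketch: the saved values then match what the quantized artifact will actually contain.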