
Turbo-Muon

Architecture
Used in: 1 PR
Best BPB: 1.1091
Avg BPB: 1.1091

Hyperparameters Across PRs

pr_number    parameters
1126         {"newton_schulz_iterations": 4}
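
The page does not describe how Turbo-Muon works, so the following is only a rough sketch of what a setting like newton_schulz_iterations typically controls in Muon-style optimizers: the number of Newton-Schulz steps used to approximately orthogonalize a gradient or momentum matrix. The function name, the quintic coefficients (taken from the widely used reference Muon implementation), and the NumPy setting are all assumptions for illustration, not the code from PR 1126, and Turbo-Muon itself may differ.

```python
import numpy as np


def newton_schulz_orthogonalize(g, iterations=4, eps=1e-7):
    """Approximately orthogonalize a 2-D matrix with Newton-Schulz steps.

    Hypothetical helper: `iterations` plays the role of the
    newton_schulz_iterations hyperparameter listed above.
    """
    # Quintic coefficients from the common Muon reference implementation
    # (an assumption here; not confirmed for Turbo-Muon).
    a, b, c = 3.4445, -4.7750, 2.0315

    x = np.asarray(g, dtype=np.float64)
    # Work on the wide orientation so the Gram matrix x @ x.T is small.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T

    # Scale by the Frobenius norm so every singular value is at most 1.
    x = x / (np.linalg.norm(x) + eps)

    for _ in range(iterations):
        gram = x @ x.T
        # Applies s -> a*s + b*s^3 + c*s^5 to each singular value of x.
        x = a * x + (b * gram + c * (gram @ gram)) @ x

    return x.T if transposed else x
```

With these coefficients the result is only approximately orthogonal: the singular values end up in a band around 1 rather than exactly at 1, and fewer iterations trade accuracy for speed.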