← Back to Architecture

per-layer scalars

Architecture
Used in
1 PRs
Best BPB
1.3323
Avg BPB
1.3323

Hyperparameters Across PRs

pr_numberparameters
341{"scalars":["attn_scale","mlp_scale","resid_mix","q_gain"]}