per-layer scalars
Architecture
Used in: 1 PR
Best BPB: 1.3323
Avg BPB: 1.3323
Submissions
Hyperparameters Across PRs
| pr_number | parameters |
|---|---|
| 341 | {"scalars":["attn_scale","mlp_scale","resid_mix","q_gain"]} |
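
The table lists only the scalar names, not how they are applied. As a rough illustration, the sketch below parses the `parameters` cell for PR 341 and gives each layer its own copy of the four scalars. The helper names (`init_layer_scalars`, `residual_update`, `scale_queries`), the initial value of 1.0, and the way each scalar enters the computation are assumptions made for illustration, not details taken from the PR.

```python
import json

# Parameters cell copied verbatim from the table row for PR 341.
params = json.loads('{"scalars":["attn_scale","mlp_scale","resid_mix","q_gain"]}')


def init_layer_scalars(names, num_layers, init=1.0):
    """Create one independent scalar per name for every layer.

    Hypothetical layout: a list with one dict per layer. The init value
    of 1.0 is an assumption, chosen so the scalars start as a no-op.
    """
    return [{name: init for name in names} for _ in range(num_layers)]


def scale_queries(q, s):
    """Assumed role of q_gain: multiply the query vector before attention."""
    return [s["q_gain"] * qi for qi in q]


def residual_update(x, x0, attn_out, mlp_out, s):
    """Assumed role of the other three scalars (simplified, element-wise).

    resid_mix blends the current stream x with the initial embedding x0;
    attn_scale and mlp_scale weight the two branch outputs, which are
    taken as given here rather than computed.
    """
    out = []
    for xi, x0i, ai, mi in zip(x, x0, attn_out, mlp_out):
        mixed = s["resid_mix"] * xi + (1.0 - s["resid_mix"]) * x0i
        out.append(mixed + s["attn_scale"] * ai + s["mlp_scale"] * mi)
    return out


layer_scalars = init_layer_scalars(params["scalars"], num_layers=2)
result = residual_update([1.0, 2.0], [0.0, 0.0], [0.5, 0.5], [0.25, 0.25],
                         layer_scalars[0])
```

With every scalar at 1.0 the update reduces to a plain residual sum, so the per-layer values only change behavior once training moves them away from their initial setting.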