Value Residual Learning (VRL)
Category: Architecture
Used in: 2 PRs
Best BPB: 1.1175
Avg BPB: 1.1204
Hyperparameters Across PRs
| PR number | Parameters |
|---|---|
| 569 | {"learned_alphas":10,"sigmoid_init":0} |
| 657 | {"layers":11,"initial_gate_bias":-1.5,"initial_mixing":"approx 18%"} |
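The PR 657 entry pairs an initial gate bias of -1.5 with an initial mixing of roughly 18%. The two figures agree if the mixing weight is the sigmoid of the gate bias, since sigmoid(-1.5) ≈ 0.182. The sketch below checks that arithmetic. The function names and the convex-mix form in `mix_values` are illustrative assumptions, not code taken from either PR.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def mix_values(v_layer, v_first, gate_bias):
    """Hypothetical value-residual mix (assumed form, not from the PRs).

    Blends the current layer's value vector with the first layer's value
    vector using a gate g = sigmoid(gate_bias).
    """
    g = sigmoid(gate_bias)
    return [(1.0 - g) * a + g * b for a, b in zip(v_layer, v_first)]


# Gate bias reported for PR 657.
initial_mixing = sigmoid(-1.5)
print(f"{initial_mixing:.3f}")  # prints 0.182, i.e. about 18%
```

Under this assumption, each gated layer would start by taking about 18% of its value vector from the first layer and about 82% from its own projection, with the gate free to move during training.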