
Value Residual Learning (VRL)

Architecture
Used in: 2 PRs
Best BPB: 1.1175
Avg BPB: 1.1204

Hyperparameters Across PRs

pr_number  parameters
569        {"learned_alphas":10,"sigmoid_init":0}
657        {"layers":11,"initial_gate_bias":-1.5,"initial_mixing":"approx 18%"}
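
The "approx 18%" initial mixing listed for PR 657 is consistent with passing the initial gate bias of -1.5 through a sigmoid. The sketch below checks that arithmetic. It assumes the gate is a plain sigmoid of a scalar bias and that the gated share goes to the first layer's values; neither detail is stated in the table, and the names `sigmoid` and `mix_values` are hypothetical, not taken from either PR.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, assumed here to map a gate bias to a mixing weight."""
    return 1.0 / (1.0 + math.exp(-x))


def mix_values(v_layer, v_first, gate_bias):
    """Hypothetical value-residual mix: blend this layer's values with the
    first layer's values using a sigmoid gate (an assumption, not the PR code)."""
    g = sigmoid(gate_bias)
    return [(1.0 - g) * a + g * b for a, b in zip(v_layer, v_first)]


# Initial mixing weight implied by initial_gate_bias = -1.5 (PR 657).
initial_mixing = sigmoid(-1.5)
print(f"{initial_mixing:.4f}")  # prints 0.1824, i.e. roughly 18%
```

Under the same assumption, a pre-sigmoid value of 0 would correspond to an even 50/50 mix, though the table does not say whether "sigmoid_init":0 in PR 569 is meant that way.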