
Partial RoPE + NTK-aware scaling

Architecture
Used in: 1 PR
Best BPB: 1.1175
Avg BPB: 1.1175

Hyperparameters Across PRs

pr_number   parameters
569         {"partial_dims":[16,64],"ntk_base":10000}
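
For orientation, here is a minimal sketch of how partial RoPE with an NTK-aware base adjustment might be wired together. It is not taken from PR 569. The function names are hypothetical, and the reading of `partial_dims` as "rotate 16 of 64 head dimensions" is an assumption, as are the interleaved pairing of dimensions and the `scale` argument (the table records only `ntk_base`). The base adjustment uses the commonly cited NTK-aware form, base * scale^(d / (d - 2)), with d taken to be the number of rotated dimensions.

```python
import math


def ntk_scaled_base(base, rot_dims, scale):
    """Commonly cited NTK-aware adjustment: base * scale ** (d / (d - 2)).

    `scale` is a hypothetical context-extension factor; scale == 1 leaves
    the base unchanged.
    """
    return base * scale ** (rot_dims / (rot_dims - 2))


def rope_frequencies(rot_dims, base):
    """Inverse frequencies for rot_dims rotated dimensions (rot_dims even)."""
    return [base ** (-2.0 * i / rot_dims) for i in range(rot_dims // 2)]


def apply_partial_rope(vec, position, rot_dims, base):
    """Rotate the first rot_dims entries of vec in adjacent pairs.

    Entries from index rot_dims onward pass through unchanged, which is
    what makes the rotary embedding "partial".
    """
    out = list(vec)
    for i, freq in enumerate(rope_frequencies(rot_dims, base)):
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out


# Assumed reading of the table row: 16 rotated dims out of a 64-dim head.
head_dim, rot_dims = 64, 16
base = ntk_scaled_base(10000.0, rot_dims, scale=1.0)  # scale is illustrative
query = [float(i + 1) for i in range(head_dim)]
rotated = apply_partial_rope(query, position=7, rot_dims=rot_dims, base=base)
```

Because each pair is rotated by a plain 2D rotation, the vector norm is preserved and position 0 is the identity. Implementations that split the head into halves rather than adjacent pairs would differ only in which indices are paired.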