PR #1165

open

Structured Embedding Init (BPB 1.2314)

by brandonpf
val_bpb
1.2314
Architecture
Transformer
Optimizer
Artifact Size

Training Techniques

Initialization
spectral init
Token embedding positions are pre-computed with PCA from corpus co-occurrence statistics. They selectively override a subset of token embeddings; the remaining embeddings keep the default Xavier initialization.
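
A minimal sketch of how such positions might be pre-computed, assuming a symmetric sliding-window co-occurrence count and a log scaling before PCA. The window size, the `log1p` scaling, the toy token stream, and the helper names `cooccurrence_matrix` and `pca_positions` are illustrative assumptions, not details taken from the PR.

```python
import numpy as np

def cooccurrence_matrix(token_ids, vocab_size, window=2):
    """Count symmetric co-occurrences within a fixed window (window size is an assumption)."""
    counts = np.zeros((vocab_size, vocab_size), dtype=np.float64)
    for i, tok in enumerate(token_ids):
        for j in range(max(0, i - window), i):
            counts[tok, token_ids[j]] += 1.0
            counts[token_ids[j], tok] += 1.0
    return counts

def pca_positions(counts, dim):
    """Project log-scaled co-occurrence rows onto their top `dim` principal components."""
    x = np.log1p(counts)                     # dampen frequent pairs (assumed scaling)
    x = x - x.mean(axis=0, keepdims=True)    # center each column before PCA
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :dim] * s[:dim]              # PCA scores, shape (vocab_size, dim)

# Toy token stream standing in for a real corpus.
token_ids = [0, 1, 2, 0, 1, 3, 0, 2, 1, 3]
counts = cooccurrence_matrix(token_ids, vocab_size=4)
positions = pca_positions(counts, dim=2)
```

Each row of `positions` is one token's coordinates in the reduced space; tokens that co-occur with similar neighbors end up close together.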
Architecture
weight tying
Tied input and output embeddings.
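
As a rough illustration of weight tying, the sketch below uses one matrix both to look up input embeddings and to produce output logits. The embedding width of 64 and the 0.02 init scale are assumptions for the example; only the vocabulary size of 1024 comes from this PR.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1024, 64                  # dim is an assumed value
embedding = rng.normal(scale=0.02, size=(vocab_size, dim))

def embed(token_ids):
    """Input side: look up rows of the shared matrix."""
    return embedding[token_ids]

def logits(hidden):
    """Output side: project hidden states with the same matrix, transposed."""
    return hidden @ embedding.T

hidden = embed(np.array([3, 17, 42]))
scores = logits(hidden)                     # shape (3, vocab_size)
```

Because the matrix is shared, any change to an embedding row at initialization also changes the corresponding output projection.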

Novel Contributions

  • Pre-computed PCA token positions from corpus co-occurrence data
  • Selective override of 665/1024 token embeddings with structured positions
  • Overridden embeddings standardized to Xavier scale to preserve gradient flow
  • Early training convergence improvement that did not persist through full training
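
The selective override and rescaling listed above could look roughly like the following sketch. It assumes that "Xavier-standardized" means rescaling the structured positions to the standard deviation of Xavier uniform initialization, which is an interpretation rather than something the PR states. The embedding width of 64, the random choice of which 665 tokens to override, the random stand-in positions, and the helper names are all hypothetical.

```python
import numpy as np

def xavier_uniform(vocab_size, dim, rng):
    """Default Xavier (Glorot) uniform init for a (vocab_size, dim) matrix."""
    limit = np.sqrt(6.0 / (vocab_size + dim))
    return rng.uniform(-limit, limit, size=(vocab_size, dim))

def override_embeddings(emb, positions, token_ids):
    """Overwrite selected rows with structured positions rescaled to the Xavier std."""
    vocab_size, dim = emb.shape
    target_std = np.sqrt(2.0 / (vocab_size + dim))   # std of Xavier uniform
    z = positions - positions.mean()
    z = z / (z.std() + 1e-8)                         # zero mean, unit std
    out = emb.copy()
    out[token_ids] = z * target_std                  # untouched rows keep Xavier init
    return out

rng = np.random.default_rng(0)
vocab_size, dim, n_override = 1024, 64, 665          # dim is an assumed value
emb = xavier_uniform(vocab_size, dim, rng)
token_ids = rng.choice(vocab_size, size=n_override, replace=False)  # hypothetical selection
positions = rng.normal(size=(n_override, dim)) * 5.0  # stand-in for PCA positions
new_emb = override_embeddings(emb, positions, token_ids)
```

Matching the overall scale keeps the overridden rows in the same numeric range as the rest of the table, so neither group dominates activations or gradients at the start of training.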