PR #800

open

Record: X-WING — Shared N-gram Tables + Cubric (val_bpb=0.5644)

by newjordan
val_bpb: 0.5644
Architecture:
Optimizer:
Artifact Size: 15.63 MB

Training Techniques

Other
other
Chunk-based shared n-gram tables: all 8 GPU ranks update the same tables with the same tokens, so each rank sees the full token history rather than a rank-local subset.
parameters: {"ranks":8,"token_history_scale":"full 62M-token picture"}
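To make the difference between rank-local and shared tables concrete, here is a minimal sketch in plain Python. It is not the PR's code: ranks are simulated as entries in a list, the table holds bigram counts only, and the helper names (`bigram_counts`, `rank_local_tables`, `shared_tables`) and the contiguous sharding used for the baseline are assumptions made for illustration.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent token pairs; stands in for one n-gram table."""
    return Counter(zip(tokens, tokens[1:]))

def rank_local_tables(tokens, world_size):
    """Baseline (assumed): each rank counts only its own contiguous shard."""
    shard = -(-len(tokens) // world_size)  # ceiling division
    return [bigram_counts(tokens[r * shard:(r + 1) * shard])
            for r in range(world_size)]

def shared_tables(tokens, chunk_size, world_size):
    """Every simulated rank feeds every chunk into its own table copy."""
    tables = [Counter() for _ in range(world_size)]
    for start in range(0, len(tokens), chunk_size):
        # Overlap by one token so pairs spanning a chunk boundary are counted.
        chunk = tokens[max(start - 1, 0):start + chunk_size]
        for table in tables:
            table.update(bigram_counts(chunk))
    return tables

tokens = [1, 2, 3, 1, 2, 3, 1, 2]
shared = shared_tables(tokens, chunk_size=3, world_size=2)
local = rank_local_tables(tokens, world_size=2)
print(sum(shared[0].values()), [sum(t.values()) for t in local])
```

Because every rank applies the same updates in the same order, the copies stay identical without any cross-rank communication, and each copy holds counts for the whole stream rather than for one shard.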
other
Cubric per-order adaptive alpha: based on model entropy, suppresses the noisy low n-gram orders (2, 3) and boosts the more reliable higher orders (5, 6, 7).
parameters: {"suppressed_orders":[2,3],"suppression_scale":[0.3,0.45],"boosted_orders":[5,6,7],"boost_scale":[1.88,2]}
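One way a per-order alpha of this kind could work is sketched below. The PR summary does not give the formula, so several things here are assumptions: that alpha is the weight of the n-gram estimate in a linear mix with the model probability, that the base alpha grows with model entropy (`base_alpha` and its `ceiling` are hypothetical), that order 4 is left unscaled, and that individual orders map to the values shown. Only the scale ranges (0.3 to 0.45 for orders 2 and 3, 1.88 to 2 for orders 5 to 7) come from the parameters above.

```python
import math

# Per-order multipliers. Values lie inside the reported ranges; the exact
# order-to-value mapping and the 1.0 for order 4 are assumptions.
ORDER_SCALE = {2: 0.30, 3: 0.45, 4: 1.0, 5: 1.88, 6: 2.0, 7: 2.0}

def entropy(probs):
    """Shannon entropy (nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def base_alpha(model_entropy, max_entropy, ceiling=0.5):
    """Hypothetical rule: lean on n-grams more when the model is uncertain."""
    return ceiling * min(model_entropy / max_entropy, 1.0)

def mixed_prob(p_model, p_ngram, order, model_entropy, max_entropy):
    """Mix model and n-gram probabilities with an order-scaled alpha."""
    alpha = base_alpha(model_entropy, max_entropy) * ORDER_SCALE.get(order, 1.0)
    alpha = min(max(alpha, 0.0), 1.0)  # keep the mix a valid interpolation
    return (1.0 - alpha) * p_model + alpha * p_ngram

h = entropy([0.25] * 4)   # maximally uncertain model over 4 tokens
h_max = math.log(4)
for order in (2, 4, 6):
    print(order, mixed_prob(0.25, 0.9, order, h, h_max))
```

With the same model entropy, the order-2 estimate moves the mixed probability far less than the order-6 estimate, which is the suppress-and-boost behaviour the description refers to.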

Novel Contributions

  • Shared chunk-based n-gram tables across all GPU ranks
  • Cubric per-order adaptive alpha scaling
  • Score-first chunk evaluation before table updates
  • Full 62M-token shared context across ranks instead of rank-local tables
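The score-first ordering in the list above can be illustrated with a small sketch: every token in a chunk is scored against counts from earlier chunks only, and the chunk is added to the tables afterwards, so no token contributes to its own score. This is an illustration under assumptions, not the PR's implementation: it uses a single bigram table, one process, and a hypothetical function name `score_first`.

```python
from collections import defaultdict

def score_first(tokens, chunk_size):
    """Return per-position bigram probabilities, scoring before updating."""
    ctx_count = defaultdict(int)    # previous token -> times seen as context
    pair_count = defaultdict(int)   # (previous, current) -> times seen
    scores = [None]                 # position 0 has no context to score with
    for start in range(1, len(tokens), chunk_size):
        end = min(start + chunk_size, len(tokens))
        # 1) Score the whole chunk with tables built from earlier chunks.
        for i in range(start, end):
            prev, tok = tokens[i - 1], tokens[i]
            total = ctx_count.get(prev, 0)
            scores.append(pair_count.get((prev, tok), 0) / total if total else None)
        # 2) Only now add the chunk's tokens to the tables.
        for i in range(start, end):
            ctx_count[tokens[i - 1]] += 1
            pair_count[(tokens[i - 1], tokens[i])] += 1
    return scores

print(score_first([1, 2, 1, 2, 1, 2, 1, 2], chunk_size=4))
```

A `None` entry means the table had no count for that context yet. In this run the first chunk is scored against empty tables, and only the second chunk benefits from what the first one contributed.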