PR #814

open

Record: X-WING 3D Cubric + Complementary Training (val_bpb=0.4820)

by newjordan
val_bpb: 0.4820
Architecture: 11L Transformer
Optimizer: (not listed)
Artifact Size: 15.58 MB

Training Techniques

Architecture
XSA-4
Uses XSA-4 attention/architecture component in the transformer.
parameters: null
Weight Averaging
SWA
parameters: null
EMA
parameters: null
Quantization
GPTQ
bits: 6
scope: all
Evaluation
sliding window eval
parameters: {"stride":64}
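A minimal sketch of sliding-window evaluation with a stride of 64, to make the entry above concrete. It assumes a hypothetical `score_window(ctx)` callable that returns one negative log-likelihood (in nats) per predicted position in `ctx`; the window length of 256 and the helper names are illustrative and do not come from the PR.

```python
import math

def sliding_window_nll(tokens, score_window, window=256, stride=64):
    """Total NLL (nats) over tokens[1:], scoring each token exactly once.

    score_window(ctx) is a hypothetical callable: entry i of its result is
    the NLL of ctx[i + 1] given ctx[: i + 1], so it has len(ctx) - 1 entries.
    """
    total = 0.0
    scored_upto = 1            # absolute index of the next unscored target
    n = len(tokens)
    end = min(window, n)
    while True:
        start = max(0, end - window)
        nlls = score_window(tokens[start:end])
        for i, nll in enumerate(nlls):
            pos = start + 1 + i          # absolute position of this target
            if pos >= scored_upto:       # skip targets scored by an earlier window
                total += nll
        scored_upto = end
        if end >= n:
            break
        end = min(end + stride, n)       # advance the window by one stride
    return total

def to_bpb(total_nll_nats, n_bytes):
    """Convert a summed NLL in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)
```

After the first window, each step scores only the newest `stride` tokens, so every scored token sees at least `window - stride` tokens of context.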
Test-Time Training
score-first TTT
parameters: {"chunk_based":true,"update_after_scoring":true}
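The `chunk_based` and `update_after_scoring` flags suggest a loop of roughly the following shape. This is a sketch under stated assumptions: `UnigramModel` is a toy stand-in for the adapted model (the PR adapts a transformer, not a count model), and `score_first_ttt` is a hypothetical name.

```python
import math
from collections import Counter

class UnigramModel:
    """Toy stand-in for the adapted model: add-one smoothed unigram counts."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def nll(self, token):
        p = (self.counts[token] + 1) / (self.total + self.vocab_size)
        return -math.log(p)

    def update(self, chunk):
        self.counts.update(chunk)
        self.total += len(chunk)

def score_first_ttt(tokens, model, chunk_size):
    """Score each chunk with the pre-chunk state, then adapt on that chunk."""
    total = 0.0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        # 1) Score the whole chunk before the model has seen any of it.
        total += sum(model.nll(t) for t in chunk)
        # 2) Only then update on the chunk that was just scored.
        model.update(chunk)
    return total
```

Because the update always follows scoring, no token contributes to its own prediction; the first chunk is scored with the unadapted model.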
Initialization
warm-start cubric initialization
Cubric multipliers are initialized from previously converged values instead of 1.0.
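As an illustration of the warm start, the sketch below fills a multiplier table from previously saved values and falls back to 1.0 when none are supplied. The JSON serialization and the function name `init_multipliers` are assumptions made for this example; the PR does not state how the converged values are stored.

```python
import json

def init_multipliers(n_cells, saved_json=None):
    """Return n_cells multipliers: warm-started if saved values are given.

    saved_json is a hypothetical JSON array of previously converged values.
    """
    if saved_json is None:
        return [1.0] * n_cells           # cold start: every multiplier is 1.0
    saved = json.loads(saved_json)
    if len(saved) != n_cells:
        raise ValueError(f"expected {n_cells} values, got {len(saved)}")
    return [float(v) for v in saved]     # warm start from prior values
```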
Other
other
3D Cubric pattern recognizer with adaptive multipliers over order x entropy_bin x count_bin.
parameters: {"multipliers":54}
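One way to picture a table of 54 multipliers indexed by order, entropy bin, and count bin is sketched below. Several details are assumptions for illustration only: the 6 x 3 x 3 factorization of 54, the bin edges, the range of n-gram orders, and the idea that a multiplier scales the weight given to an n-gram prediction when mixing it with the model's prediction. The PR does not confirm any of these.

```python
import bisect

# Assumed factorization of the 54 cells: 6 orders x 3 entropy bins x 3 count bins.
N_ORDERS, N_ENTROPY_BINS, N_COUNT_BINS = 6, 3, 3
MIN_ORDER = 2                  # hypothetical: n-gram orders 2..7
ENTROPY_EDGES = [2.0, 4.0]     # hypothetical entropy bin edges
COUNT_EDGES = [4, 32]          # hypothetical context-count bin edges

multipliers = [1.0] * (N_ORDERS * N_ENTROPY_BINS * N_COUNT_BINS)

def cell_index(order, entropy, count):
    """Map (order, entropy, count) to a flat index into the multiplier table."""
    o = order - MIN_ORDER
    if not 0 <= o < N_ORDERS:
        raise ValueError(f"order {order} is outside the assumed range")
    e = bisect.bisect_right(ENTROPY_EDGES, entropy)
    c = bisect.bisect_right(COUNT_EDGES, count)
    return (o * N_ENTROPY_BINS + e) * N_COUNT_BINS + c

def mixed_prob(p_model, p_ngram, base_weight, order, entropy, count):
    """Blend model and n-gram probabilities with a per-cell scaled weight."""
    w = base_weight * multipliers[cell_index(order, entropy, count)]
    w = min(1.0, max(0.0, w))  # keep the mixing weight a valid probability
    return (1.0 - w) * p_model + w * p_ngram
```

With all multipliers at 1.0 this reduces to a fixed-weight mixture; adapting a cell changes how much the n-gram prediction is trusted in that regime only.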
other
Complementary training that downweights loss for tokens predictable by bigram statistics.
parameters: {"complement_alpha":0.5}
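A rough sketch of how such downweighting could work, assuming the per-token loss weight is `1 - complement_alpha * p_bigram(token | previous token)`. That formula, the unsmoothed count-based bigram estimate, and all function names here are assumptions; the PR only states that bigram-predictable tokens receive less loss weight and that `complement_alpha` is 0.5.

```python
from collections import Counter, defaultdict

def bigram_counts(tokens):
    """Count next-token occurrences for each previous token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def complementary_weights(tokens, counts, alpha=0.5):
    """Per-target loss weights: lower where the bigram table predicts well."""
    weights = []
    for prev, nxt in zip(tokens, tokens[1:]):
        row = counts.get(prev)
        total = sum(row.values()) if row else 0
        p_bigram = row[nxt] / total if total else 0.0
        weights.append(1.0 - alpha * p_bigram)   # assumed weighting formula
    return weights

def weighted_loss(nlls, weights):
    """Mean of per-token losses after applying the complementary weights."""
    return sum(w * l for w, l in zip(weights, nlls)) / len(nlls)
```

With `alpha=0.5`, a token the bigram table predicts with certainty still keeps half its loss weight, while a token the table never predicts keeps its full weight.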
other
N-gram tables are shared across all 8 GPU ranks and updated from each chunk's tokens.
parameters: {"ranks":8}
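The sharing step can be pictured as a sum of per-rank counts, simulated below in a single process. This is a sketch, not the PR's code: a real multi-GPU run would use a collective sum over count tensors, and the function names and the per-rank token sharding are assumptions.

```python
from collections import Counter

def local_ngram_counts(tokens, order):
    """Count n-grams of the given order within one rank's chunk tokens."""
    return Counter(
        tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)
    )

def all_reduce_counts(per_rank_counts):
    """Simulate a sum all-reduce: every rank ends up with the merged table."""
    merged = Counter()
    for counts in per_rank_counts:
        merged.update(counts)            # Counter.update adds counts
    return [Counter(merged) for _ in per_rank_counts]

def shared_tables(per_rank_tokens, order):
    """Build identical n-gram tables on all ranks from their chunk tokens."""
    local = [local_ngram_counts(toks, order) for toks in per_rank_tokens]
    return all_reduce_counts(local)
```

In this sketch, n-grams that straddle two ranks' token ranges are not counted; whether the PR handles such boundaries is not stated.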

Novel Contributions

  • 3D Cubric pattern recognizer with 54 adaptive multipliers across order, entropy bin, and count bin
  • Warm-start initialization of cubric multipliers from prior converged values
  • Complementary training that reduces loss weight for bigram-predictable tokens
  • Shared n-gram tables across all GPU ranks for full-data statistics
  • Score-first chunk protocol that keeps test-time adaptation legal: each chunk is scored before the model updates on it