PR #814
Record: X-WING 3D Cubric + Complementary Training (val_bpb=0.4820)
by newjordan
val_bpb: 0.4820
Architecture: 11L Transformer
Optimizer: —
Artifact Size: 15.58 MB
Training Techniques
Architecture
XSA-4
Uses the XSA-4 attention/architecture component in the transformer.
parameters: null
Weight Averaging
SWA
parameters: null
EMA
parameters: null
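Both SWA and EMA are listed with no recorded parameters. A minimal sketch of the two averaging schemes on flat parameter vectors (the EMA decay of 0.9 is a demo value, not from the PR):

```python
import numpy as np

def swa_update(avg, new, n_averaged):
    # Stochastic Weight Averaging: running arithmetic mean of checkpoints.
    return avg + (new - avg) / (n_averaged + 1)

def ema_update(avg, new, decay=0.9):
    # Exponential Moving Average: geometric decay toward the latest weights.
    return decay * avg + (1.0 - decay) * new

# Toy run over three flat "checkpoints".
checkpoints = [np.full(3, v) for v in (1.0, 2.0, 3.0)]
swa = checkpoints[0].copy()
ema = checkpoints[0].copy()
for n, w in enumerate(checkpoints[1:], start=1):
    swa = swa_update(swa, w, n)   # -> arithmetic mean, 2.0
    ema = ema_update(ema, w)      # -> 0.9 * 1.1 + 0.1 * 3.0 = 1.29
```

In practice both averages are maintained alongside the live weights and swapped in only for evaluation.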
Quantization
GPTQ
bits: 6
scope: all
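GPTQ at 6 bits over all weights: GPTQ proper chooses roundings with second-order (Hessian) error correction, which is beyond a card-sized sketch, but the 6-bit integer grid it rounds onto looks like this per-row symmetric quantizer:

```python
import numpy as np

def quantize_6bit(w):
    # Per-row symmetric 6-bit uniform quantization: the integer grid a
    # 6-bit GPTQ run rounds onto. GPTQ itself picks each rounding with
    # Hessian-based error compensation, which this sketch omits.
    qmax = 2 ** (6 - 1) - 1                           # 31
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_6bit(q, scale):
    return q.astype(np.float32) * scale

w = np.array([[0.5, -1.0, 0.25]], dtype=np.float32)
q, s = quantize_6bit(w)
w_hat = dequantize_6bit(q, s)
```

With nearest-grid rounding the reconstruction error per weight is at most half a quantization step, which is what keeps the 15.58 MB artifact close to full-precision quality.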
Evaluation
sliding window eval
parameters: {"stride":64}
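With stride 64, each forward pass scores only its last 64 tokens, so every scored token keeps up to window − stride tokens of left context. An index-only sketch (the window size of 128 is illustrative; the PR records only the stride):

```python
def sliding_window_spans(n_tokens, window=128, stride=64):
    # Yield (ctx_start, end, score_from): the model reads tokens
    # [ctx_start, end) but only [score_from, end) count toward val_bpb,
    # so each scored token sees up to window - stride tokens of context.
    spans, pos = [], 0
    while pos < n_tokens:
        end = min(pos + stride, n_tokens)
        ctx_start = max(0, end - window)
        spans.append((ctx_start, end, pos))
        pos = end
    return spans
```

Every token is scored exactly once, which keeps the bpb sum well defined while still giving late-window tokens long context.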
Test-Time Training
score-first TTT
parameters: {"chunk_based":true,"update_after_scoring":true}
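The chunk-based, score-first protocol can be sketched generically: each chunk is scored with the state as it was before the model saw that chunk, and updates happen only after scoring, so no token is ever scored post-adaptation. The toy scalar "model" below is purely illustrative:

```python
def score_first_ttt(chunks, score_fn, update_fn, state):
    # Score-first test-time training: every chunk is scored with the
    # current state, and the model updates only *after* scoring, so
    # no chunk influences its own score.
    losses = []
    for chunk in chunks:
        losses.append(score_fn(state, chunk))  # score with current state
        state = update_fn(state, chunk)        # then adapt on the chunk
    return losses, state

# Toy scalar "model": loss is the distance from the state to the chunk.
losses, state = score_first_ttt(
    chunks=[1.0, 1.0, 1.0],
    score_fn=lambda s, c: abs(c - s),
    update_fn=lambda s, c: 0.5 * s + 0.5 * c,
    state=0.0,
)
```

The falling losses across repeated chunks show the adaptation working, while the first chunk is always scored by the unadapted model.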
Initialization
warm-start cubric initialization
Cubric multipliers are initialized from previously converged values instead of 1.0.
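A hypothetical sketch of that initialization: the 54 multipliers default to 1.0 (a no-op scaling), and a warm start reloads previously converged values from disk instead. The JSON file format and helper names are assumptions:

```python
import json, os, tempfile

def init_cubric_multipliers(n=54, warm_start_path=None):
    # Cold start: every multiplier begins at 1.0 (no-op scaling).
    # Warm start: reload previously converged values instead.
    if warm_start_path is not None and os.path.exists(warm_start_path):
        with open(warm_start_path) as f:
            mult = json.load(f)
        assert len(mult) == n, "saved table has the wrong size"
        return mult
    return [1.0] * n

# Round-trip demo: save "converged" values, then warm-start from them.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([0.5 + 0.01 * i for i in range(54)], f)
warm = init_cubric_multipliers(warm_start_path=f.name)
cold = init_cubric_multipliers()
os.unlink(f.name)
```

Starting from converged values skips the early phase where the multipliers would otherwise have to re-learn the same corrections from scratch.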
Other
other
3D Cubric pattern recognizer with adaptive multipliers over order x entropy_bin x count_bin.
parameters: {"multipliers":54}
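A minimal sketch of such a recognizer: one multiplier per (order, entropy bin, count bin) cell, looked up to scale an n-gram score. The 6 × 3 × 3 = 54 factorization and the bin edges are assumptions; the PR states only that there are 54 multipliers over those three axes:

```python
import math

class Cubric3D:
    # Adaptive multipliers over (order, entropy_bin, count_bin). The
    # 6 x 3 x 3 = 54 split is an assumption; the PR only gives the total.
    def __init__(self, orders=6, entropy_bins=3, count_bins=3):
        self.shape = (orders, entropy_bins, count_bins)
        self.mult = [1.0] * (orders * entropy_bins * count_bins)

    def index(self, order, entropy, count):
        # Bucket continuous entropy (nats) and raw counts into coarse
        # bins (the bucketing rules here are illustrative).
        e_bin = min(int(entropy), self.shape[1] - 1)
        c_bin = min(int(math.log10(count + 1)), self.shape[2] - 1)
        return ((order - 1) * self.shape[1] + e_bin) * self.shape[2] + c_bin

    def weight(self, order, entropy, count):
        # Multiplier applied to an n-gram score falling in this cell.
        return self.mult[self.index(order, entropy, count)]
```

During training the multipliers in `self.mult` would be adapted per cell, so reliable cells (e.g. high-count, low-entropy n-grams) earn weights above 1.0 and noisy cells are damped.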
other
Complementary training that downweights loss for tokens predictable by bigram statistics.
parameters: {"complement_alpha":0.5}
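One way to realize that downweighting is a linear rule on the bigram probability of the target token; the linear form is an assumption, as the PR records only complement_alpha = 0.5:

```python
def complementary_weights(bigram_probs, complement_alpha=0.5):
    # Downweight the loss of tokens the bigram model already predicts:
    # weight = 1 - alpha * p, so a fully bigram-predictable token (p = 1)
    # keeps only (1 - alpha) of its loss, and an unpredictable token
    # (p = 0) keeps full weight. The linear rule is an assumption.
    return [1.0 - complement_alpha * p for p in bigram_probs]

weights = complementary_weights([0.0, 0.5, 1.0])  # -> [1.0, 0.75, 0.5]
```

The effect is to focus the transformer's capacity on tokens the cheap n-gram statistics cannot already explain.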
other
Shared n-gram tables updated across all 8 GPU ranks using chunk tokens.
parameters: {"ranks":8}
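The sharing step can be sketched in-process: each "rank" counts n-grams over its own chunk tokens, then the local tables are summed so every rank holds full-data statistics (a stand-in for a distributed all-reduce over the 8 GPU ranks):

```python
from collections import Counter

def build_shared_ngrams(per_rank_tokens, n=2):
    # Each rank counts n-grams over its own chunk tokens; summing the
    # local tables gives every rank full-data statistics. This loop is
    # an in-process stand-in for an all-reduce across 8 GPU ranks.
    shared = Counter()
    for toks in per_rank_tokens:
        shared.update(
            tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)
        )
    return shared

# Two toy ranks whose chunks overlap on the bigram (2, 3).
table = build_shared_ngrams([[1, 2, 3], [2, 3, 4]])
```

Without the reduction, each rank's bigram statistics would cover only 1/8 of the data, weakening both the cubric multipliers and the complementary loss weighting.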
Novel Contributions
- 3D Cubric pattern recognizer with 54 adaptive multipliers across order, entropy bin, and count bin
- Warm-start initialization of cubric multipliers from prior converged values
- Complementary training that reduces loss weight for bigram-predictable tokens
- Shared n-gram tables across all GPU ranks for full-data statistics
- Score-first chunk protocol that keeps test-time adaptation legal (tokens are scored before the model updates on them)