PR #1636
open
Non-record: TWEO early-cosine outlier regularization on SP1024 baseline
by PapaFranku4647
val_bpb
1.2299
Architecture
Transformer
Optimizer
—
Artifact Size
15,890,375 bytes
Training Techniques
Regularization
TWEO
parameters: {"tau":5,"p":4,"lambda_start":0.0002,"lambda_final":0,"decay_steps":3000,"schedule":"cosine"}
LR Schedule
cosine decay
parameters: {"decay_steps":3000}
Quantization
int8
bits: 8
scope: artifact roundtrip
Compression
zlib
level: null
Sequence Length
sequence_length
train_length: 1024
eval_length: null
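For readers unfamiliar with an "int8 + zlib artifact roundtrip", the following is a minimal sketch of what such a step could look like. It assumes symmetric per-tensor quantization and zlib's default compression level (the card lists `level: null`); the function names `quantize_int8`, `pack`, and `unpack` are hypothetical and not taken from the PR.

```python
import struct
import zlib


def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (assumed scheme, not confirmed by the PR)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def pack(q, scale):
    """Serialize the scale (8-byte float) plus one signed byte per value, then zlib-compress."""
    raw = struct.pack("<d", scale) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(raw)  # default level, matching "level: null" as an assumption


def unpack(blob):
    """Invert pack(): decompress, read the scale, and dequantize back to floats."""
    raw = zlib.decompress(blob)
    (scale,) = struct.unpack("<d", raw[:8])
    q = struct.unpack(f"<{len(raw) - 8}b", raw[8:])
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.0, 1.27, 0.013]
q, scale = quantize_int8(weights)
blob = pack(q, scale)
restored = unpack(blob)
print(len(blob), restored)
```

The size that counts toward an artifact limit would be `len(blob)`, and the reported metric would be evaluated on the restored weights rather than the original ones.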
Novel Contributions
- Applied a lightweight TWEO-style train-time activation outlier regularizer to the SP1024 baseline
- Found that a small early cosine-decayed TWEO pulse improved final int8+zlib BPB on matched 4h seed pairs
- Showed that fixed TWEO and nonzero-tail TWEO variants hurt BPB in this setup
- Reported directional confirmation on a 1×H100 80-minute matched seed pair
- Observed strong suppression of post-block activation outliers with only a small BPB gain
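The schedule variants compared above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the penalty form (mean of `(|a| / (tau + eps)) ** p`) is one plausible reading of the `tau` and `p` parameters, the cosine formula is the standard half-cosine decay, and the names `tweo_weight`, `tweo_penalty`, and `eps` are hypothetical.

```python
import math


def tweo_weight(step, lambda_start=2e-4, lambda_final=0.0, decay_steps=3000):
    """Half-cosine decay of the regularization weight from lambda_start to lambda_final."""
    if step >= decay_steps:
        return lambda_final
    progress = max(step, 0) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lambda_final + (lambda_start - lambda_final) * cosine


def tweo_penalty(activations, tau=5.0, p=4, eps=1e-6):
    """Assumed penalty: mean of (|a| / (tau + eps)) ** p over a flat list of activations."""
    if not activations:
        return 0.0
    return sum((abs(a) / (tau + eps)) ** p for a in activations) / len(activations)


# Early pulse (the card's parameters): the weight reaches 0 at step 3000 and stays there.
early_pulse = [tweo_weight(s) for s in (0, 1500, 3000, 6000)]

# "Fixed" variant: lambda_final equals lambda_start, so the weight never decays.
fixed = [tweo_weight(s, lambda_final=2e-4) for s in (0, 1500, 3000, 6000)]

# "Nonzero-tail" variant: decays to a small positive floor (the value here is illustrative).
nonzero_tail = [tweo_weight(s, lambda_final=2e-5) for s in (0, 1500, 3000, 6000)]

# With p = 4, a single large activation dominates the penalty.
print(tweo_penalty([1.0, 1.0, 1.0]), tweo_penalty([1.0, 1.0, 50.0]))
```

In a training loop, the regularizer would presumably enter as `loss = ce_loss + tweo_weight(step) * tweo_penalty(acts)`, with `acts` collected from post-block activations; where exactly the PR hooks this in is not stated in the summary.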