PR #2091
open
Add ppm mix nonrecord submission - SP8192 + Value Residual + Byte-Level PPM Mixture val_bpb=1.16 ppm_mix_bpb=0.83
by lkk688
val_bpb
1.1586
Architecture
Transformer
Optimizer
—
Artifact Size
15973626 bytes
Training Techniques
Architecture
Value Residual
Adds value residual connections to the transformer.
parameters: null
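The entry gives no parameters, so the exact formulation used in this PR is not known. As a minimal sketch, assuming the common form of value residual in which a later layer blends its own value vectors with those computed in the first layer, the mixing step could look like this. The function name `mix_values` and the weight `lam` are hypothetical.

```python
def mix_values(v_layer, v_first, lam=0.5):
    """Blend a layer's value vectors with the first layer's values.

    v_layer, v_first: lists of value vectors (one per token), same shape.
    lam: hypothetical mixing weight; lam=1.0 disables the residual.
    """
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(tok_l, tok_1)]
        for tok_l, tok_1 in zip(v_layer, v_first)
    ]


# Two tokens, value dimension 2 (toy numbers, not from the PR).
v_first = [[1.0, 0.0], [0.0, 1.0]]   # values computed in layer 1
v_layer = [[0.0, 2.0], [2.0, 0.0]]   # values computed in a later layer
mixed = mix_values(v_layer, v_first, lam=0.5)
```

In a real model `lam` would typically be a learned scalar per layer, and the mixed values would feed the attention output in place of the layer's own values.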
XSA
Uses XSA on the last 4 layers as part of the structured skip fusion setup.
parameters: {"layers":4}
weight tying
Shares V across the last 3 layers.
parameters: {"layers":3}
U-Net skip connections
Uses structured skip fusion / BIFPN2-style skip connections.
parameters: null
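The BIFPN2 fusion mode itself is not described in this summary. The sketch below only illustrates the plain U-Net pattern it builds on, under the assumption that the first half of the layer stack saves its activations and the second half adds them back in reverse order with a scalar weight. `run_with_unet_skips`, `blocks`, and `skip_weights` are hypothetical names.

```python
def run_with_unet_skips(x, blocks, skip_weights):
    """Run a stack of blocks with U-Net style skips.

    blocks: list of callables, each mapping a list of floats to a list of floats.
    skip_weights: one hypothetical scalar weight per block in the second half.
    """
    n_enc = len(blocks) // 2
    skips = []
    for block in blocks[:n_enc]:
        x = block(x)
        skips.append(x)            # remember first-half activations
    for block, w in zip(blocks[n_enc:], skip_weights):
        if skips:
            skip = skips.pop()     # last saved pairs with first second-half block
            x = [xi + w * si for xi, si in zip(x, skip)]
        x = block(x)
    return x


def add_one(v):
    """Stand-in for a transformer block: adds 1 to every element."""
    return [a + 1.0 for a in v]


out = run_with_unet_skips([0.0], [add_one] * 4, skip_weights=[1.0, 1.0])
```

Setting every entry of `skip_weights` to zero reduces this to an ordinary sequential stack, which is one way to ablate the skips.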
BigramHash
Uses a 2-gram scaffold with fade-out.
parameters: {"n":2}
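One way to read "2-gram scaffold with fade-out" is a hashed bigram feature whose contribution is scaled down to zero during training. The sketch below assumes that reading. The hash constant, `num_buckets`, `fade_steps`, and the linear schedule are all hypothetical and not taken from the PR.

```python
def bigram_bucket(prev_id, cur_id, num_buckets):
    """Map a token pair to a bucket index (hypothetical multiplicative hash)."""
    return (prev_id * 1000003 + cur_id) % num_buckets


def fade_scale(step, fade_steps):
    """Linear fade-out: 1.0 at step 0, 0.0 from fade_steps onward."""
    return max(0.0, 1.0 - step / fade_steps)


def bigram_buckets(token_ids, num_buckets):
    """Bucket index per position; position 0 has no predecessor, so it gets None."""
    if not token_ids:
        return []
    out = [None]
    for prev_id, cur_id in zip(token_ids, token_ids[1:]):
        out.append(bigram_bucket(prev_id, cur_id, num_buckets))
    return out


buckets = bigram_buckets([1, 2, 3], num_buckets=16)
scale_mid_training = fade_scale(step=50, fade_steps=100)
```

Each bucket index would select a row of a small embedding table, and the looked-up vector would be multiplied by the fade scale before being added to the token embedding.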
Sequence Length
sequence_length
train_length: 8192
eval_length: null
Other
other
Tokenizer scaling and capacity scaling, explored across iterative ablations.
parameters: null
other
A byte-level PPM mixture is used to obtain the mixed score (ppm_mix_bpb in the PR title).
parameters: null
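How the PR combines the two predictors is not specified here. As a rough illustration, assuming a fixed-weight linear mixture of the per-byte probabilities from the model and from PPM, the mixed bits-per-byte score could be computed as follows. The weight `w` and the toy probabilities are hypothetical, and converting token-level model probabilities to byte level is left out.

```python
import math


def bits_per_byte(probs):
    """Average code length in bits for the probabilities assigned to the true bytes."""
    return -sum(math.log2(p) for p in probs) / len(probs)


def mix(p_model, p_ppm, w=0.5):
    """Linear mixture of two per-byte probabilities; w is a hypothetical fixed weight."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_model, p_ppm)]


# Toy probabilities each predictor assigned to the byte that actually occurred.
p_model = [0.5, 0.25, 0.5, 0.125]
p_ppm = [0.25, 0.5, 0.125, 0.5]

bpb_model = bits_per_byte(p_model)
bpb_ppm = bits_per_byte(p_ppm)
bpb_mix = bits_per_byte(mix(p_model, p_ppm))
```

In this toy case the mixture scores lower than either predictor alone, because each one is confident on bytes where the other is not. That complementarity is the usual motivation for mixing.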
Novel Contributions
- Incremental research process with environment-controlled switches for ablations
- Tokenizer scaling
- Capacity scaling
- Value Residual
- Byte-level PPM mixture
- Structured skip fusion / BIFPN2 mode
- Shared V across the last 3 layers
- XSA on the last 4 layers
- 2-gram scaffold with fade-out