val_bpb: 1.2792
Architecture: Transformer
Optimizer: —
Artifact Size: 15973626 bytes (≈16 MB)
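val_bpb is bits per byte: the total validation negative log-likelihood converted to bits and divided by the raw byte count of the text, which keeps scores comparable across tokenizers such as SP1024. A minimal sketch of the conversion (the function name and the example numbers are illustrative, not taken from this submission):

```python
import math

def bits_per_byte(total_nll_nats: float, num_bytes: int) -> float:
    """Convert a summed token NLL (in nats) to bits per byte of raw text."""
    return total_nll_nats / (math.log(2) * num_bytes)

# Illustrative only: ~886,600 nats over 1,000,000 bytes gives ~1.279 bpb,
# close to the 1.2792 reported above.
print(bits_per_byte(886_600.0, 1_000_000))
```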
Training Techniques
Architecture
XSA: applied on the last 4 layers
parameters: {"layers": 4}
U-Net skip connections: structured skip fusion with BIFPN2 mode enabled
parameters: {"BIFPN2_MODE": 1}
Weight tying: V (value) projection shared across the last 3 layers
parameters: {"layers": 3}
BigramHash: 2-gram scaffold with fade-out
parameters: {"max_n": 2, "fade_enable": 1}
Sequence Length
train_length: null
eval_length: null
Novel Contributions
- Stable non-record 16 MB submission under the artifact limit
- Official SP1024 tokenizer and FineWeb SP1024 dataset
- Structured skip fusion with BIFPN2 mode
- XSA on the last 4 layers
- 2-gram scaffold with fade-out
- Shared V across the last 3 layers
- 3-seed submission with representative seed 2027