PR #2113
open
Non-record: CustomToken2050 Compact Core 416d, 1.3967 BPB two-day proof of direction
by Carter2565
val_bpb
1.3967
Architecture
Transformer
Optimizer
—
Artifact Size
~13.8 MB
Training Techniques
Architecture
MLP3
Compact dense model variant with an MLP3 configuration and a custom-token modeling path.
parameters: {"layers":4,"dimensions":416,"kv":1,"vocab_size":2050}
Quantization
int8
bits: 8
scope: packed model
Compression
zlib
level: null
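The quantization and compression fields above describe an int8 plus zlib packing path. The PR's own packer is not shown here, so the following is only a minimal sketch of that general technique, assuming symmetric per-tensor scaling, a hypothetical 8-byte header (float32 scale plus element count), zlib's default level as a reading of `level: null`, and a decimal-megabyte reading of the 16MB limit. The names `quantize_int8`, `pack_tensor`, `unpack_tensor`, and `BUDGET_BYTES` are illustrative and do not come from the PR.

```python
import struct
import zlib

# Assumption: the 16MB constraint is read here as 16,000,000 bytes.
BUDGET_BYTES = 16_000_000


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, quantized


def pack_tensor(weights):
    """Quantize, serialize (hypothetical header + int8 payload), then zlib-compress."""
    scale, quantized = quantize_int8(weights)
    header = struct.pack("<fI", scale, len(quantized))
    payload = struct.pack(f"<{len(quantized)}b", *quantized)
    return zlib.compress(header + payload)  # default compression level


def unpack_tensor(blob):
    """Invert pack_tensor: decompress, read the header, dequantize."""
    raw = zlib.decompress(blob)
    scale, count = struct.unpack_from("<fI", raw, 0)
    quantized = struct.unpack_from(f"<{count}b", raw, 8)
    return [scale * q for q in quantized]


def fits_budget(blobs, budget=BUDGET_BYTES):
    """Check the summed packed size against the artifact budget."""
    return sum(len(b) for b in blobs) <= budget
```

Packing each tensor with `pack_tensor` and passing the resulting blobs to `fits_budget` gives a quick size check; the round trip through `unpack_tensor` recovers each weight to within one quantization step.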
Sequence Length
sequence_length
train_length: null
eval_length: null
Novel Contributions
- Non-record public artifact documenting an early proof of direction
- Custom-token modeling path with 2050 custom token IDs
- Compact dense model under the 16MB artifact constraint
- Uses pre-tokenized token IDs; the proprietary tokenizer and loss helpers are excluded
- Int8 plus zlib packing path with an estimated artifact size of ~13.8 MB
- Two-day sprint result on 1x H100 SXM after local iteration on RX 7900 XTX
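The headline metric, val_bpb, is bits per byte: cross-entropy summed over the validation tokens, converted from nats to bits, and divided by the number of raw bytes those tokens cover. Since the PR excludes its loss helpers, the following is only a sketch of that standard conversion, not the author's code; the function name `bits_per_byte` and its arguments are hypothetical.

```python
import math


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert summed token-level cross-entropy (in nats) to bits per byte of raw text."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


# Illustrative numbers only: a mean loss of 2.0 nats per token over 1000 tokens
# that together cover 3000 bytes of text.
example_bpb = bits_per_byte(2.0 * 1000, 3000)
```

Normalizing by bytes rather than tokens is what makes the figure comparable across tokenizers with different vocabularies, which matters for a custom 2050-ID token scheme.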