PR #2113

open

Non-record: CustomToken2050 Compact Core 416d, 1.3967 BPB two-day proof of direction

by Carter2565
val_bpb: 1.3967
Architecture: Transformer
Optimizer: not specified
Artifact Size: ~13.8 MB
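val_bpb is the validation loss expressed in bits per byte of text. Below is a minimal sketch of the usual conversion. It assumes the evaluator sums per-token cross-entropy in nats and divides by the UTF-8 byte count of the decoded validation text. The helper name and its inputs are hypothetical and are not taken from this PR. Normalizing by bytes rather than tokens is what keeps scores comparable across tokenizers, including a custom 2050-ID vocabulary like this one.

```python
import math

def bits_per_byte(token_nll_nats: list[float], num_bytes: int) -> float:
    """Convert summed per-token cross-entropy (nats) into bits per byte.

    Hypothetical helper: assumes token_nll_nats holds -log p(token) for every
    validation token and num_bytes is the UTF-8 length of the same text.
    """
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / num_bytes

# Example: 1000 tokens at an average loss of 2.0 nats, covering 4000 bytes.
print(round(bits_per_byte([2.0] * 1000, 4000), 4))  # -> 0.7213
```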

Training Techniques

Architecture: MLP3
  Compact dense model variant with an MLP3 configuration and a custom-token modeling path.
  parameters: {"layers":4,"dimensions":416,"kv":1,"vocab_size":2050}
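The parameters line can be read as a small config object. Below is a minimal sketch that rests on two assumptions. First, "MLP3" is taken to mean a feed-forward hidden width of 3x the model dimension. Second, "kv": 1 is taken to be the number of key/value heads, as in multi-query-style attention. The PR confirms neither reading, and the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CompactCoreConfig:
    # Values copied from the PR's parameters line.
    layers: int = 4
    dim: int = 416
    kv_heads: int = 1     # assumption: "kv" = number of key/value heads
    vocab_size: int = 2050
    mlp_mult: int = 3     # assumption: "MLP3" = 3x hidden expansion

    @property
    def mlp_hidden(self) -> int:
        return self.mlp_mult * self.dim

    def mlp_weights_per_layer(self) -> int:
        # Up-projection (dim x hidden) plus down-projection (hidden x dim), no biases.
        return 2 * self.dim * self.mlp_hidden

    def embedding_weights(self) -> int:
        return self.vocab_size * self.dim

cfg = CompactCoreConfig()
print(cfg.mlp_hidden, cfg.mlp_weights_per_layer(), cfg.embedding_weights())
# -> 1248 1038336 852800
```

Attention weights are left out here because the PR does not list a query-head count, so the attention projection sizes cannot be derived from the given parameters.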
Quantization: int8
  bits: 8
  scope: packed model
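The PR lists int8 (8 bits) over the packed model but does not state the quantization scheme. Below is a minimal sketch of one common choice: symmetric per-row absmax quantization with one float scale per row. This technique is an assumption chosen for illustration, not the author's packing code, and the function names are hypothetical.

```python
def quantize_row_int8(row: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization of one weight row to int8 values in [-127, 127]."""
    absmax = max((abs(x) for x in row), default=0.0)
    scale = absmax / 127.0 if absmax > 0 else 1.0  # all-zero rows get a dummy scale
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale

def dequantize_row_int8(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]

row = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_row_int8(row)
print(q, scale)
```

With this scheme, the reconstruction error per weight is at most half a quantization step (scale / 2), and the largest-magnitude entry in each row maps exactly to ±127.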
Compression: zlib
  level: null
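The PR leaves the zlib compression level unspecified (level: null). Below is a minimal sketch using Python's standard `zlib` and `array` modules. It assumes the int8 values are serialized as raw signed bytes and compressed at level 9. Both the serialization layout and the level are assumptions, and the helper names are hypothetical.

```python
import array
import zlib

def pack_int8(values: list[int], level: int = 9) -> bytes:
    """Serialize int8 values as signed bytes and zlib-compress them."""
    raw = array.array("b", values).tobytes()  # "b" = signed char, one byte per value
    return zlib.compress(raw, level)

def unpack_int8(blob: bytes) -> list[int]:
    """Decompress and restore the int8 values."""
    return array.array("b", zlib.decompress(blob)).tolist()

weights = [0, 1, -1, 127, -127] * 1000
blob = pack_int8(weights)
assert unpack_int8(blob) == weights
print(len(weights), "->", len(blob), "bytes")
```

This repetitive example compresses far better than real quantized weights would. The actual savings depend on the weight distribution, which is why the PR reports its artifact size as an estimate.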
Sequence Length
  train_length: null
  eval_length: null

Novel Contributions

  • Non-record public artifact documenting an early proof of direction
  • Custom-token modeling path with 2050 custom token IDs
  • Compact dense model that fits under the 16 MB artifact limit
  • Ships pre-tokenized token IDs; the proprietary tokenizer and loss helpers are excluded
  • Int8 quantization plus zlib packing, with an estimated artifact size of ~13.8 MB
  • Two-day sprint result on a single H100 SXM, after local iteration on an AMD Radeon RX 7900 XTX
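With 2050 token IDs, values no longer fit in one byte (max 255). Because 2^11 = 2048 < 2050, each token needs at least 12 bits, and 16-bit storage is the simplest fit. Below is a minimal sketch for writing and validating such pre-tokenized IDs. It assumes a flat little-endian uint16 layout. The layout and the helper names are hypothetical, since the PR excludes its tokenizer and data tooling.

```python
import array
import sys

VOCAB_SIZE = 2050  # from the PR's parameters line

def encode_ids(ids: list[int]) -> bytes:
    """Range-check token IDs and pack them as little-endian uint16."""
    bad = [t for t in ids if not 0 <= t < VOCAB_SIZE]
    if bad:
        raise ValueError(f"token IDs out of range: {bad[:5]}")
    arr = array.array("H", ids)  # "H" = unsigned short
    if arr.itemsize != 2:
        raise RuntimeError("expected 2-byte unsigned short on this platform")
    if sys.byteorder == "big":
        arr.byteswap()  # always store little-endian on disk
    return arr.tobytes()

def decode_ids(blob: bytes) -> list[int]:
    """Inverse of encode_ids."""
    arr = array.array("H")
    arr.frombytes(blob)
    if sys.byteorder == "big":
        arr.byteswap()
    return arr.tolist()

ids = [0, 2049, 17]
assert decode_ids(encode_ids(ids)) == ids
```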