PR #571
open
Non-record: trigram phrase-memory ablation on 1×H100: negative result (1.2791 BPB best)
by maxwellcipher
val_bpb
1.2791
Architecture
Transformer
Optimizer
—
Artifact Size
21.6MB
Training Techniques
Quantization
int8 QAT
bits: 8
scope: null
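The int8 QAT entry above can be illustrated with a minimal fake-quantization round trip, the core step quantization-aware training inserts into the forward pass. This is a generic sketch (symmetric per-tensor scaling; the function name and details are illustrative assumptions, not from this PR):

```python
def int8_fake_quant(values, bits=8):
    """Quantize floats to signed int8 and dequantize back, so training
    sees the same rounding error that int8 inference will introduce."""
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    m = max(abs(v) for v in values)
    scale = (m / qmax) if m > 0 else 1.0  # per-tensor scale (assumption)
    # round-to-nearest into [-128, 127], then map back to floats
    quantized = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]
```

In real QAT the rounding is paired with a straight-through estimator so gradients flow through the non-differentiable `round`; that part is omitted here.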
Architecture
BigramHash
Static bigram lookup table with 8192 buckets and 128 embedding dimension
parameters: {"buckets":8192,"embed_dim":128}
TrigramHash
Static trigram lookup table evaluated as an ablation across varying bucket sizes and embedding dimensions
parameters: {"variants":[{"buckets":2048,"embed_dim":64},{"buckets":4096,"embed_dim":96}]}
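The BigramHash/TrigramHash components above are hashed static n-gram embedding tables; a minimal sketch of the idea follows. The bucket counts and embedding dims come from the PR's parameters; the class name, hash function, and initialization are illustrative assumptions:

```python
import random

class NGramHashTable:
    """Static lookup: each position's trailing n-gram is hashed into one
    of `buckets` rows of an embedding table (a sketch, not the PR's code)."""

    def __init__(self, n, buckets, embed_dim, seed=0):
        rng = random.Random(seed)
        self.n, self.buckets = n, buckets
        # one fixed embedding row per hash bucket
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
                      for _ in range(buckets)]

    def lookup(self, token_ids):
        """Return one embedding per position, keyed by its trailing n-gram."""
        out = []
        for i in range(len(token_ids)):
            ngram = tuple(token_ids[max(0, i - self.n + 1): i + 1])
            bucket = hash(ngram) % self.buckets  # collisions are accepted
            out.append(self.table[bucket])
        return out
```

Usage matching the PR's variants might look like `NGramHashTable(3, 4096, 96)` for the larger trigram variant; the output would typically be added to or concatenated with the backbone's token embeddings.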
Weight Averaging
EMA
parameters: null
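The EMA weight-averaging entry lists no parameters, but the update rule itself is standard; a one-line sketch (the decay value is an assumption, not from the PR):

```python
def ema_update(avg, new, decay=0.999):
    """One EMA step over flattened weights: avg <- decay*avg + (1-decay)*new.
    The averaged copy is typically used for eval, not for training updates."""
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg, new)]
```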
Evaluation
sliding window eval
parameters: null
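Sliding-window eval, as listed above with no parameters, generally means scoring each byte once with the longest available left context while sliding a fixed window by a stride. A sketch under assumed window/stride values and an assumed per-byte scoring interface (none of these specifics are from the PR):

```python
def sliding_window_bpb(neg_log2_prob, data, window=512, stride=256):
    """Mean bits-per-byte. `neg_log2_prob(context, target)` returns the
    model's -log2 p(target | context) for one byte (assumed interface)."""
    total_bits, counted = 0.0, 0
    for start in range(0, len(data), stride):
        chunk = data[start:start + window]
        # After the first window, only the last `stride` positions are
        # scored, so each byte is counted exactly once with full context.
        score_from = 0 if start == 0 else window - stride
        for i in range(score_from, len(chunk)):
            total_bits += neg_log2_prob(chunk[:i], chunk[i])
            counted += 1
    return total_bits / counted
```

With a uniform byte model (`-log2 p = 8` everywhere) this returns exactly 8.0 BPB, which is a convenient sanity check for the windowing logic.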
Novel Contributions
- Controlled ablation study showing trigram phrase-memory lookup tables do not improve performance at 16MB scale on 1×H100.
- Demonstrated that byte budget is better spent on backbone capacity than static trigram lookup tables at this scale.
- Published controlled comparison numbers confirming prior informal reports of trigram-ablation negative results at small scale.
- Suggested that negative result might reverse with more training steps or on larger hardware (8×H100).