PR #571
open
Non-record: trigram phrase-memory ablation on 1×H100: negative result (1.2791 BPB best)
by maxwellcipher
val_bpb
1.2791
Architecture
Transformer
Optimizer
—
Artifact Size
21.6MB
Training Techniques
Quantization
int8 QAT
bits: 8
scope: null
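The int8 QAT entry above can be illustrated with a minimal fake-quantization round trip, the core step quantization-aware training inserts into the forward pass. This is a generic sketch (symmetric per-tensor scaling; the function name and details are illustrative assumptions, not from this PR):

```python
def int8_fake_quant(values, bits=8):
    """Quantize floats to signed int8 and dequantize back, so training
    sees the same rounding error that int8 inference will introduce."""
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    m = max(abs(v) for v in values)
    scale = (m / qmax) if m > 0 else 1.0  # per-tensor scale (assumption)
    # round-to-nearest into [-128, 127], then map back to floats
    quantized = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]
```

In real QAT the rounding is paired with a straight-through estimator so gradients flow through the non-differentiable `round`; that part is omitted here.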
Architecture
BigramHash
Static bigram lookup table with 8192 buckets and 128 embedding dimension
parameters: {"buckets":8192,"embed_dim":128}
TrigramHash
Static trigram lookup table evaluated as an ablation across varying bucket sizes and embedding dimensions
parameters: {"variants":[{"buckets":2048,"embed_dim":64},{"buckets":4096,"embed_dim":96}]}
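The BigramHash/TrigramHash components above are hashed static n-gram embedding tables; a minimal sketch of the idea follows. The bucket counts and embedding dims come from the PR's parameters; the class name, hash function, and initialization are illustrative assumptions:

```python
import random

class NGramHashTable:
    """Static lookup: each position's trailing n-gram is hashed into one
    of `buckets` rows of an embedding table (a sketch, not the PR's code)."""

    def __init__(self, n, buckets, embed_dim, seed=0):
        rng = random.Random(seed)
        self.n, self.buckets = n, buckets
        # one fixed embedding row per hash bucket
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
                      for _ in range(buckets)]

    def lookup(self, token_ids):
        """Return one embedding per position, keyed by its trailing n-gram."""
        out = []
        for i in range(len(token_ids)):
            ngram = tuple(token_ids[max(0, i - self.n + 1): i + 1])
            bucket = hash(ngram) % self.buckets  # collisions are accepted
            out.append(self.table[bucket])
        return out
```

Usage matching the PR's variants might look like `NGramHashTable(3, 4096, 96)` for the larger trigram variant; the output would typically be added to or concatenated with the backbone's token embeddings.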
Weight Averaging
EMA
parameters: null
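The EMA weight-averaging entry lists no parameters, but the update rule itself is standard; a one-line sketch (the decay value is an assumption, not from the PR):

```python
def ema_update(avg, new, decay=0.999):
    """One EMA step over flattened weights: avg <- decay*avg + (1-decay)*new.
    The averaged copy is typically used for eval, not for training updates."""
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg, new)]
```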
Evaluation
sliding window eval
parameters: null
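Sliding-window eval, as listed above with no parameters, generally means scoring each byte once with the longest available left context while sliding a fixed window by a stride. A sketch under assumed window/stride values and an assumed per-byte scoring interface (none of these specifics are from the PR):

```python
def sliding_window_bpb(neg_log2_prob, data, window=512, stride=256):
    """Mean bits-per-byte. `neg_log2_prob(context, target)` returns the
    model's -log2 p(target | context) for one byte (assumed interface)."""
    total_bits, counted = 0.0, 0
    for start in range(0, len(data), stride):
        chunk = data[start:start + window]
        # After the first window, only the last `stride` positions are
        # scored, so each byte is counted exactly once with full context.
        score_from = 0 if start == 0 else window - stride
        for i in range(score_from, len(chunk)):
            total_bits += neg_log2_prob(chunk[:i], chunk[i])
            counted += 1
    return total_bits / counted
```

With a uniform byte model (`-log2 p = 8` everywhere) this returns exactly 8.0 BPB, which is a convenient sanity check for the windowing logic.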
Novel Contributions
- Controlled ablation study showing trigram phrase-memory lookup tables do not improve performance at 16MB scale on 1×H100.
- Demonstrated that byte budget is better spent on backbone capacity than static trigram lookup tables at this scale.
- Published controlled comparison numbers confirming prior informal reports of trigram-ablation negative results at small scale.
- Suggested that negative result might reverse with more training steps or on larger hardware (8×H100).