PR #2021

open

Non-record: BigramHash uppercase enrichment (1xH100, val_bpb 1.3132)

by FelixFuhs
val_bpb: 1.3132
Architecture: Transformer
Optimizer: (not listed)
Artifact Size: 15,343,280 bytes

Training Techniques

Architecture
BigramHash
Adds a learned local-memory table indexed by a hash of the previous and current token ids; the looked-up vector is injected into the transformer.
parameters: {"table_size":12288,"inject_layer":1,"mode":"additive"}
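A minimal sketch of how such a table could work, using the `table_size`, `inject_layer`, and `mode` values listed above. The hash function, the model width `D_MODEL`, the padding id for the first position, and the function names are assumptions made for illustration; the PR's actual implementation may differ.

```python
import random

TABLE_SIZE = 12288   # "table_size" from the PR parameters
INJECT_LAYER = 1     # "inject_layer" from the PR parameters (where inject() would be called)
D_MODEL = 8          # hypothetical width, kept tiny for illustration


def bigram_slot(prev_id: int, cur_id: int, table_size: int = TABLE_SIZE) -> int:
    """Map a (previous, current) token-id pair to a table row.

    The multiplier is an arbitrary constant chosen for this sketch,
    not the hash used in the PR.
    """
    return ((prev_id * 1_000_003) ^ cur_id) % table_size


rng = random.Random(0)
# Stand-in for the learned table: one D_MODEL-wide vector per slot.
table = [[rng.gauss(0.0, 0.02) for _ in range(D_MODEL)] for _ in range(TABLE_SIZE)]


def inject(hidden, token_ids):
    """Additive mode: add each position's bigram vector to its hidden state."""
    out = []
    prev = 0  # hypothetical padding id used for the first position
    for h, cur in zip(hidden, token_ids):
        row = table[bigram_slot(prev, cur)]
        out.append([a + b for a, b in zip(h, row)])
        prev = cur
    return out
```

In a real model the table would be a trainable embedding and `inject` would run on the residual stream at layer `INJECT_LAYER`; here it only shows the lookup and the additive merge.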
Other
other
Tokenizer-derived hash-address enrichment using an uppercase flag in the hash key.
parameters: {"hash_enrich":"upper"}
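One way an uppercase flag could enter the hash key is sketched below. The definition of the flag (first alphabetic character of the current token's text is uppercase), the choice of the current rather than the previous token, and the way the flag is mixed into the key are all assumptions; the PR only states that the flag is tokenizer-derived and part of the hash key.

```python
def upper_flag(token_text: str) -> int:
    """Return 1 if the token's first alphabetic character is uppercase, else 0.

    This definition is an assumption made for the sketch.
    """
    for ch in token_text:
        if ch.isalpha():
            return int(ch.isupper())
    return 0


def enriched_slot(prev_id: int, cur_id: int, cur_text: str,
                  table_size: int = 12288) -> int:
    """Hash (prev_id, cur_id, uppercase flag) into a table row.

    The flag occupies the lowest bit of the key, so with an even
    table_size the flagged and unflagged keys land in different rows.
    """
    base = (prev_id * 1_000_003) ^ cur_id  # hypothetical bigram hash
    key = (base << 1) | upper_flag(cur_text)
    return key % table_size
```

With this layout, the same token-id pair can address different rows depending on the capitalization signal, which is the sense in which the flag enriches the hash address.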
Quantization
int8
bits: 8
scope: model
Compression
zlib
level: null
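The quantize-then-compress path can be sketched as follows. The symmetric per-tensor scheme, the serialization layout, and the function names are assumptions for illustration; `level: null` is read here as "use the zlib default level", which is also an assumption.

```python
import struct
import zlib


def quantize_int8(weights):
    """Symmetric per-tensor int8: q = round(w / scale), scale = max|w| / 127."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def pack(q, scale):
    """Serialize the float32 scale plus the int8 payload, then zlib-compress."""
    raw = struct.pack("<f", scale) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(raw)  # default level (assumed meaning of level: null)


def unpack(blob, n):
    """Decompress and dequantize n weights back to floats."""
    raw = zlib.decompress(blob)
    (scale,) = struct.unpack_from("<f", raw, 0)
    q = struct.unpack_from(f"<{n}b", raw, 4)
    return [v * scale for v in q]
```

The length of the blob returned by `pack` is the kind of quantity an artifact-size figure measures, although the reported 15,343,280 bytes may include more than the compressed weights.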

Novel Contributions

  • BigramHash sweep on a 1xH100 setup
  • Uppercase hash enrichment improved BigramHash performance
  • Showed, in this single-seed sweep, that simple tokenizer-derived hash enrichment outperformed more complex variants
  • Documented a non-record single-seed run with full reproduction artifacts