PR #2021 (open)
Non-record: BigramHash uppercase enrichment (1xH100, val_bpb 1.3132)
by FelixFuhs
val_bpb: 1.3132
Architecture: Transformer
Optimizer: —
Artifact Size: 15,343,280 bytes
Training Techniques
Architecture: BigramHash
Adds a learned hashed local-memory table using previous and current token ids, injected into the transformer.
parameters: {"table_size":12288,"inject_layer":1,"mode":"additive"}
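The BigramHash idea above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the hidden size, the hash mixing constant, and the function names are hypothetical; only `table_size=12288`, `inject_layer=1`, and `mode="additive"` come from the PR parameters.

```python
import numpy as np

TABLE_SIZE = 12288   # "table_size" from the PR parameters
D_MODEL = 64         # hypothetical hidden size, for illustration only

rng = np.random.default_rng(0)
# In the real run this table is learned end-to-end; random init is just for the sketch.
table = rng.normal(scale=0.02, size=(TABLE_SIZE, D_MODEL))

def bigram_hash(prev_id: int, cur_id: int, table_size: int = TABLE_SIZE) -> int:
    # Mix the previous and current token ids into one bucket index.
    # The odd multiplier is a hypothetical choice; the PR's exact hash may differ.
    return ((prev_id * 1000003) ^ cur_id) % table_size

def inject_bigram_memory(hidden: np.ndarray, token_ids: list[int]) -> np.ndarray:
    # mode="additive": add the looked-up memory vector to each position's
    # hidden state, as if after the block at inject_layer=1.
    out = hidden.copy()
    for t in range(1, len(token_ids)):
        out[t] += table[bigram_hash(token_ids[t - 1], token_ids[t])]
    return out
```

Because the table is addressed by a hash of the (previous, current) token pair, memory cost stays fixed at `table_size × d_model` regardless of vocabulary size, at the price of hash collisions between bigrams.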
Other
Tokenizer-derived hash-address enrichment using an uppercase flag in the hash key.
parameters: {"hash_enrich":"upper"}
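One plausible reading of `hash_enrich="upper"` is sketched below: fold a one-bit uppercase flag, derived from the tokenizer's id-to-string mapping, into the bigram hash key. The base hash and the bit-packing scheme are assumptions for illustration; only the enrichment flag and the table size come from the PR.

```python
TABLE_SIZE = 12288  # from the PR parameters

def base_hash(prev_id: int, cur_id: int) -> int:
    # Same hypothetical bigram mix as in the BigramHash sketch.
    return (prev_id * 1000003) ^ cur_id

def enriched_bigram_hash(prev_id: int, cur_id: int, cur_text: str) -> int:
    # hash_enrich="upper": add a one-bit flag for whether the current
    # token's text starts with an uppercase letter to the hash key.
    # cur_text would come from the tokenizer's id -> string mapping.
    upper = 1 if cur_text[:1].isupper() else 0
    return ((base_hash(prev_id, cur_id) << 1) | upper) % TABLE_SIZE
```

The effect is that the same token-id bigram lands in different buckets depending on capitalization, letting the table specialize (e.g. sentence-initial vs. mid-sentence occurrences) without enlarging the key.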
Quantization: int8 (bits: 8, scope: model)
Compression: zlib (level: null)
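The artifact packaging implied by the metadata (int8 quantization over the whole model, then zlib compression) can be sketched like this. Symmetric per-tensor scaling and the function names are assumptions; the PR only states int8/scope-model quantization and zlib with an unspecified level, read here as zlib's default.

```python
import zlib
import numpy as np

def quantize_int8(w: np.ndarray):
    # Symmetric per-tensor int8 quantization (bits: 8). scope: model means
    # every weight tensor in the checkpoint gets this treatment.
    scale = float(np.abs(w).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero for all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def pack_artifact(tensors):
    # level: null is read here as zlib's default compression level.
    raw = b"".join(quantize_int8(t)[0].tobytes() for t in tensors)
    return zlib.compress(raw)
```

With the max-absolute scale, quantization error per weight is bounded by half a quantization step, and zlib then exploits the redundancy in the int8 byte stream to shrink the on-disk artifact.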
Novel Contributions
- BigramHash sweep on a 1xH100 setup
- Uppercase hash enrichment improved BigramHash performance
- Demonstrated that simple tokenizer-derived hash enrichment outperformed more complex variants
- Documented a non-record single-seed run with full reproduction artifacts