PR #248

open

Non-record: local RTX 4070 SP1024 8x512 KV4 500-step run

by riatzukiza
val_bpb: 1.6231
Architecture: Transformer
Optimizer:
Artifact Size: 10246842 bytes

Training Techniques

Architecture
tied embeddings
Input and output embeddings are tied.
parameters: null
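A minimal sketch of weight tying as listed above: one shared matrix serves as both the input embedding table and the output projection, halving the embedding parameter budget. The vocabulary size here is an assumption for illustration; the submission does not state it.

```python
import numpy as np

vocab, dim = 1000, 512  # vocab size assumed; model_dim from the config above
W = np.random.randn(vocab, dim) * 0.02  # single shared embedding matrix

def embed(token_ids):
    return W[token_ids]       # input embedding: row lookup

def logits(hidden):
    return hidden @ W.T       # output head reuses the same weights, transposed

h = embed(np.array([1, 2, 3]))
assert logits(h).shape == (3, vocab)
```

With tying, the model stores one `vocab x dim` matrix instead of two, which matters under a 16MB artifact cap.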
KV head count
Uses reduced key/value head count relative to attention heads.
parameters: {"num_heads":8,"num_kv_heads":4}
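A sketch of the reduced-KV-head scheme with the stated `num_heads=8, num_kv_heads=4`: each K/V head is shared by a group of query heads, shrinking the KV cache and projection weights. The head dimension (512 / 8 = 64) is inferred from the config; the exact attention implementation is an assumption.

```python
import numpy as np

num_heads, num_kv_heads, head_dim, seq = 8, 4, 64, 16  # seq chosen for illustration
group = num_heads // num_kv_heads  # 2 query heads share each KV head

k = np.random.randn(num_kv_heads, seq, head_dim)
# Expand KV heads so each group of query heads attends over the same K tensor.
k_expanded = np.repeat(k, group, axis=0)

assert k_expanded.shape == (num_heads, seq, head_dim)
assert np.array_equal(k_expanded[0], k_expanded[1])  # heads 0 and 1 share KV head 0
```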
Transformer size
Compact 8-layer, 512-dimensional Transformer configuration.
parameters: {"layers":8,"model_dim":512}
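A rough parameter-count estimate for the 8-layer, 512-dim configuration, assuming a GPT-style block with a 4x MLP expansion, no biases, and ignoring both the GQA savings and the embedding table (none of which the submission specifies):

```python
def block_params(d):
    attn = 4 * d * d      # Q, K, V, O projections (full KV heads assumed)
    mlp = 2 * 4 * d * d   # up and down projections with 4x expansion
    return attn + mlp

layers, model_dim = 8, 512
total = layers * block_params(model_dim)
# ~25.2M block parameters under these assumptions
```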
Quantization
int8
bits: 8
scope: all
Compression
zlib
level: null
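The "Int8 + zlib roundtrip artifact packaging" listed in the contributions can be sketched as symmetric absmax quantization followed by zlib compression. The compression level is unspecified above (`level: null`), so the level here is an assumption, as is the single-scale-per-tensor scheme.

```python
import numpy as np
import zlib

w = np.random.randn(512, 512).astype(np.float32)  # example weight tensor

# Quantize: one absmax scale per tensor (per-tensor scheme assumed).
scale = float(np.abs(w).max()) / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Compress the int8 payload; level 9 is an assumption.
blob = zlib.compress(q.tobytes(), 9)

# Roundtrip: decompress and dequantize.
deq = np.frombuffer(zlib.decompress(blob), dtype=np.int8)
deq = deq.reshape(w.shape).astype(np.float32) * scale

assert deq.shape == w.shape
assert np.max(np.abs(deq - w)) <= scale  # error bounded by half a quantization step, < scale
```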
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
LR Schedule
warmup
parameters: {"warmup_steps":4}
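A sketch of the warmup schedule with the stated `warmup_steps=4`: the learning rate ramps linearly to its base value. The base LR and the post-warmup behavior are not given in the submission, so both are assumptions here (constant after warmup).

```python
def lr_at(step, base_lr=3e-4, warmup_steps=4):
    # Linear warmup to base_lr; base_lr is assumed, and the submission
    # does not specify a post-warmup schedule, so this holds it constant.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```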
Other
other
Local non-record submission run on a consumer GPU under the 16MB artifact cap, using a configuration from a locally validated search path and evaluated on the full published validation split.
parameters: {"hardware":"1x RTX 4070 Laptop GPU","iterations":500}

Novel Contributions

  • Non-record local consumer-GPU submission under the 16MB artifact cap
  • 8-layer, 512-dim Transformer with 8 attention heads and 4 KV heads
  • Tied input/output embeddings
  • Full published validation split evaluation
  • Local search-loop-derived configuration re-evaluated on the full validation set
  • Int8 + zlib roundtrip artifact packaging