PR #247

open

Non-record: local RTX 4070 SP1024 8x512 KV4 seq768 500-step run

by riatzukiza
val_bpb: 1.6114
Architecture: Transformer
Optimizer: not listed
Artifact Size: 10036271 bytes
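val_bpb is validation bits per byte. The standard definition converts the summed per-token cross-entropy from nats to bits and divides by the number of UTF-8 bytes in the evaluated text. Normalizing by bytes keeps runs with different tokenizers comparable, since this run uses a small SP1024 vocabulary. The sketch below assumes that definition. The submission's evaluation script, including how it counts bytes per token, is not shown here. The example numbers are hypothetical.

```python
import math


def bits_per_byte(mean_token_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean per-token cross-entropy (in nats) into bits per UTF-8 byte."""
    total_bits = mean_token_loss_nats * num_tokens / math.log(2)  # nats -> bits
    return total_bits / num_bytes


# Hypothetical inputs for illustration only (not measurements from this run):
print(round(bits_per_byte(math.log(2), 1000, 1000), 4))  # 1.0 (1 bit per token, 1 byte per token)
print(round(bits_per_byte(math.log(2), 500, 1000), 4))   # 0.5 (same loss, 2 bytes per token)
```

The second call shows why the byte count matters. At the same per-token loss, a tokenizer that packs more bytes into each token gets a lower bpb.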

Training Techniques

Architecture
tied embeddings
Input and output embeddings are tied.
parameters: null
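Tying means the output projection reuses the input embedding table, so each logit is the dot product of the final hidden state with a token's embedding row. The sketch below assumes "SP1024" in the title denotes a 1024-token SentencePiece vocabulary and uses model_dim 512 from the KV head parameters. It assumes no output bias. It counts the parameters that tying removes and shows the shared-table logit computation.

```python
from typing import List

VOCAB_SIZE = 1024  # assumption: "SP1024" = 1024-token SentencePiece vocabulary
MODEL_DIM = 512    # model_dim listed in the KV head count parameters


def embedding_params(vocab_size: int, dim: int, tied: bool) -> int:
    """Parameters in the input embedding plus output projection (assumes no bias)."""
    table = vocab_size * dim
    return table if tied else 2 * table


def tied_logits(hidden: List[float], embedding: List[List[float]]) -> List[float]:
    """Output logits reuse the input embedding table: logit[v] = <hidden, E[v]>."""
    return [sum(h * e for h, e in zip(hidden, row)) for row in embedding]


untied_params = embedding_params(VOCAB_SIZE, MODEL_DIM, tied=False)
tied_params = embedding_params(VOCAB_SIZE, MODEL_DIM, tied=True)
print(untied_params, tied_params, untied_params - tied_params)  # 1048576 524288 524288
```

Under these assumptions, tying saves 524,288 parameters. That is roughly 0.5 MB before compression if stored as int8.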
KV head count
Uses fewer key/value heads (4) than query heads (8), i.e. grouped-query attention: each key/value head is shared by two query heads.
parameters: {"layers":8,"model_dim":512,"num_heads":8,"num_kv_heads":4}
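The sketch below derives the grouped-query attention shape from the listed parameters: layers 8, model_dim 512, num_heads 8, num_kv_heads 4. It also counts the key/value projection weights saved relative to full multi-head attention. It assumes contiguous grouping, where query head h uses KV head h // group_size, and no projection biases. Some implementations group heads differently, so treat the mapping as illustrative.

```python
NUM_LAYERS = 8
MODEL_DIM = 512
NUM_HEADS = 8
NUM_KV_HEADS = 4

HEAD_DIM = MODEL_DIM // NUM_HEADS        # 64
GROUP_SIZE = NUM_HEADS // NUM_KV_HEADS   # 2 query heads share each KV head


def kv_head_for(query_head: int) -> int:
    """KV head used by a query head (assumes contiguous grouping)."""
    return query_head // GROUP_SIZE


def kv_proj_params(num_kv_heads: int) -> int:
    """K and V projection weights for one layer (assumes no biases)."""
    kv_dim = num_kv_heads * HEAD_DIM
    return 2 * MODEL_DIM * kv_dim


print([kv_head_for(h) for h in range(NUM_HEADS)])  # [0, 0, 1, 1, 2, 2, 3, 3]
full = NUM_LAYERS * kv_proj_params(NUM_HEADS)
gqa = NUM_LAYERS * kv_proj_params(NUM_KV_HEADS)
print(full, gqa, full - gqa)  # 4194304 2097152 2097152
```

Halving the KV heads halves the K/V projection weights. Across 8 layers that is about 2.1M fewer parameters, which leaves more room under the artifact cap.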
Sequence Length
sequence_length
train_length: 768
eval_length: null
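With train_length 768, each training example is a 768-token input window whose targets are the same tokens shifted by one position. The sketch below assumes non-overlapping windows cut from a flat token stream. The submission's actual data loader, batch size and evaluation length (listed as null) are not stated, and the helper name is hypothetical.

```python
from typing import Iterator, List, Tuple

SEQ_LEN = 768  # train_length from the submission


def training_windows(tokens: List[int], seq_len: int = SEQ_LEN) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield non-overlapping (inputs, targets) pairs; targets are inputs shifted by one."""
    # Each window needs seq_len + 1 tokens so the last input still has a target.
    for start in range(0, len(tokens) - seq_len, seq_len):
        chunk = tokens[start : start + seq_len + 1]
        yield chunk[:-1], chunk[1:]


stream = list(range(2 * SEQ_LEN + 1))  # hypothetical token ids
windows = list(training_windows(stream))
print(len(windows), len(windows[0][0]))  # 2 768
```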
Compression
zlib
level: null
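The compression level is recorded as null, so the level the submission used is unknown. The sketch below assumes zlib's default level (`zlib.Z_DEFAULT_COMPRESSION`, i.e. -1, which zlib currently maps to level 6). It checks that compression round-trips losslessly. The payload is synthetic and the helper name is hypothetical.

```python
import zlib


def compress_artifact(payload: bytes, level: int = zlib.Z_DEFAULT_COMPRESSION) -> bytes:
    """zlib-compress a serialized artifact; the default level is an assumption."""
    return zlib.compress(payload, level)


payload = bytes(range(256)) * 64  # synthetic, highly repetitive stand-in for weights
blob = compress_artifact(payload)
assert zlib.decompress(blob) == payload  # lossless roundtrip
print(len(payload), len(blob))
```

Passing `level=9` trades extra compression time for possibly smaller output. Real quantized weights compress far less than this synthetic payload.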
Other
other
After training, the model is quantized to int8, serialized and zlib-compressed; the artifact is then decompressed and dequantized, and evaluation is run on this roundtrip model.
parameters: {"serialized_model_bytes":9988629,"total_submission_bytes":10036271}
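The sketch below shows an int8 plus zlib roundtrip on a single weight row. It uses symmetric per-row quantization with scale = max|x| / 127. The summary does not say which scheme the submission uses: per-row or per-tensor scales, which tensors stay in higher precision, and how scales are stored are all unknown. This is one common choice, not the author's method. The function names are hypothetical.

```python
import zlib
from array import array
from typing import List, Tuple


def quantize_row_int8(row: List[float]) -> Tuple[array, float]:
    """Symmetric per-row int8 quantization: q = round(x / scale), scale = max|x| / 127."""
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(x / scale))) for x in row))
    return q, scale


def dequantize_row_int8(q: array, scale: float) -> List[float]:
    return [v * scale for v in q]


def roundtrip(row: List[float]) -> List[float]:
    """Quantize, zlib-compress, decompress and dequantize one weight row."""
    q, scale = quantize_row_int8(row)
    blob = zlib.compress(q.tobytes())
    restored = array("b")
    restored.frombytes(zlib.decompress(blob))
    return dequantize_row_int8(restored, scale)


row = [0.5, -1.0, 0.25, 0.0, 0.999]  # hypothetical weights
restored = roundtrip(row)
print(max(abs(a - b) for a, b in zip(row, restored)))  # bounded by scale / 2
```

Evaluating the roundtrip model, rather than the full-precision one, means the reported val_bpb already includes any quantization error.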

Novel Contributions

  • Non-record local consumer-GPU submission under the 16MB artifact cap
  • Throughput-oriented search path for an 8-layer 512-dim configuration
  • Evaluation on the full published validation split (the fineweb_val_* files)
  • Compact local RTX 4070 Laptop GPU run with tied embeddings and reduced KV heads
  • Public non-record anchor (reference point) for a family of candidate configurations selected through repeated local search and validation
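The size figures listed under "Other" can be checked against the 16MB cap with simple arithmetic. The summary does not say whether "16MB" means decimal megabytes or MiB, so the sketch computes both. The bytes outside the serialized model are not itemized in the summary.

```python
CAP_DECIMAL = 16_000_000        # assumption: "16MB" read as 16 * 10**6 bytes
CAP_BINARY = 16 * 1024 * 1024   # alternative reading: 16 MiB
SERIALIZED_MODEL_BYTES = 9_988_629
TOTAL_SUBMISSION_BYTES = 10_036_271

# Submission bytes that are not the serialized model (not itemized in the summary)
non_model_bytes = TOTAL_SUBMISSION_BYTES - SERIALIZED_MODEL_BYTES
headroom_decimal = CAP_DECIMAL - TOTAL_SUBMISSION_BYTES
headroom_binary = CAP_BINARY - TOTAL_SUBMISSION_BYTES

print(non_model_bytes)                    # 47642
print(headroom_decimal, headroom_binary)  # 5963729 6740945
print(f"{TOTAL_SUBMISSION_BYTES / CAP_DECIMAL:.1%} of the decimal cap")  # 62.7% of the decimal cap
```

Under either reading, roughly 6 MB of budget remains unused by this run.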