PR #247
open
Non-record: local RTX 4070 SP1024 8x512 KV4 seq768 500-step run
by riatzukiza
val_bpb
1.6114
Architecture
Transformer
Optimizer
—
Artifact Size
10036271 bytes
Training Techniques
Architecture
tied embeddings
Input and output embeddings are tied.
parameters: null
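As a rough illustration of what tying saves, the sketch below counts embedding parameters with and without tying. The vocabulary size of 1024 is an assumption read from "SP1024" in the title. The model dimension of 512 comes from the listed parameters. The helper name `embedding_params` is hypothetical and not taken from the submission.

```python
# Sketch only: vocab_size=1024 is an assumption inferred from "SP1024";
# model_dim=512 is taken from the listed architecture parameters.
def embedding_params(vocab_size: int, model_dim: int, tied: bool) -> int:
    """Count parameters in the input embedding plus the output projection."""
    table = vocab_size * model_dim
    # Tied: the output projection reuses the input table, so it is counted once.
    return table if tied else 2 * table


tied = embedding_params(1024, 512, tied=True)
untied = embedding_params(1024, 512, tied=False)
print(f"tied={tied} untied={untied} saved={untied - tied}")
```

Under these assumptions tying removes one vocab-by-dim matrix from the artifact, which matters when the whole submission is size-capped.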
KV head count
Uses fewer key/value heads than attention heads.
parameters: {"layers":8,"model_dim":512,"num_heads":8,"num_kv_heads":4}
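The effect of the reduced KV head count on attention weight size can be sketched with a simple parameter count. This assumes bias-free Q, K, V, and output projections and a head dimension of model_dim / num_heads. Neither detail is stated in the listing, and `attention_params` is a hypothetical helper.

```python
# Sketch only: assumes bias-free projections and head_dim = model_dim // num_heads.
def attention_params(model_dim: int, num_heads: int, num_kv_heads: int) -> int:
    """Count Q, K, V, and output projection weights for one attention layer."""
    assert model_dim % num_heads == 0, "model_dim must divide evenly into heads"
    assert num_heads % num_kv_heads == 0, "query heads must group evenly over KV heads"
    head_dim = model_dim // num_heads
    kv_dim = num_kv_heads * head_dim      # K and V project to fewer heads
    q = model_dim * model_dim
    k = model_dim * kv_dim
    v = model_dim * kv_dim
    o = model_dim * model_dim
    return q + k + v + o


layers = 8
reduced = attention_params(512, 8, 4)     # listed configuration
full = attention_params(512, 8, 8)        # same model with one KV head per query head
print(f"per layer: reduced={reduced} full={full}")
print(f"saved over {layers} layers: {(full - reduced) * layers}")
```

With 4 KV heads shared by 8 query heads, each K and V projection is half the width of the full multi-head case, so the count above drops by a quarter per attention layer under the stated assumptions.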
Sequence Length
sequence_length
train_length: 768
eval_length: null
Compression
zlib
level: null
Other
other
Post-training int8 zlib roundtrip evaluation of the serialized model artifact.
parameters: {"serialized_model_bytes":9988629,"total_submission_bytes":10036271}
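A minimal sketch of an int8 zlib roundtrip follows, using only the Python standard library. It assumes symmetric per-tensor quantization with a single scale, which the listing does not confirm. The actual submission may quantize per row or per channel and serialize differently. The function names and the decimal reading of the 16MB cap are likewise assumptions. The byte counts at the end are the ones reported above.

```python
import random
import struct
import zlib


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127] (assumed scheme)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale for all-zero input
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, q


def pack(scale, q):
    """Serialize scale (float64) + count (uint32) + int8 payload, then zlib-compress."""
    raw = struct.pack("<dI", scale, len(q)) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(raw)


def unpack(blob):
    """Decompress and dequantize back to floats."""
    raw = zlib.decompress(blob)
    scale, n = struct.unpack_from("<dI", raw, 0)
    q = struct.unpack_from(f"<{n}b", raw, struct.calcsize("<dI"))
    return [scale * v for v in q]


# Roundtrip on synthetic weights (not the real model).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
scale, q = quantize_int8(weights)
restored = unpack(pack(scale, q))
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs roundtrip error: {max_err:.6f} (half-step bound: {scale / 2:.6f})")

# Byte budget from the reported figures; the cap value is an assumption (16 MB decimal).
serialized_model_bytes = 9_988_629
total_submission_bytes = 10_036_271
assumed_cap_bytes = 16_000_000
non_model_bytes = total_submission_bytes - serialized_model_bytes
headroom_bytes = assumed_cap_bytes - total_submission_bytes
print(f"non-model bytes: {non_model_bytes}, headroom under assumed cap: {headroom_bytes}")
```

Evaluating the dequantized weights rather than the original float weights means the reported val_bpb reflects the artifact as shipped. The 47,642 bytes outside the serialized model are presumably code and other submission files, though the listing does not say so.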
Novel Contributions
- Non-record local consumer-GPU submission under the 16MB artifact cap
- Throughput-oriented search path for an 8-layer 512-dim configuration
- Evaluation on the full published validation split (fineweb_val_*)
- Compact local RTX 4070 Laptop GPU run with tied embeddings and reduced KV heads
- Public non-record anchor for a candidate family selected through repeated local search and validation