PR #248

open

Non-record: local RTX 4070 SP1024 8x512 KV4 500-step run

by riatzukiza
val_bpb: 1.6231
Architecture: Transformer
Optimizer:
Artifact Size: 10246842 bytes

Training Techniques

Architecture
tied embeddings
Input and output embeddings are tied.
parameters: null
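A minimal sketch of weight tying as listed above: one shared matrix serves as both the input embedding table and the output projection, halving the embedding parameter budget. The vocabulary size here is an assumption for illustration; the submission does not state it.

```python
import numpy as np

vocab, dim = 1000, 512  # vocab size assumed; model_dim from the config above
W = np.random.randn(vocab, dim) * 0.02  # single shared embedding matrix

def embed(token_ids):
    return W[token_ids]       # input embedding: row lookup

def logits(hidden):
    return hidden @ W.T       # output head reuses the same weights, transposed

h = embed(np.array([1, 2, 3]))
assert logits(h).shape == (3, vocab)
```

With tying, the model stores one `vocab x dim` matrix instead of two, which matters under a 16MB artifact cap.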
KV head count
Uses reduced key/value head count relative to attention heads.
parameters: {"num_heads":8,"num_kv_heads":4}
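A sketch of the reduced-KV-head scheme with the stated `num_heads=8, num_kv_heads=4`: each K/V head is shared by a group of query heads, shrinking the KV cache and projection weights. The head dimension (512 / 8 = 64) is inferred from the config; the exact attention implementation is an assumption.

```python
import numpy as np

num_heads, num_kv_heads, head_dim, seq = 8, 4, 64, 16  # seq chosen for illustration
group = num_heads // num_kv_heads  # 2 query heads share each KV head

k = np.random.randn(num_kv_heads, seq, head_dim)
# Expand KV heads so each group of query heads attends over the same K tensor.
k_expanded = np.repeat(k, group, axis=0)

assert k_expanded.shape == (num_heads, seq, head_dim)
assert np.array_equal(k_expanded[0], k_expanded[1])  # heads 0 and 1 share KV head 0
```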
Transformer size
Compact 8-layer, 512-dimensional Transformer configuration.
parameters: {"layers":8,"model_dim":512}
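A rough parameter-count estimate for the 8-layer, 512-dim configuration, assuming a GPT-style block with a 4x MLP expansion, no biases, and ignoring both the GQA savings and the embedding table (none of which the submission specifies):

```python
def block_params(d):
    attn = 4 * d * d      # Q, K, V, O projections (full KV heads assumed)
    mlp = 2 * 4 * d * d   # up and down projections with 4x expansion
    return attn + mlp

layers, model_dim = 8, 512
total = layers * block_params(model_dim)
# ~25.2M block parameters under these assumptions
```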
Quantization
int8
bits: 8
scope: all
Compression
zlib
level: null
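The "Int8 + zlib roundtrip artifact packaging" listed in the contributions can be sketched as symmetric absmax quantization followed by zlib compression. The compression level is unspecified above (`level: null`), so the level here is an assumption, as is the single-scale-per-tensor scheme.

```python
import numpy as np
import zlib

w = np.random.randn(512, 512).astype(np.float32)  # example weight tensor

# Quantize: one absmax scale per tensor (per-tensor scheme assumed).
scale = float(np.abs(w).max()) / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Compress the int8 payload; level 9 is an assumption.
blob = zlib.compress(q.tobytes(), 9)

# Roundtrip: decompress and dequantize.
deq = np.frombuffer(zlib.decompress(blob), dtype=np.int8)
deq = deq.reshape(w.shape).astype(np.float32) * scale

assert deq.shape == w.shape
assert np.max(np.abs(deq - w)) <= scale  # error bounded by half a quantization step, < scale
```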
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
LR Schedule
warmup
parameters: {"warmup_steps":4}
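A sketch of the warmup schedule with the stated `warmup_steps=4`: the learning rate ramps linearly to its base value. The base LR and the post-warmup behavior are not given in the submission, so both are assumptions here (constant after warmup).

```python
def lr_at(step, base_lr=3e-4, warmup_steps=4):
    # Linear warmup to base_lr; base_lr is assumed, and the submission
    # does not specify a post-warmup schedule, so this holds it constant.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```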
Other
other
Local non-record submission run on a consumer GPU under the 16MB artifact cap, using a configuration from a locally validated search path and evaluated on the full published validation split.
parameters: {"hardware":"1x RTX 4070 Laptop GPU","iterations":500}

Novel Contributions

  • Non-record local consumer-GPU submission under the 16MB artifact cap
  • 8-layer, 512-dim Transformer with 8 attention heads and 4 KV heads
  • Tied input/output embeddings
  • Full published validation split evaluation
  • Local search-loop-derived configuration re-evaluated on the full validation set
  • Int8 + zlib roundtrip artifact packaging