PR #252

open

Add PR114 RunPod H100 SXM non-record submission

by greqone
val_bpb
1.1554
Architecture
Transformer
Optimizer
Muon
Artifact Size
15,963,195 bytes

Training Techniques

Architecture
tied embeddings
Uses tied embeddings in a 9-layer, width-512 GPT-style model with GQA and SP-1024.
parameters: {"layers":9,"width":512,"sp":1024}
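Weight tying can be sketched as follows: a single (vocab, width) matrix serves as both the input embedding table and the output logit projection. This is a minimal NumPy illustration of the general technique, not the submission's code; the vocab size is shrunk for the sketch and only the width-512 figure comes from the card.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, width = 32, 512  # width-512 per the card; vocab is illustrative only

# one shared matrix for both ends of the model
E = rng.standard_normal((vocab, width)).astype(np.float32)

def embed(token_ids):
    # input side: look up rows of the shared matrix
    return E[token_ids]

def logits(hidden):
    # output side: project hidden states against the same matrix, transposed
    return hidden @ E.T

h = embed(np.array([1, 2, 3]))  # (3, width)
out = logits(h)                 # (3, vocab)
```

Tying halves the parameter count spent on the vocabulary, which matters for an artifact-size-capped track.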
GQA
Grouped-query attention used in the model architecture.
parameters: null
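Grouped-query attention in general terms: n_q query heads share n_kv (< n_q) key/value heads, with each KV head broadcast across a group of query heads. A minimal sketch, with illustrative head counts and dimensions (the card does not specify the model's GQA configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_q, n_kv, T, d = 8, 2, 4, 16   # head counts and dims are assumptions
group = n_q // n_kv             # query heads per shared KV head

q = rng.standard_normal((n_q, T, d))
k = rng.standard_normal((n_kv, T, d))
v = rng.standard_normal((n_kv, T, d))

# expand KV heads so each query head indexes its group's shared KV head
k_full = np.repeat(k, group, axis=0)    # (n_q, T, d)
v_full = np.repeat(v, group, axis=0)

scores = q @ k_full.transpose(0, 2, 1) / np.sqrt(d)   # (n_q, T, T)
w = np.exp(scores - scores.max(-1, keepdims=True))
w = w / w.sum(-1, keepdims=True)                      # softmax over keys
out = w @ v_full                                      # (n_q, T, d)
```

The KV cache shrinks by a factor of n_q / n_kv relative to full multi-head attention, at no cost to the number of query heads.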
Quantization
mixed selective precision
bits: null
scope: model export with fp16 tied embedding and late-K passthrough
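An export-time sketch of the selective-precision idea: keep most tensors in fp32 but cast the tied embedding to fp16 to shrink the artifact. The state-dict keys below are hypothetical, and the late-K passthrough is not reproduced here.

```python
import numpy as np

# hypothetical state dict; only the fp16-embedding idea comes from the card
state = {
    "tied_embedding": np.ones((32, 512), dtype=np.float32),
    "block0.attn.w":  np.ones((512, 512), dtype=np.float32),
}

def export_mixed(state, fp16_keys=("tied_embedding",)):
    """Cast only the listed tensors to fp16; leave the rest untouched."""
    return {k: (v.astype(np.float16) if k in fp16_keys else v)
            for k, v in state.items()}

exported = export_mixed(state)
```

Halving the bytes of the largest single tensor (the tied embedding) is a cheap way to stay under a size cap without quantizing the whole model.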
Optimizer
Muon
weight_decay: null
momentum: 0.99
other_params: {"muon_momentum_warmup_start":0.92,"muon_momentum_warmup_steps":1500,"matrix_lr":0.02,"scalar_lr":0.02,"tied_embed_lr":0.03,"grad_clip_norm":0.3}
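The momentum warmup implied by these hyperparameters can be sketched as a simple schedule from 0.92 to 0.99 over 1,500 steps. The values come from the card's other_params; the linear shape is an assumption, not confirmed by the submission.

```python
def muon_momentum(step, start=0.92, target=0.99, warmup_steps=1500):
    """Momentum warmup: ramp from `start` to `target` over `warmup_steps`,
    then hold at `target`. Linear ramp is an assumed shape."""
    frac = min(step / warmup_steps, 1.0)
    return start + (target - start) * frac
```

At step 0 this returns the warmup start (0.92) and from step 1,500 onward the card's stated momentum (0.99).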
LR Schedule
warmdown
parameters: {"warmdown_iters":3000}
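A warmdown schedule typically holds the base LR constant and then decays it linearly to zero over the final steps. This sketch assumes that shape; warmdown_iters=3000 is from the card, base_lr reuses the card's matrix_lr, and total_iters is a hypothetical run length the card does not state.

```python
def warmdown_lr(step, total_iters, base_lr=0.02, warmdown_iters=3000):
    """Constant LR, then a linear 'warmdown' to zero over the last
    `warmdown_iters` steps. Shape is an assumption, not the PR's code."""
    decay_start = total_iters - warmdown_iters
    if step < decay_start:
        return base_lr
    return base_lr * (total_iters - step) / warmdown_iters
```

For example, with a hypothetical 10,000-iteration run the LR stays at 0.02 until step 7,000 and reaches 0 at step 10,000.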
Sequence Length
sequence_length
train_length: 2048
eval_length: 2048
Evaluation
sliding window eval
parameters: {"stride":256,"context_length":2048}
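Sliding-window evaluation with these parameters scores each token exactly once while giving every scored token up to context_length tokens of left context, advancing stride tokens per window. A sketch of the window bookkeeping (the scoring model itself is omitted):

```python
def sliding_windows(n_tokens, context_length=2048, stride=256):
    """Yield (start, end, score_from) triples: the window sees tokens
    [start, end); only tokens [score_from, end) are scored, so each token
    is scored exactly once with up to context_length of left context."""
    windows = []
    scored = 0  # tokens scored so far
    while scored < n_tokens:
        end = min(scored + stride, n_tokens)
        start = max(0, end - context_length)
        windows.append((start, end, scored))
        scored = end
    return windows
```

With stride 256 and context 2048, consecutive windows overlap by 1,792 tokens of pure context, trading extra forward passes for a less pessimistic bpb than disjoint-chunk evaluation.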
Compression
zlib
level: null
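Artifact compression with zlib is a stdlib one-liner in Python. The card leaves the level unspecified; level 9 below is an assumption (maximum compression), and the payload is a placeholder.

```python
import zlib

payload = b"model weights placeholder " * 1000   # stand-in for the artifact bytes

compressed = zlib.compress(payload, 9)   # level 9 is an assumed choice
restored = zlib.decompress(compressed)

assert restored == payload               # lossless round trip
```

For a size-capped track, the reported artifact size would be len(compressed), so the level choice directly affects whether the submission fits under the cap.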

Novel Contributions

  • Non-record submission packaged under track_non_record_16mb after the leaderboard moved past the result.
  • Same-provider RunPod verification on 8x H100 SXM, with three reruns as robustness evidence.
  • Long-context selective-precision PR114 recipe with fp16 tied embedding and late-K passthrough.
  • Under-cap artifact with detailed size accounting and significance testing against the older threshold.