PR #252

open

Add PR114 RunPod H100 SXM non-record submission

by greqone
val_bpb
1.1554
Architecture
Transformer
Optimizer
Muon
Artifact Size
15,963,195 bytes

Training Techniques

Architecture
tied embeddings
Uses tied embeddings in a 9-layer, width-512 GPT-style model with GQA and SP-1024.
parameters: {"layers":9,"width":512,"sp":1024}
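Weight tying can be sketched as follows: a single (vocab, width) matrix serves as both the input embedding table and the output logit projection. This is a minimal NumPy illustration of the general technique, not the submission's code; the vocab size is shrunk for the sketch and only the width-512 figure comes from the card.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, width = 32, 512  # width-512 per the card; vocab is illustrative only

# one shared matrix for both ends of the model
E = rng.standard_normal((vocab, width)).astype(np.float32)

def embed(token_ids):
    # input side: look up rows of the shared matrix
    return E[token_ids]

def logits(hidden):
    # output side: project hidden states against the same matrix, transposed
    return hidden @ E.T

h = embed(np.array([1, 2, 3]))  # (3, width)
out = logits(h)                 # (3, vocab)
```

Tying halves the parameter count spent on the vocabulary, which matters for an artifact-size-capped track.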
GQA
Grouped-query attention used in the model architecture.
parameters: null
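Grouped-query attention in general terms: n_q query heads share n_kv (< n_q) key/value heads, with each KV head broadcast across a group of query heads. A minimal sketch, with illustrative head counts and dimensions (the card does not specify the model's GQA configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_q, n_kv, T, d = 8, 2, 4, 16   # head counts and dims are assumptions
group = n_q // n_kv             # query heads per shared KV head

q = rng.standard_normal((n_q, T, d))
k = rng.standard_normal((n_kv, T, d))
v = rng.standard_normal((n_kv, T, d))

# expand KV heads so each query head indexes its group's shared KV head
k_full = np.repeat(k, group, axis=0)    # (n_q, T, d)
v_full = np.repeat(v, group, axis=0)

scores = q @ k_full.transpose(0, 2, 1) / np.sqrt(d)   # (n_q, T, T)
w = np.exp(scores - scores.max(-1, keepdims=True))
w = w / w.sum(-1, keepdims=True)                      # softmax over keys
out = w @ v_full                                      # (n_q, T, d)
```

The KV cache shrinks by a factor of n_q / n_kv relative to full multi-head attention, at no cost to the number of query heads.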
Quantization
mixed selective precision
bits: null
scope: model export with fp16 tied embedding and late-K passthrough
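An export-time sketch of the selective-precision idea: keep most tensors in fp32 but cast the tied embedding to fp16 to shrink the artifact. The state-dict keys below are hypothetical, and the late-K passthrough is not reproduced here.

```python
import numpy as np

# hypothetical state dict; only the fp16-embedding idea comes from the card
state = {
    "tied_embedding": np.ones((32, 512), dtype=np.float32),
    "block0.attn.w":  np.ones((512, 512), dtype=np.float32),
}

def export_mixed(state, fp16_keys=("tied_embedding",)):
    """Cast only the listed tensors to fp16; leave the rest untouched."""
    return {k: (v.astype(np.float16) if k in fp16_keys else v)
            for k, v in state.items()}

exported = export_mixed(state)
```

Halving the bytes of the largest single tensor (the tied embedding) is a cheap way to stay under a size cap without quantizing the whole model.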
Optimizer
Muon
weight_decay: null
momentum: 0.99
other_params: {"muon_momentum_warmup_start":0.92,"muon_momentum_warmup_steps":1500,"matrix_lr":0.02,"scalar_lr":0.02,"tied_embed_lr":0.03,"grad_clip_norm":0.3}
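The momentum warmup implied by these hyperparameters can be sketched as a simple schedule from 0.92 to 0.99 over 1,500 steps. The values come from the card's other_params; the linear shape is an assumption, not confirmed by the submission.

```python
def muon_momentum(step, start=0.92, target=0.99, warmup_steps=1500):
    """Momentum warmup: ramp from `start` to `target` over `warmup_steps`,
    then hold at `target`. Linear ramp is an assumed shape."""
    frac = min(step / warmup_steps, 1.0)
    return start + (target - start) * frac
```

At step 0 this returns the warmup start (0.92) and from step 1,500 onward the card's stated momentum (0.99).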
LR Schedule
warmdown
parameters: {"warmdown_iters":3000}
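A warmdown schedule typically holds the base LR constant and then decays it linearly to zero over the final steps. This sketch assumes that shape; warmdown_iters=3000 is from the card, base_lr reuses the card's matrix_lr, and total_iters is a hypothetical run length the card does not state.

```python
def warmdown_lr(step, total_iters, base_lr=0.02, warmdown_iters=3000):
    """Constant LR, then a linear 'warmdown' to zero over the last
    `warmdown_iters` steps. Shape is an assumption, not the PR's code."""
    decay_start = total_iters - warmdown_iters
    if step < decay_start:
        return base_lr
    return base_lr * (total_iters - step) / warmdown_iters
```

For example, with a hypothetical 10,000-iteration run the LR stays at 0.02 until step 7,000 and reaches 0 at step 10,000.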
Sequence Length
sequence_length
train_length: 2048
eval_length: 2048
Evaluation
sliding window eval
parameters: {"stride":256,"context_length":2048}
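Sliding-window evaluation with these parameters scores each token exactly once while giving every scored token up to context_length tokens of left context, advancing stride tokens per window. A sketch of the window bookkeeping (the scoring model itself is omitted):

```python
def sliding_windows(n_tokens, context_length=2048, stride=256):
    """Yield (start, end, score_from) triples: the window sees tokens
    [start, end); only tokens [score_from, end) are scored, so each token
    is scored exactly once with up to context_length of left context."""
    windows = []
    scored = 0  # tokens scored so far
    while scored < n_tokens:
        end = min(scored + stride, n_tokens)
        start = max(0, end - context_length)
        windows.append((start, end, scored))
        scored = end
    return windows
```

With stride 256 and context 2048, consecutive windows overlap by 1,792 tokens of pure context, trading extra forward passes for a less pessimistic bpb than disjoint-chunk evaluation.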
Compression
zlib
level: null
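Artifact compression with zlib is a stdlib one-liner in Python. The card leaves the level unspecified; level 9 below is an assumption (maximum compression), and the payload is a placeholder.

```python
import zlib

payload = b"model weights placeholder " * 1000   # stand-in for the artifact bytes

compressed = zlib.compress(payload, 9)   # level 9 is an assumed choice
restored = zlib.decompress(compressed)

assert restored == payload               # lossless round trip
```

For a size-capped track, the reported artifact size would be len(compressed), so the level choice directly affects whether the submission fits under the cap.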

Novel Contributions

  • Non-record submission packaged under track_non_record_16mb after the leaderboard moved past the result.
  • Same-provider RunPod verification on 8x H100 SXM, with three reruns as robustness evidence.
  • Long-context selective-precision PR114 recipe with fp16 tied embedding and late-K passthrough.
  • Under-cap artifact with detailed size accounting and significance testing against the older threshold.