PR #1198
Add non-record 16MB submission: Hybrid Sparse Diffusion 2H on 8xH100
by ymrohit
val_bpb
1.5992
Architecture
Hybrid
Optimizer
—
Artifact Size
13,340,791 bytes
Training Techniques
Architecture
weight tying
Tied input and output embeddings.
parameters: null
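Weight tying shares one matrix between the input embedding and the output projection, so the unembedding contributes no extra parameters to the artifact. A minimal numpy sketch (the vocab size here is illustrative; model_dim=512 matches the listed parameters):

```python
import numpy as np

vocab, d = 50304, 512  # vocab is an assumed illustrative size; d matches model_dim=512
emb = (np.random.randn(vocab, d) * 0.02).astype(np.float32)

def embed(token_ids):
    # Input embedding: look up rows of the shared matrix.
    return emb[token_ids]

def unembed(hidden):
    # Output projection reuses the same matrix transposed,
    # so no separate lm_head weights need to be stored.
    return hidden @ emb.T

h = embed(np.array([1, 2, 3]))
logits = unembed(h)
assert logits.shape == (3, vocab)
```

Under a 16MB cap this matters: a 50k x 512 fp32 unembedding alone would cost ~100MB untied (before quantization).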
KV head count
Hybrid sparse diffusion model with grouped key/value heads.
parameters: {"layers":9,"model_dim":512,"num_heads":8,"num_kv_heads":4}
Hybrid
Hybrid sparse diffusion architecture.
parameters: {"diffusion_num_steps":8,"diffusion_block_min":24,"diffusion_block_max":128,"diffusion_min_mask_frac":0.1,"diffusion_max_mask_frac":0.6,"diffusion_block_start_min_frac":0.25,"diffusion_block_start_max_frac":0.9,"diffusion_time_scale":0.05,"diffusion_refine_last_n":5,"diffusion_batch_shared_block":1}
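The block parameters above suggest masking a contiguous block of the sequence with a sampled mask fraction. A hedged sketch of one plausible sampling step using the listed ranges (the actual scheme, including how diffusion_num_steps, diffusion_time_scale, and diffusion_refine_last_n interact, is not specified here and this is an assumption):

```python
import random

# Values taken from the listed parameters; the sampling logic is assumed.
cfg = dict(block_min=24, block_max=128,
           min_mask_frac=0.1, max_mask_frac=0.6,
           start_min_frac=0.25, start_max_frac=0.9)

def sample_mask(seq_len=1024, rng=None):
    rng = rng or random.Random(0)
    # Pick a block length and start position within the listed fractional range.
    block = rng.randint(cfg['block_min'], cfg['block_max'])
    start = int(seq_len * rng.uniform(cfg['start_min_frac'], cfg['start_max_frac']))
    start = min(start, seq_len - block)  # keep the block inside the sequence
    # Mask a random subset of positions inside the block.
    frac = rng.uniform(cfg['min_mask_frac'], cfg['max_mask_frac'])
    masked = sorted(rng.sample(range(start, start + block), int(block * frac)))
    return block, start, masked

block, start, masked = sample_mask()
assert cfg['block_min'] <= block <= cfg['block_max']
assert all(start <= i < start + block for i in masked)
```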
Quantization
int8
bits: 8
scope: model weights
Compression
zlib
level: null
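The 13,340,791-byte artifact fits the 16MB cap via int8 weights plus zlib. A sketch of per-tensor symmetric int8 quantization followed by zlib compression, assuming this general scheme (the submission's exact serialization format, and where it stores the per-tensor scales, is not shown; scales are omitted here for brevity):

```python
import zlib
import numpy as np

def pack(weights: dict) -> bytes:
    # Symmetric per-tensor int8 quantization, then zlib over the payload.
    blobs = []
    for _name, w in sorted(weights.items()):
        m = float(np.abs(w).max())
        scale = m / 127.0 if m > 0 else 1.0  # scales would also be serialized in practice
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        blobs.append(q.tobytes())
    return zlib.compress(b''.join(blobs))

w = {'emb': np.random.randn(1000, 512).astype(np.float32)}
blob = pack(w)
# int8 payload is ~4x smaller than fp32 before zlib even touches it.
assert len(blob) < 1000 * 512 * 4
```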
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
LR Schedule
warmdown
parameters: {"warmdown_iters":12000}
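A common reading of a "warmdown" schedule is a constant learning rate that decays linearly to zero over the final warmdown_iters steps. A sketch under that assumption (warmdown_iters=12000 and iterations=100000 are from the listed parameters; base_lr and the exact decay shape are assumptions):

```python
def lr_at(step, base_lr=1e-3, total=100000, warmdown_iters=12000):
    # Constant LR until the warmdown window, then linear decay to zero.
    # warmdown_iters=12000 and total=100000 come from the listed parameters;
    # base_lr and the linear shape are assumptions for illustration.
    if step < total - warmdown_iters:
        return base_lr
    return base_lr * (total - step) / warmdown_iters

assert lr_at(0) == 1e-3
assert lr_at(100000) == 0.0
```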
Other
other
Unlimited-compute non-record submission run on 8xH100 cloud hardware.
parameters: {"gpus":8,"gpu_type":"H100","iterations":100000,"max_wallclock_seconds":7195}
Novel Contributions
- Non-record 16MB submission for the hybrid sparse diffusion line
- Real 8xH100 cloud run with the current v7 architecture
- Int8+zlib artifact packaging under the 16MB cap
- Successful raw full-validation evaluation; the run was still improving when training stopped
- Local proxy round-trip sanity check on the saved int8 artifact