PR #1198
Add non-record 16MB submission: Hybrid Sparse Diffusion 2H on 8xH100
by ymrohit
val_bpb
1.5992
Architecture
Hybrid
Optimizer
—
Artifact Size
13,340,791 bytes
Training Techniques
Architecture
weight tying
Tied input and output embeddings.
parameters: null
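Weight tying shares one matrix between the input embedding and the output projection, so the unembedding contributes no extra parameters to the artifact. A minimal numpy sketch (the vocab size here is illustrative; model_dim=512 matches the listed parameters):

```python
import numpy as np

vocab, d = 50304, 512  # vocab is an assumed illustrative size; d matches model_dim=512
emb = (np.random.randn(vocab, d) * 0.02).astype(np.float32)

def embed(token_ids):
    # Input embedding: look up rows of the shared matrix.
    return emb[token_ids]

def unembed(hidden):
    # Output projection reuses the same matrix transposed,
    # so no separate lm_head weights need to be stored.
    return hidden @ emb.T

h = embed(np.array([1, 2, 3]))
logits = unembed(h)
assert logits.shape == (3, vocab)
```

Under a 16MB cap this matters: a 50k x 512 fp32 unembedding alone would cost ~100MB untied (before quantization).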
KV head count
Hybrid sparse diffusion model with grouped key/value heads.
parameters: {"layers":9,"model_dim":512,"num_heads":8,"num_kv_heads":4}
Hybrid
Hybrid sparse diffusion architecture.
parameters: {"diffusion_num_steps":8,"diffusion_block_min":24,"diffusion_block_max":128,"diffusion_min_mask_frac":0.1,"diffusion_max_mask_frac":0.6,"diffusion_block_start_min_frac":0.25,"diffusion_block_start_max_frac":0.9,"diffusion_time_scale":0.05,"diffusion_refine_last_n":5,"diffusion_batch_shared_block":1}
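The block parameters above suggest masking a contiguous block of the sequence with a sampled mask fraction. A hedged sketch of one plausible sampling step using the listed ranges (the actual scheme, including how diffusion_num_steps, diffusion_time_scale, and diffusion_refine_last_n interact, is not specified here and this is an assumption):

```python
import random

# Values taken from the listed parameters; the sampling logic is assumed.
cfg = dict(block_min=24, block_max=128,
           min_mask_frac=0.1, max_mask_frac=0.6,
           start_min_frac=0.25, start_max_frac=0.9)

def sample_mask(seq_len=1024, rng=None):
    rng = rng or random.Random(0)
    # Pick a block length and start position within the listed fractional range.
    block = rng.randint(cfg['block_min'], cfg['block_max'])
    start = int(seq_len * rng.uniform(cfg['start_min_frac'], cfg['start_max_frac']))
    start = min(start, seq_len - block)  # keep the block inside the sequence
    # Mask a random subset of positions inside the block.
    frac = rng.uniform(cfg['min_mask_frac'], cfg['max_mask_frac'])
    masked = sorted(rng.sample(range(start, start + block), int(block * frac)))
    return block, start, masked

block, start, masked = sample_mask()
assert cfg['block_min'] <= block <= cfg['block_max']
assert all(start <= i < start + block for i in masked)
```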
Quantization
int8
bits: 8
scope: model weights
Compression
zlib
level: null
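The 13,340,791-byte artifact fits the 16MB cap via int8 weights plus zlib. A sketch of per-tensor symmetric int8 quantization followed by zlib compression, assuming this general scheme (the submission's exact serialization format, and where it stores the per-tensor scales, is not shown; scales are omitted here for brevity):

```python
import zlib
import numpy as np

def pack(weights: dict) -> bytes:
    # Symmetric per-tensor int8 quantization, then zlib over the payload.
    blobs = []
    for _name, w in sorted(weights.items()):
        m = float(np.abs(w).max())
        scale = m / 127.0 if m > 0 else 1.0  # scales would also be serialized in practice
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        blobs.append(q.tobytes())
    return zlib.compress(b''.join(blobs))

w = {'emb': np.random.randn(1000, 512).astype(np.float32)}
blob = pack(w)
# int8 payload is ~4x smaller than fp32 before zlib even touches it.
assert len(blob) < 1000 * 512 * 4
```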
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
LR Schedule
warmdown
parameters: {"warmdown_iters":12000}
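A common reading of a "warmdown" schedule is a constant learning rate that decays linearly to zero over the final warmdown_iters steps. A sketch under that assumption (warmdown_iters=12000 and iterations=100000 are from the listed parameters; base_lr and the exact decay shape are assumptions):

```python
def lr_at(step, base_lr=1e-3, total=100000, warmdown_iters=12000):
    # Constant LR until the warmdown window, then linear decay to zero.
    # warmdown_iters=12000 and total=100000 come from the listed parameters;
    # base_lr and the linear shape are assumptions for illustration.
    if step < total - warmdown_iters:
        return base_lr
    return base_lr * (total - step) / warmdown_iters

assert lr_at(0) == 1e-3
assert lr_at(100000) == 0.0
```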
Other
other
Unlimited-compute non-record submission run on 8xH100 cloud hardware.
parameters: {"gpus":8,"gpu_type":"H100","iterations":100000,"max_wallclock_seconds":7195}
Novel Contributions
- Non-record 16MB submission for the hybrid sparse diffusion line
- Real 8xH100 cloud run with the current v7 architecture
- Int8+zlib artifact packaging under the 16MB cap
- Successful raw full-validation evaluation; the run was still improving when training stopped
- Local proxy round-trip sanity check on the saved int8 artifact