PR #2115

open

Non-record: Hopper regenerated adapters + B2B kernel prototype — documented negative result

val_bpb: 1.6853
Architecture: Transformer
Optimizer: Muon
Artifact Size: 8,099,240 bytes

Training Techniques

Quantization
int8
bits: 8
scope: artifact/model export
Architecture
LoRA
Regenerated ternary back-to-back (B2B) adapter added to the transformer projections: Y = X@W^T + alpha*(X@A^T)@B^T. A and B are regenerated from a 64-bit seed, so only the seed and alpha are stored; the matrices themselves are never written to the artifact.
parameters: {"rank":16,"roles":"attn_c_v,attn_proj,mlp_fc,mlp_proj"}
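The adapter formula above can be sketched in plain Python to make the two back-to-back micro-GEMMs explicit. This is only an illustration under assumptions: the helper names `matmul_t` and `adapter_forward`, the nested-list layout, and the tiny shapes are hypothetical and are not taken from the PR, whose real path runs as a fused GPU kernel.

```python
def matmul_t(x, w):
    """Return x @ w^T for row-major nested lists (x: n x k, w: m x k)."""
    return [[sum(a * b for a, b in zip(row, wrow)) for wrow in w] for row in x]


def adapter_forward(x, w, a, b, alpha):
    """Y = X@W^T + alpha * (X@A^T)@B^T with A: r x d_in and B: d_out x r."""
    base = matmul_t(x, w)      # n x d_out, the frozen projection
    low = matmul_t(x, a)       # n x r, first micro-GEMM (down-projection)
    delta = matmul_t(low, b)   # n x d_out, second micro-GEMM (up-projection)
    return [[y + alpha * d for y, d in zip(y_row, d_row)]
            for y_row, d_row in zip(base, delta)]


# Hypothetical toy shapes: d_in = d_out = 2, rank r = 1 (the PR uses rank 16).
x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]   # identity base weight
a = [[1, -1]]                  # ternary entries from {-1, 0, +1}
b = [[1], [0]]
y = adapter_forward(x, w, a, b, alpha=0.5)
print(y)
```

Because A and B hold only ternary entries, the low-rank term needs no multiplications in principle, which is part of what makes a dedicated kernel attractive.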
weight tying
Non-persistent regenerated adapter buffers are excluded from state_dict and recreated from seed at init time.
parameters: null
Compression
zlib
level: null
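The int8 quantization and zlib compression entries can be pictured together as one export step. The sketch below is an assumption-heavy illustration, not the PR's exporter: the per-tensor symmetric scale, the byte layout (a float32 scale followed by int8 values), and the names `quantize_int8`, `pack`, and `unpack` are all hypothetical. It uses zlib's default level because the card lists `level: null`.

```python
import struct
import zlib


def quantize_int8(values):
    """Symmetric per-tensor quantization to the range [-127, 127] (assumed scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def pack(values):
    """Quantize, serialize as float32 scale + int8 payload, then zlib-compress."""
    q, scale = quantize_int8(values)
    raw = struct.pack("<f", scale) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(raw)  # default compression level


def unpack(blob):
    """Invert pack(): decompress, read the scale, and dequantize."""
    raw = zlib.decompress(blob)
    (scale,) = struct.unpack_from("<f", raw, 0)
    q = struct.unpack_from(f"<{len(raw) - 4}b", raw, 4)
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]
restored = unpack(pack(weights))
print(restored)
```

The round trip is lossy by at most about half a quantization step per value, which is the usual trade for a smaller exported artifact.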
Optimizer
Muon
weight_decay: null
momentum: null
other_params: null
Sequence Length
sequence_length
train_length: 1024
eval_length: 1024
Other
other
Rule-30 cellular automaton plus Achlioptas ternary regeneration for adapter matrices, with bit-for-bit Python/C++/CUDA equivalence.
parameters: {"seed_bits":64,"distribution":"{-1,0,+1}"}
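A seed-to-ternary regeneration of this kind can be sketched as follows. The Rule-30 update (new cell = left XOR (center OR right)) and the Achlioptas probabilities (+1 with 1/6, 0 with 2/3, -1 with 1/6) are standard; everything else is an assumption made for illustration: the 64-cell ring, the 3-bit chunking with rejection, the zero-seed fallback, and the names `rule30_step` and `ternary_stream` are hypothetical and may differ from the PR's actual bit extraction. The usual sqrt(3) scale factor is omitted and could be folded into alpha.

```python
MASK64 = (1 << 64) - 1


def rule30_step(state):
    """One Rule-30 update on a 64-cell ring: new = left XOR (center OR right)."""
    left = ((state >> 1) | (state << 63)) & MASK64   # bit i sees old bit i+1
    right = ((state << 1) | (state >> 63)) & MASK64  # bit i sees old bit i-1
    return left ^ (state | right)


def ternary_stream(seed, count):
    """Deterministically expand a 64-bit seed into `count` values in {-1, 0, +1}."""
    state = seed & MASK64
    if state == 0:
        state = 1  # the all-zero ring is a fixed point of Rule 30
    out = []
    while len(out) < count:
        state = rule30_step(state)
        # Read the state in 3-bit chunks; reject 6 and 7 so 0..5 stay equally likely.
        for shift in range(0, 63, 3):
            v = (state >> shift) & 0b111
            if v >= 6:
                continue
            # 1 of 6 -> +1, 1 of 6 -> -1, 4 of 6 -> 0 (Achlioptas proportions)
            out.append(1 if v == 0 else -1 if v == 1 else 0)
            if len(out) == count:
                break
    return out


a_entries = ternary_stream(0x1234, 64)
print(a_entries[:16])
```

Since the generator uses only integer shifts, masks, and XOR/OR, the same sequence can be reproduced exactly in C++ or CUDA, which is the property a bit-for-bit equivalence check relies on.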
other
H100 CUTLASS WGMMA/WMMA bridge kernel prototype with multiple backend paths and parity-checked micro-GEMM routing.
parameters: {"backends":["wmma_bridge","isolated_base_stub","cutlass_base_adapter_epilogue","cutlass_fused_v1","cutlass_fused_v2"]}

Novel Contributions

  • Regenerated adapter primitive that adds no weight bytes to the artifact: only (seed, alpha) is stored
  • Rule-30 + Achlioptas ternary regeneration for LoRA-style adapters
  • Back-to-back micro-GEMM adapter formulation on Hopper
  • CUTLASS WGMMA/WMMA bridge kernel prototype for H100
  • Documented negative result showing the adapter hurts validation quality at matched updates
  • Bit-for-bit Python/C++/CUDA regeneration equivalence and H100 kernel parity checks