PR #2115

open

Non-record: Hopper regenerated adapters + B2B kernel prototype — documented negative result

by SacmajView on GitHub

val_bpb

1.6853

Architecture

Transformer

Optimizer

Muon

Artifact Size

8,099,240 bytes

Training Techniques

Quantization

int8

bits: 8

scope: artifact/model export

Architecture

LoRA

Regenerated ternary back-to-back adapter added to transformer projections as Y = X@W^T + alpha*(X@A^T)@B^T. A and B are regenerated from a 64-bit seed, so the artifact stores only (seed, alpha) per adapter and never the matrices themselves.

parameters: {"rank":16,"roles":"attn_c_v,attn_proj,mlp_fc,mlp_proj"}
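The adapter formula above can be sketched in plain Python. This is a minimal illustration, not the PR's kernel: the helper names (`matmul`, `transpose`, `adapter_forward`, `merged_forward`) are hypothetical, matrices are lists of lists, and the shapes assume X is (n, d_in), W is (d_out, d_in), A is (r, d_in) and B is (d_out, r), with r = 16 in the PR.

```python
def matmul(a, b):
    """Multiply two row-major matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*m)]


def adapter_forward(x, w, a, b, alpha):
    """Compute Y = X @ W^T + alpha * (X @ A^T) @ B^T."""
    base = matmul(x, transpose(w))
    # Two small GEMMs run back to back: project down to rank r, then up to d_out.
    low_rank = matmul(matmul(x, transpose(a)), transpose(b))
    return [[bv + alpha * lv for bv, lv in zip(br, lr)]
            for br, lr in zip(base, low_rank)]


def merged_forward(x, w, a, b, alpha):
    """Equivalent form with the adapter folded in: X @ (W + alpha * B @ A)^T."""
    ba = matmul(b, a)
    merged = [[wv + alpha * dv for wv, dv in zip(wr, dr)]
              for wr, dr in zip(w, ba)]
    return matmul(x, transpose(merged))
```

The two functions agree because (X@A^T)@B^T = X@(B@A)^T. The back-to-back form avoids materializing the full d_out by d_in update, which is presumably why the PR treats it as a pair of micro-GEMMs.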

weight tying

Non-persistent regenerated adapter buffers are excluded from state_dict and recreated from seed at init time.

parameters: null
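The storage idea can be sketched without any framework. In PyTorch the analogous mechanism is `register_buffer(..., persistent=False)`, which keeps a buffer out of `state_dict`. The class below is hypothetical and uses Python's `random.Random` as a stand-in generator, which is an assumption made for brevity and not the PR's Rule-30 scheme.

```python
import random


class RegeneratedAdapter:
    """Hypothetical adapter whose matrices are rebuilt from a seed, never saved."""

    def __init__(self, seed, alpha, rank, d_in, d_out):
        self.seed = seed
        self.alpha = alpha
        self.shape = (rank, d_in, d_out)
        # Recreated at init time; these never appear in state_dict().
        self.a, self.b = self._regenerate()

    def _regenerate(self):
        rank, d_in, d_out = self.shape
        rng = random.Random(self.seed)  # stand-in generator (assumption)

        def draw(rows, cols):
            return [[rng.choice((-1, 0, 1)) for _ in range(cols)]
                    for _ in range(rows)]

        a = draw(rank, d_in)   # A: (r, d_in)
        b = draw(d_out, rank)  # B: (d_out, r)
        return a, b

    def state_dict(self):
        """Only the seed and the scale are persisted."""
        return {"seed": self.seed, "alpha": self.alpha}

    @classmethod
    def from_state(cls, state, rank, d_in, d_out):
        """Rebuild the adapter, including A and B, from the saved pair."""
        return cls(state["seed"], state["alpha"], rank, d_in, d_out)
```

A round trip through `state_dict()` and `from_state()` reproduces the same A and B, so the adapter costs a fixed few bytes in the artifact regardless of its rank.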

Compression

zlib

level: null
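For orientation, an int8 export followed by zlib compression might look like the sketch below. The card does not state the quantization scheme or the zlib level, so symmetric per-tensor scaling and the default level (`-1`) are assumptions, and the function names are hypothetical.

```python
import struct
import zlib


def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127] (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q


def pack(scale, q, level=-1):
    """Serialize the scale as float32 plus int8 payload, then zlib-compress."""
    raw = struct.pack("<f", scale) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(raw, level)


def unpack(blob):
    """Invert pack() and dequantize back to floats."""
    raw = zlib.decompress(blob)
    (scale,) = struct.unpack("<f", raw[:4])
    q = struct.unpack(f"<{len(raw) - 4}b", raw[4:])
    return [scale * v for v in q]
```

The reconstruction error per weight is bounded by roughly half the scale. The compressed size depends on how repetitive the int8 payload is.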

Optimizer

Muon

weight_decay: null

momentum: null

other_params: null

Sequence Length

sequence_length

train_length: 1024

eval_length: 1024

Other

other

Rule-30 cellular automaton plus Achlioptas ternary regeneration for adapter matrices, with bit-for-bit Python/C++/CUDA equivalence.

parameters: {"seed_bits":64,"distribution":"{-1,0,+1}"}
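A seed-to-ternary pipeline of this kind can be sketched as follows. The card gives only the ingredients (Rule 30, a 64-bit seed, values in {-1,0,+1}), so several details here are assumptions: a 64-cell ring with wraparound, one tapped bit per step (bit 32), rejection sampling over 3-bit groups, and the sparse Achlioptas probabilities P(+1) = P(-1) = 1/6, P(0) = 2/3. The usual sqrt(3) scale factor is left out on the assumption that alpha can absorb it.

```python
MASK64 = (1 << 64) - 1


def rule30_step(state):
    """One Rule 30 update on a 64-cell ring: new = left XOR (center OR right)."""
    left = ((state >> 1) | (state << 63)) & MASK64   # cell i sees bit i+1
    right = ((state << 1) | (state >> 63)) & MASK64  # cell i sees bit i-1
    return left ^ (state | right)


def rule30_bits(seed):
    """Yield one bit per step, tapped from bit 32 (tap position is an assumption)."""
    state = seed & MASK64
    while True:
        state = rule30_step(state)
        yield (state >> 32) & 1


def ternary_matrix(seed, rows, cols, max_rejects=1024):
    """Fill a rows x cols matrix with +1 (1/6), -1 (1/6) or 0 (2/3)."""
    bits = rule30_bits(seed)
    out, rejects = [], 0
    while len(out) < rows * cols:
        v = (next(bits) << 2) | (next(bits) << 1) | next(bits)
        if v >= 6:  # reject 6 and 7 so the six remaining values are equally likely
            rejects += 1
            if rejects > max_rejects:
                # Guards against fixed-point states such as an alternating pattern.
                raise ValueError("degenerate seed: bit stream is stuck")
            continue
        rejects = 0
        out.append(1 if v == 0 else -1 if v == 1 else 0)
    return [out[r * cols:(r + 1) * cols] for r in range(rows)]
```

Because everything is integer arithmetic on a fixed-width state, the same steps can be reproduced exactly in C++ or CUDA. Bit-for-bit equivalence across languages depends on that.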

other

H100 CUTLASS WGMMA/WMMA bridge kernel prototype with multiple backend paths and parity-checked micro-GEMM routing.

parameters: {"backends":["wmma_bridge","isolated_base_stub","cutlass_base_adapter_epilogue","cutlass_fused_v1","cutlass_fused_v2"]}

Novel Contributions

Zero-byte-artifact regenerated adapter primitive using only (seed, alpha) storage
Rule-30 + Achlioptas ternary regeneration for LoRA-style adapters
Back-to-back micro-GEMM adapter formulation on Hopper
CUTLASS WGMMA/WMMA bridge kernel prototype for H100
Documented negative result showing the adapter hurts validation quality at a matched number of updates
Bit-for-bit Python/C++/CUDA regeneration equivalence and H100 kernel parity proofs