PR #1028
openMedusa: Unstable — DeltaNet Crawler, 0.8104 BPB (best seed), 10MB file size, mean 0.9984, Frugendorff continuation
by newjordan
val_bpb
0.8104
Architecture
Transformer
Optimizer
—
Artifact Size
10MB
Training Techniques
Architecture
DeltaNet
Uses canonical chunk_delta_rule DeltaNet heads inside a Frugendorff crawler topology.
parameters: {"heads":4,"short_conv":true,"loops":4,"flat_layers":4,"crawler_layers":1}
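The chunk_delta_rule kernel computes, in chunked parallel form, the same outputs as the sequential delta-rule recurrence. A minimal NumPy sketch of that recurrence for a single head (chunking, short convolution, and normalization are omitted; shapes and names are illustrative, not from the entry):

```python
import numpy as np

def delta_rule_head(q, k, v, beta):
    """Sequential (unchunked) DeltaNet recurrence for one head.

    q, k, v: (T, d) arrays; beta: (T,) write strengths in [0, 1].
    The fast-weight state S is updated with an error-correcting delta
    rule; the chunked kernel produces the same outputs in parallel.
    """
    T, d = q.shape
    S = np.zeros((d, d))
    out = np.zeros((T, d))
    for t in range(T):
        pred = S @ k[t]                            # value currently stored under k_t
        S = S + beta[t] * np.outer(v[t] - pred, k[t])  # overwrite toward v_t
        out[t] = S @ q[t]                          # read with the query
    return out
```

With orthonormal keys and beta = 1 this behaves as exact key-value storage: querying with a written key retrieves its value.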
RoPE
Uses RoPE dimensions as part of the model configuration.
parameters: {"dimensions":16}
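With dimensions: 16, rotary embeddings are presumably applied to only the first 16 dimensions of each head, with the remainder passing through unrotated. A hedged NumPy sketch (the partial-RoPE split-halves layout is an assumption; some implementations interleave pairs instead):

```python
import numpy as np

def apply_rope(x, rot_dims=16, base=10000.0):
    """Rotary position embedding on the first rot_dims dims of x.

    x: (T, d) per-head queries or keys. Dims beyond rot_dims are
    returned unchanged (partial RoPE).
    """
    T, d = x.shape
    half = rot_dims // 2
    pos = np.arange(T)[:, None]                   # (T, 1)
    freqs = base ** (-np.arange(half) / half)     # (half,)
    ang = pos * freqs                             # (T, half)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, :half], x[:, half:rot_dims]
    rotated = np.concatenate([x1 * cos - x2 * sin,
                              x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, x[:, rot_dims:]], axis=-1)
```

Rotation preserves the norm of the rotated dims and leaves position 0 unchanged, which makes both properties easy to sanity-check.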
BigramHash
Uses a bigram vocabulary/hash-style component in the architecture.
parameters: {"vocab_size":2048}
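The bigram component presumably hashes each adjacent token pair into a small auxiliary table of 2048 buckets whose embeddings are added to the model input. A sketch of one plausible bucket mapping (the multiplicative-hash constant and BOS handling are illustrative, not from the entry):

```python
import numpy as np

def bigram_hash_ids(tokens, vocab_size=2048, mult=2654435761):
    """Map each (prev, cur) token pair to a bucket id in [0, vocab_size).

    tokens: (T,) ints. Position 0 pairs with a BOS id of 0 (assumption).
    Uses a cheap multiplicative hash; uint64 arithmetic wraps by design.
    """
    t = np.asarray(tokens, dtype=np.uint64)
    prev = np.concatenate([[np.uint64(0)], t[:-1]])
    mixed = (prev * np.uint64(mult) + t) * np.uint64(mult)
    return (mixed >> np.uint64(32)) % np.uint64(vocab_size)
```

The same bigram always lands in the same bucket, so repeated pairs share an embedding; distinct pairs may collide, which is the usual hashed-vocabulary trade-off.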
Quantization
int6
bits: 6
scope: model weights
GPTQ
bits: null
scope: 41 layers
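The int6 setting corresponds to a symmetric code range of [-31, 31]. A minimal round-to-nearest sketch of per-channel int6 quantization (the submission applies GPTQ on top of this grid; the axis choice is an assumption):

```python
import numpy as np

def quantize_int6(w, axis=0):
    """Symmetric per-channel int6 quantization (codes in [-31, 31]).

    Returns integer codes and per-channel scales; dequantize with
    codes * scale. This is plain round-to-nearest, not GPTQ.
    """
    qmax = 2 ** (6 - 1) - 1                        # 31 for int6
    scale = np.abs(w).max(axis=axis, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)       # guard all-zero channels
    codes = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return codes, scale
```

Round-to-nearest bounds the per-element reconstruction error by half a quantization step per channel.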
Compression
zstd
level: null
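zstd on a raw int8 dump gains little from the two unused high bits per code; the win comes from bit-packing the 6-bit codes first and then entropy-coding the stream. A sketch using stdlib zlib as a stand-in for zstd (the actual artifact presumably uses a zstd binding; the packing layout is illustrative):

```python
import zlib  # stdlib stand-in; the entry compresses with zstd
import numpy as np

def pack_int6(codes):
    """Bit-pack int6 codes in [-32, 31] to 6 bits each, then compress."""
    u = (codes.astype(np.int16) + 32).astype(np.uint8)    # shift to 0..63
    bits = np.unpackbits(u[:, None], axis=1)[:, 2:]       # keep low 6 bits
    return zlib.compress(np.packbits(bits.ravel()).tobytes())
```

Packing alone already cuts the payload to 6/8 of the int8 size; the entropy coder then squeezes whatever statistical redundancy remains in the code distribution.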
Weight Averaging
EMA
parameters: {"start_step":4400,"decay":0.99}
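The EMA starts late (step 4400, which the contributions list ties to warmdown onset) and then tracks the weights with decay 0.99. A minimal sketch using plain dicts of floats as stand-ins for parameter tensors:

```python
def ema_update(ema, params, step, start_step=4400, decay=0.99):
    """Late-start EMA: copy weights at start_step, then decay-average.

    ema, params: dicts of name -> float (stand-ins for tensors).
    Before start_step the EMA is untouched; at start_step it is
    (re-)initialized from the current weights.
    """
    if step < start_step:
        return ema
    if step == start_step or ema is None:
        return dict(params)
    return {k: decay * ema[k] + (1 - decay) * params[k] for k in params}
```

Re-initializing at warmdown onset discards the high-LR history, so the average only spans the decaying-LR phase.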
Evaluation
sliding window eval
parameters: null
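Sliding-window eval scores every token with a long left context by advancing a fixed window in strides and counting only the newly covered tokens in each window. A sketch of the span bookkeeping (window and stride values are not given in the entry):

```python
def sliding_windows(n_tokens, window, stride):
    """Return (start, end, n_scored) spans covering n_tokens exactly once.

    Each window scores only its last n_scored tokens; the earlier tokens
    serve as context, so after the first window every scored token sees
    at least window - stride tokens of left context.
    """
    spans, prev_end = [], 0
    for start in range(0, n_tokens, stride):
        end = min(start + window, n_tokens)
        spans.append((start, end, end - prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return spans
```

Summing per-token negative log-likelihoods over the scored spans and dividing by the byte count gives the BPB figure.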
LR Schedule
warmdown
parameters: {"iters":2000}
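With iters: 2000, the schedule presumably holds the base LR and then decays linearly to zero over the final 2000 steps. A sketch (the constant-then-linear shape is an assumption, and any warmup phase is omitted):

```python
def lr_at(step, total_steps, base_lr, warmdown_iters=2000):
    """Constant LR, then linear decay to 0 over the final warmdown_iters."""
    decay_start = total_steps - warmdown_iters
    if step < decay_start:
        return base_lr
    return base_lr * max(0.0, (total_steps - step) / warmdown_iters)
```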
Sequence Length
sequence_length
train_length: null
eval_length: null
Other
other
Loop-aware GPTQ with quantized-flat activations and crawler Hessians.
parameters: {"enabled":true}
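The loop-aware variant appears to reuse the standard GPTQ inner step, but collects Hessians from the flat layers first and then from the crawler layers using already-quantized flat activations. A minimal single-row sketch of that inner step on the int6 grid (no Cholesky factorization, blocking, or column reordering):

```python
import numpy as np

def gptq_row(w, H, qmax=31):
    """Quantize one weight row column-by-column, propagating each
    rounding error to later columns via the inverse Hessian
    (H ~ X X^T of the layer's calibration inputs). Returns the
    dequantized weights on a symmetric int grid."""
    w = w.astype(np.float64).copy()
    Hinv = np.linalg.inv(H + 1e-6 * np.eye(len(H)))   # damped inverse
    scale = np.abs(w).max() / qmax
    q = np.zeros_like(w)
    for j in range(len(w)):
        q[j] = np.clip(np.round(w[j] / scale), -qmax, qmax)
        err = (w[j] - q[j] * scale) / Hinv[j, j]
        w[j + 1:] -= err * Hinv[j, j + 1:]            # compensate later cols
    return q * scale
```

With an identity Hessian the compensation vanishes and the result reduces to plain round-to-nearest, which makes the propagation step easy to verify in isolation.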
Novel Contributions
- DeltaNet crawler topology with canonical chunk_delta_rule heads
- Loop-aware GPTQ using flat Hessians first, then crawler Hessians with quantized-flat activations
- Late-start EMA re-initialized at warmdown onset
- High-variance multi-seed submission; best seed reaches 0.8104 BPB (mean 0.9984 across seeds)
- Int6 + zstd artifact compression