PR #1757
[Non-record] H-Net MAMBA Outer-Layer Ablation: OL2 collapses, OL1 converges to 1.5194 INT6 BPB
Status: open
by aiejvn
val_bpb
1.5194
Architecture
Hybrid
Optimizer
—
Artifact Size
13.1 MB
Training Techniques
Architecture
Mamba
H-Net with MAMBA outer layers; the ablation compares 2 outer Mamba layers (OL2) against 1 outer Mamba layer (OL1) within a fixed 9-layer budget.
parameters: {"outer_layers":1,"layers":9,"encoder_layers":1,"main_layers":7,"decoder_layers":1,"kv_heads":4,"heads":8,"dim":512}
GQA
Grouped-query attention with 8 query heads sharing 4 KV heads.
parameters: {"kv_heads":4}
Evaluation
sliding window eval
parameters: {"stride":64}
Quantization
GPTQ
bits: 6
scope: all
QAT
bits: 6
scope: all
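Since both GPTQ and QAT are listed at 6 bits over all weights, here is a generic illustration of the QAT half: symmetric per-tensor fake quantization with a straight-through estimator. The GPTQ post-training step is a separate Hessian-based procedure and is not reproduced here; `fake_quant` is a hypothetical helper, not the submission's code.

```python
import torch


def fake_quant(w: torch.Tensor, bits: int = 6) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator.

    Quantizes weights to a signed `bits`-wide grid in the forward pass while
    letting gradients flow through unchanged, as in typical QAT.
    """
    qmax = 2 ** (bits - 1) - 1                      # 31 for 6-bit signed
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    return w + (q - w).detach()                     # straight-through gradient


# Usage during training, e.g. inside a module's forward:
#   out = torch.nn.functional.linear(x, fake_quant(self.weight, bits=6), self.bias)
```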
Compression
zlib
level: 9
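A sketch of producing the compressed artifact with zlib at level 9. The actual packing format behind the 13.1 MB figure is not specified, so this simply compresses a serialized state dict; `compress_state_dict` is a hypothetical helper.

```python
import io
import zlib

import torch


def compress_state_dict(model: torch.nn.Module, path: str) -> int:
    """Serialize the model's state dict and compress it with zlib at level 9.

    Returns the compressed size in bytes (what an 'Artifact Size' figure would
    typically report).
    """
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    blob = zlib.compress(buf.getvalue(), level=9)
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)
```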
Sequence Length
sequence_length
train_length: 1024
eval_length: null
LR Schedule
warmdown
parameters: {"warmdown_iters":1200}
Novel Contributions
- Ablation of MAMBA outer-layer count within a fixed 9-layer H-Net budget
- Demonstration that increasing the outer Mamba layer count from 1 (OL1) to 2 (OL2) caused training collapse and slower training steps
- Stable OL1 configuration achieving 1.4770 float BPB and 1.5194 INT6 sliding-window BPB
- Use of INT6 GPTQ with zlib-9 compression and sliding-window evaluation