PR #1757

open

[Non-record] H-Net MAMBA Outer-Layer Ablation: OL2 collapses, OL1 converges to 1.5194 INT6 BPB

val_bpb
1.5194
Architecture
Hybrid
Optimizer
Artifact Size
13.1 MB

Training Techniques

Architecture
Mamba
H-Net with Mamba outer layers; the ablation compares two outer Mamba layers (OL2) against one (OL1) within a fixed 9-layer budget.
parameters: {"outer_layers":1,"layers":9,"encoder_layers":1,"main_layers":7,"decoder_layers":1,"kv_heads":4,"heads":8,"dim":512}
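A quick sanity check on the layer budget above (a sketch; the dict just mirrors the `parameters` line, and reading the encoder/decoder layers as the ablated outer Mamba layers is an assumption):

```python
# OL1 configuration, copied from the parameters above. The encoder and
# decoder layers are taken here to be the outer Mamba layers under
# ablation (assumption); the 9-layer budget must be exhausted by the
# encoder + main + decoder split.
params = {"outer_layers": 1, "layers": 9, "encoder_layers": 1,
          "main_layers": 7, "decoder_layers": 1, "kv_heads": 4,
          "heads": 8, "dim": 512}

# 1 + 7 + 1 == 9: the fixed budget is fully allocated.
assert (params["encoder_layers"] + params["main_layers"]
        + params["decoder_layers"]) == params["layers"]
```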
GQA
Grouped-query attention with 4 KV heads.
parameters: {"kv_heads":4}
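The grouping implied by the GQA entry can be sketched as a query-head to KV-head mapping (the head counts come from the parameters above; the grouping rule is standard GQA, not taken from this PR's code):

```python
# Grouped-query attention: 8 query heads share 4 KV heads, so each
# group of 2 consecutive query heads attends with the same K/V.
HEADS = 8      # query heads (from the parameters above)
KV_HEADS = 4   # shared key/value heads
GROUP_SIZE = HEADS // KV_HEADS  # queries per KV head -> 2

def kv_head_for(q_head: int) -> int:
    """Each group of GROUP_SIZE consecutive query heads shares one KV head."""
    return q_head // GROUP_SIZE

mapping = {q: kv_head_for(q) for q in range(HEADS)}
# query heads 0-1 share KV head 0, heads 2-3 share KV head 1, and so on.
```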
Evaluation
sliding-window eval
parameters: {"stride":64}
Quantization
GPTQ
bits: 6
scope: all
QAT
bits: 6
scope: all
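Both quantization entries target the same 6-bit grid; a sketch of round-to-nearest onto that grid (this is the QAT-style fake quantization; GPTQ's column reordering and error compensation are omitted, and the scale handling here is an assumption):

```python
def fake_quant_int6(x, scale):
    """Round-to-nearest onto a symmetric 6-bit grid: 2**6 = 64 levels,
    integer range [-32, 31]. QAT trains through this mapping; GPTQ
    additionally does per-column error compensation, omitted here."""
    q = max(-32, min(31, round(x / scale)))
    return q * scale
```

With `scope: all`, every weight tensor passes through this grid at its own scale.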
Compression
zlib
level: 9
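Measuring the artifact size amounts to compressing the packed weight bytes with zlib at level 9 (slowest, smallest stream). A sketch with a stand-in payload; the real input would be the packed INT6 checkpoint:

```python
import zlib

def artifact_size_bytes(raw: bytes, level: int = 9) -> int:
    """Size of the zlib stream at the given level. Level 9 trades
    compression speed for the smallest artifact (13.1 MB reported
    above for the INT6 checkpoint; `raw` here is a stand-in)."""
    return len(zlib.compress(raw, level))
```

Decompressing with `zlib.decompress` recovers the exact byte stream, so the quantized weights round-trip losslessly.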
Sequence Length
sequence_length
train_length: 1024
eval_length: null
LR Schedule
warmdown
parameters: {"warmdown_iters":1200}
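The warmdown schedule can be sketched as an LR multiplier: constant at 1.0, then linear decay to zero over the final 1200 steps. The total iteration count is a hypothetical stand-in; the PR only fixes `warmdown_iters`:

```python
def lr_scale(step, total_iters, warmdown_iters=1200):
    """Multiplier on the base LR: 1.0 until the warmdown begins, then
    linear decay to 0 over the final `warmdown_iters` steps.
    `total_iters` is hypothetical -- only warmdown_iters=1200 is
    specified in the PR."""
    if step < total_iters - warmdown_iters:
        return 1.0
    return max(0.0, (total_iters - step) / warmdown_iters)
```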

Novel Contributions

  • Ablation of Mamba outer-layer count within a fixed 9-layer H-Net budget
  • Demonstration that increasing the outer Mamba layers from 1 (OL1) to 2 (OL2) caused training collapse and slower per-step throughput
  • Stable OL1 configuration achieving 1.4770 float BPB and 1.5194 INT6 sliding-window BPB
  • Use of INT6 GPTQ with zlib-9 compression and sliding-window evaluation