val_bpb: 1.6924
Architecture: Transformer
Optimizer: —
Artifact Size: 2,290,235 bytes
Training Techniques
Architecture: Hybrid
A faithful standalone mHC-lite branch with explicit residual-stream expansion and reduction, plus hyper-connected attention and MLP branches, inside a compact GPT-style model (a minimal sketch follows the parameters below).
parameters: {"layers":4,"model_dim":256,"num_heads":4,"num_kv_heads":4,"num_streams":4,"num_fracs":1}
Sequence Length
train_length: 256
eval_length: null
Regularization
dropout
parameters: {"rate": 0} (a rate of 0 means dropout is effectively disabled)
Novel Contributions
- Faithful standalone mHC-lite architecture
- Residual-stream expansion and reduction
- Hyper-connected attention and MLP branches
- Compact GPT-style integration of MHCLite
- Bundled local copy of hyper_conn/mhc_lite.py for standalone execution (a hypothetical usage sketch follows this list)
- Non-record methodological / negative-result submission
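Because the module is bundled for standalone execution, a usage sketch may help. It is purely hypothetical: the card does not show the module's constructor, so the import path, the MHCLite class name (taken from the contribution list above), and every keyword argument below (which simply mirror the reported parameters) are assumptions.

```python
# Hypothetical usage sketch. The bundled path hyper_conn/mhc_lite.py and the
# MHCLite name come from this card; the constructor and forward signatures
# are assumptions mirroring the reported hyperparameters.
import torch
from hyper_conn.mhc_lite import MHCLite  # bundled local copy (assumed import path)

model = MHCLite(layers=4, model_dim=256, num_heads=4,
                num_kv_heads=4, num_streams=4, num_fracs=1)  # assumed signature
tokens = torch.randint(0, 256, (1, 256))  # byte-level ids, train_length = 256
logits = model(tokens)                    # assumed output: (batch, seq, vocab)
```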