PR #1896 (open)

Record: Flower Brain 6-Cell Ternary Architecture — val_bpb 1.1155 (unlimited compute)

by G3sparky

val_bpb: 1.1155
Architecture: Transformer
Optimizer: (not specified)
Artifact Size: 10.4 MB

Training Techniques

Architecture
  • depth recurrence: 6-cell Flower of Life hexagonal topology with specialized cells and recurrent layer reuse.
    parameters: {"cells":6,"layers":12,"virtual_layers":17,"dimensions":512}
  • GQA: grouped query attention used in the attention cell.
    parameters: {"kv_heads":"8/4"}
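The depth recurrence above runs 12 physical layers as 17 virtual layers by revisiting some of them. The PR does not publish the reuse schedule, so the one below (looping the middle block twice) is purely an illustrative assumption; a minimal sketch:

```python
# Hypothetical sketch of depth recurrence: 12 physical layers are
# traversed via a 17-step schedule, so some layers run more than once.
# The schedule below (middle block repeated) is an assumption for
# illustration; the PR's actual schedule is not specified.

def make_layer(i):
    # Stand-in for a transformer block; here it just records which
    # physical layer ran.
    def layer(x):
        return x + [i]
    return layer

physical_layers = [make_layer(i) for i in range(12)]

# 17 virtual steps over 12 physical layers: layers 4..8 run twice.
schedule = [0, 1, 2, 3] + [4, 5, 6, 7, 8] * 2 + [9, 10, 11]
assert len(schedule) == 17

def forward(x):
    for idx in schedule:
        x = physical_layers[idx](x)
    return x

trace = forward([])
print(len(trace))       # 17 virtual layers executed
print(len(set(trace)))  # only 12 distinct physical layers
```

The key property is that parameter count scales with the 12 physical layers while effective depth scales with the 17-step schedule.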
Quantization
  • STE QAT (bits: null, scope: weights)
  • ternary (bits: 2, scope: MLP layers)
  • int6 (bits: 6, scope: attention weights)
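Ternary quantization with STE-based QAT typically means mapping weights to {-1, 0, +1} times a per-tensor scale, while gradients bypass the non-differentiable round/clip. The absmean scale rule below follows the common BitLinear recipe and is an assumption about this PR's exact implementation; a minimal sketch:

```python
# Hypothetical sketch of ternary (BitLinear-style) weight quantization.
# Weights map to {-1, 0, +1} times a per-tensor scale. During QAT the
# straight-through estimator (STE) treats the round/clip as identity
# in the backward pass.

def ternary_quantize(w, eps=1e-8):
    # per-tensor absmean scale (an assumption; other scale rules exist)
    scale = sum(abs(x) for x in w) / len(w) + eps
    # ternary codes in {-1, 0, +1}
    q = [max(-1, min(1, round(x / scale))) for x in w]
    return q, scale

def ste_forward(w):
    # In an autograd framework this would be written as
    #   w_q = w + stop_gradient(dequant(quant(w)) - w)
    # so the forward pass sees quantized weights while gradients flow
    # through as if quantization were the identity (the STE).
    q, scale = ternary_quantize(w)
    return [c * scale for c in q]

w = [0.9, -0.05, -1.2, 0.3, 0.0, -0.4]
q, scale = ternary_quantize(w)
print(q)  # codes in {-1, 0, +1}
```

The int6 path for attention weights would use the same STE pattern with a 6-bit codebook instead of a ternary one.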
Regularization
  • weight decay (parameters: {"weight_decay":0.095})
Compression
  • lzma (level: null)
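A 10.4 MB artifact from ternary weights implies dense bit-packing before entropy coding. The layout below (2-bit fields, 4 codes per byte, then stdlib lzma) is an assumed pipeline for illustration; the PR's actual packing format and lzma preset are not specified:

```python
import lzma

# Hypothetical sketch of the artifact pipeline: ternary codes are
# packed 4-per-byte (2 bits each, mapping {-1, 0, +1} -> {0, 1, 2}),
# then the packed bytes are lzma-compressed. Layout and preset are
# assumptions, not the PR's documented format.

def pack_ternary(codes):
    out = bytearray()
    for i in range(0, len(codes), 4):
        b = 0
        for j, c in enumerate(codes[i:i + 4]):
            b |= (c + 1) << (2 * j)  # 2-bit field per code
        out.append(b)
    return bytes(out)

codes = [-1, 0, 1, 0] * 1000           # toy ternary weight stream
packed = pack_ternary(codes)            # 4000 codes -> 1000 bytes
blob = lzma.compress(packed, preset=9)  # preset is an assumption
print(len(packed), len(blob))
```

Because ternary codes carry at most log2(3) ≈ 1.58 bits of information, the 2-bit packing leaves redundancy that lzma can recover, which is one plausible reason to pair the two.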

Novel Contributions

  • 6-cell Flower of Life hexagonal topology for a language model
  • Ternary BitLinear weights with STE-based quantization-aware training
  • Depth recurrence producing 17 virtual layers from 12 physical layers
  • Mixed ternary packing plus int6 GPTQ compression for a compact artifact
  • Empirical finding that the void fraction appears to be architecture-determined
  • Observation that STE worsened the quantization gap in this setup
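The last bullet refers to a quantization gap, i.e. the metric difference between evaluating with full-precision and quantized weights. The harness and numbers below are toy placeholders, not the PR's evaluation; a minimal sketch of the measurement:

```python
# Hypothetical sketch of measuring a "quantization gap": evaluate the
# same model twice, with full-precision and quantized weights, and
# report the metric difference. eval_bpb, quantize, and the numbers
# below are toy stand-ins, not the PR's actual harness or results.

def quantization_gap(eval_bpb, weights, quantize):
    fp_bpb = eval_bpb(weights)
    q_bpb = eval_bpb(quantize(weights))
    return q_bpb - fp_bpb  # positive gap means quantization hurts

# Toy stand-ins: the "metric" is mean squared error against a target.
target = [0.9, -0.4, 0.2]
eval_bpb = lambda w: sum((a - b) ** 2 for a, b in zip(w, target)) / len(w)
quantize = lambda w: [round(x) for x in w]

print(quantization_gap(eval_bpb, target, quantize))
```

The reported observation would correspond to this gap growing, rather than shrinking, when STE-based QAT was enabled.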