PR #435

open

Radial BitNet submission

val_bpb
1.6130
Architecture
Compressed dual-branch Transformer
Optimizer
FROStable + AdamW
Artifact Size
15,943,179 bytes

Training Techniques

Architecture
BigramHash
Adds a 1024-bucket hashed bigram embedding branch for short-horizon lexical context.
parameters: {"buckets":1024}
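A minimal sketch of what a 1024-bucket hashed bigram branch can look like. The hash constant, padding choice, and function names are assumptions for illustration, not the submission's code:

```python
import numpy as np

def bigram_hash_features(ids, table, buckets=1024):
    """Hashed bigram embedding lookup (illustrative).

    ids:   (seq,) integer token ids
    table: (buckets, dim) learned embedding rows
    Each (prev, cur) token pair is hashed into one of `buckets`
    rows, giving cheap short-horizon lexical context.
    """
    prev = np.roll(ids, 1)
    prev[0] = 0                          # no left context at position 0 (assumed padding)
    h = (prev * 1000003 + ids) % buckets # simple multiplicative hash (assumed)
    return table[h]                      # (seq, dim)
```

The resulting features would be fused with the main transformer stream; the fusion mechanism is not specified in the PR.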
Radial Token Branch
Adds a token-ID-derived radial geometric feature branch projected into the fusion space.
parameters: null
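The PR does not define the radial feature itself; the following is one purely illustrative reading (place each token id on a spiral and emit sin/cos features), and the actual branch may differ entirely:

```python
import numpy as np

def radial_token_features(ids, vocab_size=50257, dim=8):
    """Hypothetical token-ID-derived radial features.

    Radius grows with the id, the angle wraps on an id residue,
    and a few sin/cos frequencies are emitted. Every constant here
    is an assumption; the PR gives no parameters for this branch.
    """
    r = ids / vocab_size                       # radius in [0, 1)
    theta = 2 * np.pi * (ids % 257) / 257      # angle from id residue
    freqs = np.arange(1, dim // 2 + 1)
    ang = np.outer(theta, freqs)               # (seq, dim/2)
    return np.concatenate([r[:, None] * np.cos(ang),
                           r[:, None] * np.sin(ang)], axis=1)  # (seq, dim)
```

These features would then be linearly projected into the fusion space, per the description above.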
BitNet-style ternary projections
Uses ternary-weight forward behavior in major internal projections to reduce storage pressure.
parameters: null
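A sketch of ternary weight quantization in the BitNet b1.58 style (absmean scaling is an assumption; the PR does not state which rule the submission uses):

```python
import numpy as np

def ternarize(w):
    """Quantize a weight tensor to {-1, 0, +1} * alpha.

    alpha is the per-tensor absolute mean (assumed absmean rule).
    The forward pass then uses q * alpha in place of w, so only the
    ternary pattern plus one scale per tensor needs to be stored.
    """
    alpha = np.abs(w).mean()
    q = np.clip(np.round(w / (alpha + 1e-8)), -1, 1)
    return q, alpha
```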
Optimizer
AdamW
weight_decay: null
momentum: null
other_params: null
Weight Averaging
EMA
parameters: null
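A minimal EMA weight-averaging sketch; the decay value is an assumption, since the PR records no parameters:

```python
import numpy as np

class EMA:
    """Exponential moving average of model parameters (illustrative).

    After training, the shadow weights replace the raw weights for
    evaluation/export.
    """
    def __init__(self, params, decay=0.999):  # decay is assumed
        self.decay = decay
        self.shadow = [p.copy() for p in params]

    def update(self, params):
        for s, p in zip(self.shadow, params):
            s *= self.decay
            s += (1 - self.decay) * p
```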
Quantization
mixed int8/int6
bits: null
scope: selected weights
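The int8/int6 split and grouping are not specified; as a sketch, the int8 half could be simple symmetric per-tensor quantization like this (the int6 path would be analogous with a range of +/-31):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization for export (assumed scheme)."""
    scale = max(np.abs(w).max() / 127.0, 1e-12)
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale
```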
Compression
zlib
level: null
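A sketch of the export path implied here: serialize the weight arrays, then zlib-compress the blob. Level 9 is an assumption, since the PR leaves the level unspecified:

```python
import io
import zlib
import numpy as np

def compress_weights(arrays, level=9):  # level assumed, not stated in PR
    """Serialize arrays to an .npz blob and zlib-compress it."""
    buf = io.BytesIO()
    np.savez(buf, *arrays)
    return zlib.compress(buf.getvalue(), level)

def decompress_weights(blob):
    """Inverse of compress_weights."""
    buf = io.BytesIO(zlib.decompress(blob))
    data = np.load(buf)
    return [data[k] for k in data.files]
```

Quantizing and pruning before this step matter because runs of zeros and low-entropy int8/int6 values compress far better than raw float32.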
Sequence Length
sequence_length
train_length: 1024
eval_length: null
LR Schedule
cosine decay
parameters: {"warmup":true}
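The PR records only warmup=true; a standard cosine-decay-with-linear-warmup schedule consistent with that looks like the following (warmup length and a floor of zero are assumptions):

```python
import math

def lr_at(step, total_steps, base_lr, warmup_steps):
    """Linear warmup to base_lr, then cosine decay to 0 (assumed floor)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * t))
```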
Regularization
weight decay
parameters: null
Other
other
FRO (Fractal Resonant Optimization) used as the main optimizer on the compressed transformer core.
parameters: null
other
Light export-time pruning of values below 0.0025 before final artifact serialization.
parameters: {"threshold":0.0025}
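The export-time pruning step described above can be sketched in a few lines; zeroing by absolute value is an assumed reading of "values below 0.0025":

```python
import numpy as np

def prune_small(w, threshold=0.0025):
    """Zero out weights with |w| < threshold before serialization.

    The zeroed entries cost nothing in accuracy-critical layers if
    they were already negligible, and they compress much better.
    """
    out = w.copy()
    out[np.abs(out) < threshold] = 0.0
    return out
```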

Novel Contributions

  • FRO (Fractal Resonant Optimization) as the main optimizer
  • Radial Token Branch for token-level geometric features
  • 1024-bucket bigram hash branch for short-horizon lexical context
  • BitNet-style ternary-weight behavior in major internal projections
  • Mixed post-training export with int8/int6 serialization
  • Light export-time pruning