PR #1140

open

Crawler — 8.8MB, 1.1874 BPB (3-seed mean, 8xH100, 600s)

by newjordan
val_bpb
1.1874
Architecture
Transformer
Optimizer
Artifact Size
9.36MB

Training Techniques

Architecture
XSA
4 flat XSA layers with a shared crawler block repeated for 3 loops
parameters: {"layers":4,"loops":3}
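The layers/loops split above amounts to weight sharing: the shared crawler block is stored once but executed several times, so compute depth exceeds parameter depth. A minimal accounting sketch (the block granularity is an assumption; the submission's actual block contents are not shown here):

```python
# With 4 flat (unshared) layers plus one shared block run for 3 loops,
# 5 blocks of weights are stored but 7 blocks of compute are executed.
def stored_blocks(layers):
    return layers + 1          # the shared block is stored once

def executed_blocks(layers, loops):
    return layers + loops      # the shared block runs `loops` times

print(stored_blocks(4), executed_blocks(4, 3))
```

This is how the artifact stays small while keeping effective depth, which matters for a size-constrained submission.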
KV head count
Uses 8 attention heads and 4 KV heads
parameters: {"heads":8,"kv_heads":4}
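With 8 query heads and 4 KV heads, this is grouped-query attention: each KV head serves 2 query heads, halving the KV cache. A sketch of the head mapping (the mapping convention is the usual contiguous grouping, assumed here):

```python
# Grouped-query attention: heads // kv_heads query heads share each KV head.
def kv_head_for(query_head, heads=8, kv_heads=4):
    group = heads // kv_heads        # 2 query heads per KV head
    return query_head // group

print([kv_head_for(h) for h in range(8)])
```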
RoPE
RoPE configured with a 16 setting
parameters: {"value":16}
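A minimal rotary position embedding sketch for reference. It assumes the "16" setting is the per-head rotary dimension (i.e. 16 values are rotated); the submission may mean something else by it, and the base of 10000 is the conventional default, not taken from the PR:

```python
import math

def rope(x, pos, base=10000.0):
    # Rotate consecutive pairs (x0,x1),(x2,x3),... by position-dependent
    # angles; len(x) is assumed to be the 16-dim rotary slice of a head.
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out
```

Rotation leaves vector norms unchanged; only relative angles between positions carry information.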
BigramHash
Bigram hashing used in the architecture with a 2048-entry table
parameters: {"bigram":2048}
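The bigram=2048 parameter suggests hashing adjacent token pairs into a 2048-entry embedding table, giving the model cheap access to local bigram statistics. A sketch with an illustrative multiplicative hash (the submission's actual hash function is not shown):

```python
# Hash a (prev_token, token) pair into one of 2048 buckets; collisions are
# accepted as the price of a small table. The constant 1000003 is a common
# multiplicative mixer, assumed here for illustration.
def bigram_bucket(prev_tok, tok, table_size=2048):
    return (prev_tok * 1000003 + tok) % table_size

print(bigram_bucket(17, 42))
```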
Quantization
QAT
bits: 8
scope: model
int6
bits: 6
scope: final artifact
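Quantization-aware training simulates the rounding of low-bit weights in the forward pass ("fake quantization") so the model adapts to it before the final artifact is emitted. A symmetric per-tensor sketch — the submission's exact scheme (per-channel vs per-tensor, symmetric vs affine) is not stated:

```python
# Fake-quantize a weight list: scale to the signed integer grid, round,
# and rescale. bits=8 matches the QAT setting; bits=6 matches the artifact.
def fake_quant(w, bits=8):
    qmax = 2 ** (bits - 1) - 1                  # 127 for int8, 31 for int6
    scale = max(abs(v) for v in w) / qmax or 1.0
    return [round(v / scale) * scale for v in w]

q = fake_quant([0.5, -1.0, 0.25])
```

Training against int8 rounding while shipping int6 means the artifact sees coarser rounding than training did, which the "naive int6" bullet below acknowledges.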
Compression
zstd
level: null
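The final artifact pipeline is plain bit-packing of int6 weights followed by general-purpose compression. A sketch of the packing step; zlib stands in for zstd below because zstd bindings (`zstandard`) are third-party, and the bit layout shown is an assumption:

```python
import zlib  # stand-in for zstd, which the submission actually uses

def pack_int6(values):
    # Pack 6-bit two's-complement values (-32..31) tightly: 4 values -> 3 bytes.
    bits, nbits, out = 0, 0, bytearray()
    for v in values:
        assert -32 <= v <= 31
        bits = (bits << 6) | (v & 0x3F)
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:                                  # left-align any trailing bits
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

packed = pack_int6([31, -32, 0, 5])            # 24 bits -> 3 bytes
artifact = zlib.compress(packed, 9)
```

Since no GPTQ or similar calibration is applied, the size reduction comes entirely from the 6-bit grid plus whatever redundancy the entropy coder finds.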

Novel Contributions

  • Micro Crawler submission with 3-seed mean evaluation
  • 4 flat XSA layers plus a shared crawler block repeated for 3 loops
  • Quantization-aware training with int8 settings and final int6 artifact
  • Naive int6 plus zstd compression without GPTQ
  • Use of 8 heads, 4 KV heads, bigram 2048, and RoPE 16