val_bpb: 1.0877
Architecture: Transformer
Optimizer: —
Artifact Size: 15,875,599 bytes
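
Reading note: val_bpb is taken to mean validation bits per byte. Assuming the evaluation produces a mean cross-entropy in nats per byte (the actual evaluation code is not part of this record), the conversion is a one-liner:

```python
import math

def bits_per_byte(mean_ce_loss_nats: float) -> float:
    """Convert mean cross-entropy in nats per byte to bits per byte."""
    return mean_ce_loss_nats / math.log(2)

# A mean loss of ~0.75395 nats/byte corresponds to the reported figure:
print(round(bits_per_byte(0.75395), 4))  # 1.0877
```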
Training Techniques
Quantization: GPTQ (bits: not specified, scope: model weights)
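
The record names GPTQ but leaves the bit width unspecified. As a rough point of reference, the sketch below implements only the per-row round-to-nearest (RTN) baseline that GPTQ improves on with Hessian-based, column-by-column error compensation; the 4-bit default and the symmetric scheme are illustrative assumptions, not the run's settings.

```python
import torch

def quantize_rtn(weight: torch.Tensor, bits: int = 4):
    """Per-row symmetric round-to-nearest (RTN) weight quantization.

    GPTQ refines this baseline with second-order error compensation;
    only the RTN step is shown. bits=4 is illustrative (unrecorded).
    """
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover approximate float weights for inference."""
    return q.float() * scale
```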
Compression: custom (level: not specified; the ANS-coded path is described under Novel Contributions)
Evaluation: sliding window eval (parameters: not specified)
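
No window or stride is recorded for the sliding-window evaluation. A minimal sketch, assuming a model that maps (1, T) byte ids to (1, T, vocab) logits and an 8192/4096 window/stride chosen purely for illustration:

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_window_bpb(model, data: torch.Tensor,
                       window: int = 8192, stride: int = 4096) -> float:
    """Bits-per-byte over a long byte sequence via a sliding window.

    Every target position is scored exactly once; overlap between
    windows gives later positions up to `window - stride` extra context.
    """
    n = data.numel() - 1              # number of scoreable positions
    total_nats, scored_upto = 0.0, 0
    for start in range(0, n, stride):
        end = min(start + window, n)
        inputs = data[start:end].unsqueeze(0)
        targets = data[start + 1:end + 1].unsqueeze(0)
        logits = model(inputs)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1), reduction="none")
        fresh = loss[scored_upto - start:]   # skip already-scored positions
        total_nats += fresh.sum().item()
        scored_upto = end
        if end == n:
            break
    return total_nats / n / math.log(2)
```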
Test-Time Training: full TTT (learning_rate: 0.005, epochs: 3)
Sequence Length: train_length 8192, eval_length not specified
Architecture: depth recurrence, a 3-layer recurrence implemented as a recurrent middle block with parallel residuals (layers: 3)
Weight tying: tied embeddings (a tied-embedding baseline implied by the canonical method list; parameters: not specified)
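
For reference, tied embeddings in their standard form (a minimal sketch, not the run's code):

```python
import torch
import torch.nn as nn

class TiedEmbedding(nn.Module):
    """The output projection reuses the input embedding matrix,
    saving vocab_size * dim parameters."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def encode(self, ids: torch.Tensor) -> torch.Tensor:
        return self.embed(ids)               # (B, T) -> (B, T, dim)

    def logits(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden @ self.embed.weight.T  # same matrix, transposed
```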
Novel Contributions
- Replaced the quantized artifact compression path with a custom ANS-coded compressor (see the sketch after this list)
- Reported a smaller quantized artifact than Brotli/LZMA baselines applied after quantization
- Explored a Hyperloop-lite loop mechanism, but found it slower and of no benefit under the time budget
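
The custom compressor itself is not part of this record. As a rough illustration of the ANS idea, here is a self-contained byte-wise rANS coder; the scale bits, state bounds, and frequency fix-up are illustrative choices, not the author's implementation.

```python
from collections import Counter

SCALE_BITS = 12          # quantized frequencies sum to 1 << SCALE_BITS
L = 1 << 23              # lower bound of the rANS state interval

def build_tables(data: bytes):
    """Quantize byte frequencies so they sum to 1 << SCALE_BITS."""
    counts, total = Counter(data), len(data)
    freq = {s: max(1, (c << SCALE_BITS) // total) for s, c in counts.items()}
    # Crude fix-up so the sum is exact; fine for typical inputs.
    freq[max(freq, key=freq.get)] += (1 << SCALE_BITS) - sum(freq.values())
    cum, acc = {}, 0
    for s in sorted(freq):
        cum[s] = acc
        acc += freq[s]
    return freq, cum

def rans_encode(data: bytes, freq, cum) -> bytes:
    x, out = L, bytearray()
    for s in reversed(data):                  # rANS encodes in reverse
        f = freq[s]
        x_max = ((L >> SCALE_BITS) << 8) * f
        while x >= x_max:                     # renormalize: spill low bytes
            out.append(x & 0xFF)
            x >>= 8
        x = ((x // f) << SCALE_BITS) + (x % f) + cum[s]
    return x.to_bytes(4, "big") + bytes(out)  # final state + spilled bytes

def rans_decode(blob: bytes, freq, cum, n: int) -> bytes:
    x = int.from_bytes(blob[:4], "big")
    tail = bytearray(blob[4:])                # spilled bytes, consumed LIFO
    slots = sorted((c, s) for s, c in cum.items())
    out = bytearray()
    for _ in range(n):
        slot = x & ((1 << SCALE_BITS) - 1)
        s = next(sym for c, sym in reversed(slots) if c <= slot)
        x = freq[s] * (x >> SCALE_BITS) + slot - cum[s]
        while x < L:                          # refill from the byte stream
            x = (x << 8) | tail.pop()
        out.append(s)
    return bytes(out)

# Round trip on illustrative data:
data = b"an illustrative payload " * 64
freq, cum = build_tables(data)
blob = rans_encode(data, freq, cum)
assert rans_decode(blob, freq, cum, len(data)) == data
```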