PR #1153

open

Non-record: Triton KV-cache backend for autoregressive eval

by LucasErcolano
val_bpb
1.6507
Architecture
Transformer
Optimizer
Artifact Size
10,458,900 bytes

Training Techniques

Evaluation
autoregressive eval
parameters: {"context_length":1024,"max_tokens":2048}
Sequence Length
sequence_length
train_length: null
eval_length: 2048
Other
other
Triton-backed KV-cache evaluation backends: fused grouped-int8 score/apply kernels, and a fused QJL sign-score kernel plus a grouped value-apply kernel
parameters: {"backends":["int8_triton","qjl_triton"]}
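The grouped-int8 backend presumably quantizes cached keys with one shared scale per small group of elements, then dequantizes on the fly when computing attention scores. A minimal pure-Python sketch of that scheme (the group size, dimensions, and function names are illustrative, not the PR's actual kernel parameters):

```python
import random

GROUP = 32  # elements sharing one int8 scale (illustrative choice)

def quantize_grouped_int8(vec):
    """Quantize a float vector to int8 with one scale per group."""
    groups, scales = [], []
    for start in range(0, len(vec), GROUP):
        chunk = vec[start:start + GROUP]
        scale = max(abs(x) for x in chunk) / 127 or 1.0
        scales.append(scale)
        groups.append([round(x / scale) for x in chunk])
    return groups, scales

def score_int8(query, groups, scales):
    """Attention-style score: dot(query, dequantized key)."""
    total, i = 0.0, 0
    for chunk, scale in zip(groups, scales):
        for kv in chunk:
            total += query[i] * kv * scale
            i += 1
    return total

random.seed(0)
dim = 64
key = [random.uniform(-1, 1) for _ in range(dim)]
query = [random.uniform(-1, 1) for _ in range(dim)]

groups, scales = quantize_grouped_int8(key)
exact = sum(q * k for q, k in zip(query, key))
approx = score_int8(query, groups, scales)
```

A fused kernel would do the dequantize-and-accumulate in one pass over the cache rather than materializing dequantized keys, which is the point of doing this in Triton.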
other
Prewarm logic to avoid first-use Triton JIT compile overhead during short evaluations
parameters: null
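Prewarming here presumably means invoking each Triton kernel once on dummy inputs before the timed region, so one-time JIT compilation is not charged to a short evaluation. A toy sketch of the pattern (the `toy_jit` decorator is a stand-in for a JIT cache, not Triton's API):

```python
import functools

compile_count = 0  # how many times "JIT compilation" ran

def toy_jit(fn):
    """Stand-in for a JIT: compiles on first call, then serves from cache."""
    compiled = {}
    @functools.wraps(fn)
    def wrapper(*args):
        global compile_count
        if fn not in compiled:
            compile_count += 1  # the expensive compile happens here, once
            compiled[fn] = fn
        return compiled[fn](*args)
    return wrapper

@toy_jit
def score_kernel(q, k):
    return sum(a * b for a, b in zip(q, k))

def prewarm():
    # One dummy call triggers compilation outside the timed region.
    score_kernel([0.0], [0.0])

prewarm()
compiles_before_timing = compile_count

# "Timed" evaluation loop: no compiles should occur in here.
result = 0.0
for _ in range(100):
    result = score_kernel([1.0, 2.0], [3.0, 4.0])
```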
other
Benchmark-side peak CUDA memory reporting to sanity-check allocator behavior
parameters: null
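On the CUDA side this likely wraps the benchmark in `torch.cuda.reset_peak_memory_stats()` / `torch.cuda.max_memory_allocated()` calls. The same reset-then-report pattern can be sketched with Python's stdlib `tracemalloc` as a CPU stand-in (the workload below is a placeholder, not the PR's benchmark):

```python
import tracemalloc

def run_benchmark():
    # Placeholder workload; the real benchmark would run KV-cache decoding.
    buf = bytearray(8_000_000)  # ~8 MB transient allocation
    return len(buf)

tracemalloc.start()
tracemalloc.reset_peak()  # analogous to torch.cuda.reset_peak_memory_stats()
tokens = run_benchmark()
_, peak = tracemalloc.get_traced_memory()  # analogous to torch.cuda.max_memory_allocated()
tracemalloc.stop()

print(f"peak traced memory: {peak / 1e6:.1f} MB")
```

Reporting the peak rather than the final allocation is what makes this a sanity check on the allocator: a backend that briefly doubles the cache during quantization shows up in the peak even if steady-state usage looks fine.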

Novel Contributions

  • Added Triton-backed KV-cache evaluation paths for autoregressive evaluation
  • Implemented fused grouped-int8 score/apply kernels
  • Implemented fused QJL sign-score kernel plus grouped value apply kernel
  • Added backend selftests and a dedicated GPU benchmark script
  • Added prewarm logic to reduce Triton JIT compile overhead on short evals
  • Added peak CUDA memory reporting in the benchmark