PR #1952

open

Non-record: Energy as the missing leaderboard axis - Wh per bpb drop across 5 configs
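The title's metric, Wh per bpb drop, can be read as total run energy divided by the improvement in val_bpb over a reference run. The sketch below assumes that definition. The baseline the PR measures against is not stated here, so `baseline_bpb` is a hypothetical input, and the helper name is illustrative.

```python
def wh_per_bpb_drop(energy_wh: float, baseline_bpb: float, val_bpb: float) -> float:
    """Energy (Wh) spent per unit of val_bpb improvement over a baseline.

    Assumed definition: total run energy / (baseline_bpb - val_bpb).
    Lower is better. Raises if the run does not improve on the baseline.
    """
    drop = baseline_bpb - val_bpb
    if drop <= 0:
        raise ValueError("val_bpb must be lower than baseline_bpb")
    return energy_wh / drop
```

As an example with hypothetical numbers (not measurements from the PR), suppose a 500 Wh run improves a 1.2055 baseline to the listed 1.1055. That run costs about 5,000 Wh per bpb, or 50 Wh per 0.01 bpb.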

val_bpb
1.1055
Architecture
Transformer
Optimizer
Artifact Size
~1.5 MB

Training Techniques

Evaluation
sliding window eval
parameters: {"stride":64}
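A stride of 64 suggests overlapping-window evaluation. Each window gives the model a long context, but only tokens not covered by the previous window are scored, so most tokens are predicted with nearly a full context. The sketch below covers only the window bookkeeping. It assumes a hypothetical `context_len` (the listing gives only the stride) and the common "score only newly covered tokens" rule. The submission's evaluator may differ.

```python
def sliding_windows(num_tokens: int, context_len: int, stride: int):
    """Yield (start, end, score_from) so each token in [0, num_tokens) is scored once.

    The model sees tokens[start:end]; only absolute positions in
    [score_from, end) contribute to the loss. After the first window, each
    scored token has at least context_len - stride tokens of context.
    """
    if stride <= 0 or stride > context_len:
        raise ValueError("need 0 < stride <= context_len")
    prev_end = 0
    for start in range(0, num_tokens, stride):
        end = min(start + context_len, num_tokens)
        yield start, end, prev_end  # score only tokens not scored before
        prev_end = end
        if end == num_tokens:
            break
```

Smaller strides give each scored token more context but require more forward passes, which also matters for eval-time energy.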
Test-Time Training
LoRA TTT
parameters: {"rank":8}
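LoRA test-time training adapts the model on evaluation data through low-rank adapters while the base weights stay frozen. Below is a minimal pure-Python sketch of one adapted linear layer. It assumes the standard LoRA form y = W x + (alpha / r) * B (A x), with B initialized to zero. The `alpha` value, the init scale, and which matrices receive adapters are all assumptions, and the test-time optimization loop is not shown.

```python
import random


def matvec(m, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank=8, alpha=16.0, seed=0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight, never updated
        # A: rank x d_in, small random init (assumed scale)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B: d_out x rank, zero init so the adapter starts as a no-op
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def __call__(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        """Adapter parameters: rank * (d_in + d_out)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

Each adapter adds r × (d_in + d_out) trainable values. For a hypothetical 512×512 projection at rank 8, that is 8 × 1,024 = 8,192 values.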
Weight Averaging
EMA
parameters: null
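EMA keeps an exponential moving average of the weights during training, and the averaged copy is commonly the one evaluated or exported. Below is a minimal sketch over a dict of flat parameter lists. The decay value is an assumption, since the listing's parameters are null.

```python
def init_ema(params):
    """Start the EMA as an independent copy of the current parameters."""
    return {name: list(values) for name, values in params.items()}


def ema_update(ema, params, decay=0.999):
    """In place: ema <- decay * ema + (1 - decay) * params, per parameter."""
    for name, values in params.items():
        ema[name] = [decay * e + (1.0 - decay) * p for e, p in zip(ema[name], values)]
```

Call `ema_update` after every optimizer step. A decay closer to 1 averages over a longer window of steps.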
Quantization
GPTQ-lite
bits: null
scope: all
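GPTQ quantizes weights column by column and uses second-order (Hessian) information to compensate for rounding error. What the "-lite" variant keeps or drops is not described here. The sketch below is therefore not GPTQ. It shows only the plain round-to-nearest, per-row symmetric baseline that GPTQ-style methods improve on. The bit width is an assumption, since `bits` is null.

```python
def quantize_row(row, bits=8):
    """Symmetric per-row round-to-nearest quantization (not GPTQ).

    Returns integer codes in [-qmax, qmax], where qmax = 2**(bits-1) - 1,
    and the per-row scale needed to dequantize them.
    """
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in row)
    scale = amax / qmax if amax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]
```

Round-to-nearest bounds each weight's error by half a quantization step. GPTQ instead shifts that error onto not-yet-quantized columns so that layer outputs, rather than individual weights, are preserved.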
Compression
zlib
level: null
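The zlib entry presumably covers compressing the quantized weights into the ~1.5 MB artifact. Below is a minimal sketch using Python's standard `zlib` and `struct` modules. It assumes int8 storage and compression level 9 (the listing's `level` is null). How the real artifact lays out scales and metadata is not shown.

```python
import struct
import zlib


def pack_int8(codes):
    """Pack integer codes in [-128, 127] as signed bytes."""
    return struct.pack(f"{len(codes)}b", *codes)


def unpack_int8(raw):
    """Inverse of pack_int8."""
    return list(struct.unpack(f"{len(raw)}b", raw))


def compress_codes(codes, level=9):
    """Serialize int8 codes and compress them with zlib."""
    return zlib.compress(pack_int8(codes), level)


def decompress_codes(blob):
    """Recover the original int8 codes from a compressed blob."""
    return unpack_int8(zlib.decompress(blob))
```

Low-entropy codes, such as many zeros after quantization, compress far below one byte per weight. This is why quantization and compression are usually tuned together against an artifact size limit.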
Architecture
depth recurrence
Mini depth recurrence used in the frontier config.
parameters: null
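Depth recurrence reuses the same block weights for several passes, which adds effective depth without adding parameters. Below is a minimal sketch. It assumes the reused blocks are passed in as callables and repeated `num_loops` times as a group. Which layers the frontier config repeats, and how often, is not stated here.

```python
def recurrent_stack(x, blocks, num_loops=2):
    """Run the same blocks num_loops times in sequence, sharing their weights.

    Effective depth is len(blocks) * num_loops, while the parameter count
    stays that of len(blocks) blocks.
    """
    for _ in range(num_loops):
        for block in blocks:
            x = block(x)
    return x
```

Each extra loop costs a full forward and backward pass through the shared blocks, so depth recurrence trades parameters for compute and, in turn, energy.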
parallel residuals
Parallel residual architecture variant used in the frontier config.
parameters: null
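With parallel residuals, attention and the MLP read the same block input and both outputs are added to the residual stream. In a sequential block, the MLP instead reads the attention output. Below is a minimal sketch contrasting the two. It assumes a single shared normalization for the parallel form, although some variants normalize each branch separately. `attn`, `mlp`, and `norm` are placeholder callables.

```python
def sequential_block(x, attn, mlp, norm):
    """Standard pre-norm block: the MLP sees the attention-updated stream."""
    h = x + attn(norm(x))
    return h + mlp(norm(h))


def parallel_block(x, attn, mlp, norm):
    """Parallel residual block: both branches read the same normalized input."""
    n = norm(x)  # shared normalization (an assumption)
    return x + attn(n) + mlp(n)
```

Because the two branches no longer depend on each other, they can be computed concurrently or fused, which is the usual motivation for this variant.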
LR Schedule
warmdown
parameters: {"warmdown_steps":3500}
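A warmdown schedule holds the learning rate flat and then decays it over the final `warmdown_steps`. Below is a minimal sketch of the LR multiplier with the listed 3,500-step warmdown. It assumes linear decay to zero and omits any warmup phase. `total_steps` is a hypothetical input.

```python
def lr_multiplier(step, total_steps, warmdown_steps=3500):
    """Constant LR, then linear decay to 0 over the final warmdown_steps."""
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0
    return max(0.0, (total_steps - step) / warmdown_steps)
```

Multiply the base learning rate by this value at each step.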

Novel Contributions

  • Proposes energy as the missing leaderboard axis for parameter-golf submissions.
  • Reproduces five published submissions on 8×H100 SXM5, measuring energy with NVML hardware counters.
  • Shows that two configs reach similar val_bpb at nearly identical energy through different mechanisms.
  • Quantifies energy per unit of BPB improvement, showing that the frontier config is far less energy-efficient than the engineering tier.
  • Measures post-training overhead and shows it can dominate total run energy.
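NVML exposes a cumulative energy counter per GPU, `nvmlDeviceGetTotalEnergyConsumption`, which reports millijoules since the driver was loaded. Reading this counter before and after each phase gives per-phase GPU energy. Below is a minimal sketch using the `pynvml` bindings from the nvidia-ml-py package. The `nvml_energy` helper and any phase names are hypothetical. The sketch counts GPU energy only, not CPU, host memory, or cooling. Comparing post-training phases against the total gives the overhead share described in the last bullet.

```python
from contextlib import contextmanager


def mj_to_wh(millijoules):
    """Convert millijoules to watt-hours (1 Wh = 3.6e6 mJ)."""
    return millijoules / 3.6e6


@contextmanager
def nvml_energy(phase_log, phase):
    """Store the energy (Wh) all visible GPUs use inside the with-block."""
    import pynvml  # from nvidia-ml-py; requires an NVIDIA driver

    pynvml.nvmlInit()
    try:
        count = pynvml.nvmlDeviceGetCount()
        handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(count)]
        start = [pynvml.nvmlDeviceGetTotalEnergyConsumption(h) for h in handles]
        yield
        end = [pynvml.nvmlDeviceGetTotalEnergyConsumption(h) for h in handles]
        phase_log[phase] = mj_to_wh(sum(e - s for s, e in zip(start, end)))
    finally:
        pynvml.nvmlShutdown()


def phase_share(phase_log, phases):
    """Fraction of total logged energy spent in the given phases."""
    total = sum(phase_log.values())
    return sum(phase_log[p] for p in phases) / total
```

Wrap each stage, e.g. `with nvml_energy(log, "train"): train()`, then call `phase_share(log, [...])` with the post-training phase names. Here `train` and the phase labels are placeholders.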