PR #1952
open
Non-record: Energy as the missing leaderboard axis - Wh per bpb drop across 5 configs
by jayyvk
val_bpb
1.1055
Architecture
Transformer
Optimizer
—
Artifact Size
~1.5 MB
Training Techniques
Evaluation
sliding window eval
parameters: {"stride":64}
Test-Time Training
LoRA TTT
parameters: {"rank":8}
Weight Averaging
EMA
parameters: null
Quantization
GPTQ-lite
bits: null
scope: all
Compression
zlib
level: null
Architecture
depth recurrence
Mini depth recurrence used in the frontier config.
parameters: null
ParallelResiduals
Parallel residual architecture variant used in the frontier config.
parameters: null
LR Schedule
warmdown
parameters: {"warmdown_steps":3500}
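The card gives only `warmdown_steps` for the LR schedule, not its shape. A minimal sketch follows, assuming a linear decay to zero over the final `warmdown_steps` iterations; `warmdown_lr_scale` and `total_steps` are hypothetical names introduced for illustration and are not taken from the PR.

```python
def warmdown_lr_scale(step: int, total_steps: int, warmdown_steps: int = 3500) -> float:
    """Return the multiplier applied to the base learning rate at `step`.

    Assumption: the rate is held constant, then decays linearly to zero
    over the last `warmdown_steps` steps. The PR may use a different shape.
    """
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return 1.0  # constant phase before the warmdown begins
    # Linear ramp from 1.0 at warmdown_start down to 0.0 at total_steps.
    return (total_steps - step) / warmdown_steps
```

With a hypothetical `total_steps=10000`, the multiplier stays at 1.0 through step 6500 and reaches 0.5 at step 8250.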
Novel Contributions
- Adds energy as a missing leaderboard axis for parameter-golf submissions.
- Reproduces five published submissions with NVML hardware-counter energy measurement on 8×H100 SXM5.
- Shows that two configs can achieve similar val_bpb with nearly identical energy via different mechanisms.
- Quantifies that the frontier config is far less energy-efficient than the engineering tier per unit of BPB improvement.
- Measures post-training overhead and shows it can dominate total run energy.
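The energy axis described above can be sketched as follows, assuming per-GPU energy is read as the cumulative millijoule counter that NVML exposes and that "bpb drop" is measured against some chosen baseline. The function names and the baseline definition are assumptions made for illustration; the PR's exact metric and measurement harness are not shown here.

```python
MJ_PER_WH = 3_600_000  # 1 Wh = 3600 J = 3.6e6 mJ


def read_energy_mj() -> list[int]:
    """Read the cumulative energy counter (mJ) of every visible GPU.

    Requires the pynvml package and an NVIDIA driver; not called by the
    pure helpers below.
    """
    import pynvml

    pynvml.nvmlInit()
    try:
        return [
            pynvml.nvmlDeviceGetTotalEnergyConsumption(
                pynvml.nvmlDeviceGetHandleByIndex(i)
            )
            for i in range(pynvml.nvmlDeviceGetCount())
        ]
    finally:
        pynvml.nvmlShutdown()


def energy_wh(start_mj: list[int], end_mj: list[int]) -> float:
    """Sum per-GPU counter deltas and convert millijoules to watt-hours."""
    if len(start_mj) != len(end_mj):
        raise ValueError("start and end readings must cover the same GPUs")
    return sum(e - s for s, e in zip(start_mj, end_mj)) / MJ_PER_WH


def wh_per_bpb_drop(energy: float, baseline_bpb: float, val_bpb: float) -> float:
    """Energy spent per unit of val_bpb improvement over a baseline (assumed definition)."""
    drop = baseline_bpb - val_bpb
    if drop <= 0:
        raise ValueError("val_bpb must be lower than baseline_bpb")
    return energy / drop
```

Taking one reading before the run and one after post-training steps (quantization, compression, evaluation) finish, plus an intermediate reading when training ends, would separate training energy from post-training overhead. The numbers used to exercise these helpers should come from actual counter readings; none are supplied here.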