Intelligence Per Watt
Benchmarking Intelligence Efficiency of LM Inference.
- Profile: Single-turn and agentic inference profiling with per-query telemetry against any OpenAI-compatible endpoint.
- Measure: Real-time energy, power, temperature, and memory telemetry via a Rust gRPC service sampling every 50 ms.
- Analyze: Intelligence Per Joule and Intelligence Per Watt metrics with accuracy scoring, regression analysis, and plots.
- Extend: Plug in custom inference clients, benchmark datasets, agent harnesses, and platform collectors.
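As a minimal sketch of how the 50 ms power samples could be folded into per-query energy (the function and variable names here are illustrative, not the project's actual API):

```python
# Sketch: turn a stream of fixed-interval power samples (watts)
# into per-query energy (joules) via a left Riemann sum.
# Names are hypothetical; the real service samples over gRPC.

def energy_joules(power_samples_w, interval_s=0.050):
    """Energy = sum of power * sample interval."""
    return sum(p * interval_s for p in power_samples_w)

# 40 samples x 50 ms = 2 s of a flat 300 W draw, roughly 600 J
samples = [300.0] * 40
print(energy_joules(samples))
```

Finer sampling intervals reduce the integration error for bursty workloads, at the cost of more telemetry traffic.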
Key Metrics
- Intelligence Per Joule (IPJ) = accuracy / average energy per query (joules)
- Intelligence Per Watt (IPW) = accuracy / average power per query (watts)
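The two formulas above can be written out directly (helper names are hypothetical; accuracy is assumed to be a fraction in [0, 1]):

```python
# Sketch of the two headline metrics from per-query measurements.

def intelligence_per_joule(accuracy, energies_j):
    """IPJ = accuracy / average energy per query (joules)."""
    return accuracy / (sum(energies_j) / len(energies_j))

def intelligence_per_watt(accuracy, powers_w):
    """IPW = accuracy / average power per query (watts)."""
    return accuracy / (sum(powers_w) / len(powers_w))

# 80% accuracy at a mean of 200 J (or 200 W) per query
print(intelligence_per_joule(0.8, [100.0, 300.0]))
print(intelligence_per_watt(0.8, [200.0, 200.0]))
```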
What's Included
| Component | Options |
|---|---|
| Clients | Ollama, vLLM, OpenAI-compatible (OpenAI, OpenRouter, Gemini, local servers) |
| Agents | ReAct (Agno), OpenHands, Terminus |
| Datasets | MMLU-Pro, GPQA, SuperGPQA, MATH-500, GAIA, SimpleQA, FRAMES, HLE, TerminalBench, SWE-bench, SWEfficiency |
| Telemetry | NVIDIA (NVML), AMD (ROCm), Apple Silicon (powermetrics), Linux (RAPL) |
| Evaluation | LLM judge, MCQ exact match, task-specific scoring |
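To illustrate one of the evaluation options, an MCQ exact-match scorer might look like the sketch below; the regex and the "last standalone letter" convention are assumptions for illustration, not the repository's actual implementation:

```python
import re

def mcq_exact_match(reply: str, gold: str) -> bool:
    """Score a multiple-choice reply by its last standalone letter A-J."""
    letters = re.findall(r"\b([A-J])\b", reply)
    return bool(letters) and letters[-1] == gold

print(mcq_exact_match("The answer is B.", "B"))  # True
print(mcq_exact_match("No choice stated", "A"))  # False
```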
About
Built by Stanford Hazy Research and the Scaling Intelligence Lab.
Paper: arXiv:2511.07885