Intelligence Per Watt¶
A benchmarking suite for LLM inference systems that measures accuracy alongside energy consumption, power, memory, temperature, and latency.
Intelligence Per Watt (IPW) introduces two key metrics for evaluating the efficiency of LLM inference:
- Intelligence Per Joule (IPJ) = accuracy / average energy consumed per query (joules)
- Intelligence Per Watt (IPW) = accuracy / average power drawn per query (watts)
These metrics let you compare models and hardware configurations on a single axis that captures both correctness and energy cost.
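As a minimal sketch of the arithmetic (illustrative helper functions, not IPW's actual API):

```python
def intelligence_per_joule(accuracy: float, avg_energy_joules: float) -> float:
    """IPJ: accuracy divided by mean per-query energy (J)."""
    return accuracy / avg_energy_joules

def intelligence_per_watt(accuracy: float, avg_power_watts: float) -> float:
    """IPW: accuracy divided by mean per-query power draw (W)."""
    return accuracy / avg_power_watts

# Example: 80% accuracy at 500 J and 250 W average per query
ipj = intelligence_per_joule(0.80, 500.0)  # 0.0016 accuracy points per joule
ipw = intelligence_per_watt(0.80, 250.0)   # 0.0032 accuracy points per watt
```

A model that answers more queries correctly per joule spent scores higher on IPJ, regardless of whether the gain comes from a better model or more efficient hardware.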
Key Features¶
Single-Turn Profiling¶
Profile any OpenAI-compatible inference server. IPW sends workloads from curated datasets, captures per-query energy telemetry via a Rust gRPC service, and computes efficiency metrics automatically.
Agentic Profiling¶
Profile multi-turn agent workloads with tool use. Three agent harnesses are built in -- ReAct (via Agno), OpenHands, and Terminus -- each with per-step trace collection and energy attribution.
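Per-step energy attribution can be sketched as slicing the sampled power stream by each step's timestamp window (an illustrative helper with hypothetical names, not IPW's actual implementation; it assumes fixed-interval power samples):

```python
def attribute_energy(
    steps: list[tuple[float, float]],      # (start_s, end_s) per agent step
    samples: list[tuple[float, float]],    # (timestamp_s, power_watts) telemetry samples
    interval_s: float = 0.05,              # sampling interval
) -> list[float]:
    """Return approximate energy (J) per step: power samples inside each
    step's window, summed and scaled by the sampling interval."""
    return [
        sum(power for t, power in samples if start <= t < end) * interval_s
        for start, end in steps
    ]

# Two 1-second steps; the second step draws twice the power of the first.
per_step = attribute_energy(
    steps=[(0.0, 1.0), (1.0, 2.0)],
    samples=[(0.0, 100.0), (0.5, 100.0), (1.0, 200.0), (1.5, 200.0)],
)
```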
10+ Benchmark Datasets¶
From single-turn knowledge tests (MMLU-Pro, SuperGPQA, GPQA) to agentic benchmarks (GAIA, FRAMES, HLE, SimpleQA, TerminalBench) and coding tasks (SWE-bench, SWEfficiency).
Cross-Platform Energy Telemetry¶
A standalone Rust gRPC service streams telemetry at 50ms intervals. Auto-detects the platform collector:
| Platform | Collector | GPU Metrics | CPU Metrics |
|---|---|---|---|
| NVIDIA | NVML | Power, energy, temp, utilization, memory | Via RAPL |
| AMD | ROCm SMI | Power, energy, temp, utilization, memory | Via RAPL |
| Apple Silicon | powermetrics | GPU power/energy | CPU + ANE power/energy |
| Linux (CPU-only) | RAPL | -- | Package + core energy |
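For collectors that report instantaneous power rather than cumulative energy, per-query energy can be approximated by integrating the 50 ms power stream over the query window. A minimal sketch (hypothetical helper, not the telemetry service's actual interface):

```python
def energy_from_power_samples(power_watts: list[float], interval_s: float = 0.05) -> float:
    """Approximate energy (J) from fixed-interval power samples:
    rectangular integration, sum(P) * dt."""
    return sum(power_watts) * interval_s

# One second of samples (20 readings at 50 ms) holding steady at 100 W ~= 100 J
energy_j = energy_from_power_samples([100.0] * 20)
```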
Cost Tracking¶
Built-in pricing tables for OpenAI, Anthropic, and Google Gemini APIs. Per-query and per-turn cost is tracked automatically for cloud-hosted models.
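Per-query cost is a straightforward function of token counts and a per-model rate table. A sketch of the idea (the table shape, model key, and prices here are illustrative placeholders, not IPW's bundled pricing data):

```python
# Hypothetical pricing table: (input, output) USD per million tokens.
PRICING_PER_MTOK: dict[str, tuple[float, float]] = {
    "example-model": (2.50, 10.00),
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one query: tokens in each direction times its rate."""
    in_rate, out_rate = PRICING_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 1,000 prompt tokens + 500 completion tokens at the rates above
cost = query_cost("example-model", 1000, 500)  # 0.0075 USD
```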
FLOPs Estimation¶
Estimate computational cost using known model parameter counts (2PT formula) or the optional calflops library for HuggingFace models.
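The 2PT formula is the standard approximation of roughly 2 FLOPs per parameter per token processed (FLOPs ~= 2 x P x T). As a sketch:

```python
def flops_2pt(params: float, tokens: int) -> float:
    """2PT estimate: ~2 FLOPs per model parameter per token."""
    return 2.0 * params * tokens

# A 7B-parameter model processing 1,000 tokens:
estimate = flops_2pt(7e9, 1000)  # 1.4e13 FLOPs
```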
Quick Links¶
- Installation -- get IPW running on your system
- Quickstart -- run your first benchmark in minutes
- Profiling Guide -- deep dive on `ipw profile`
- Agentic Profiling -- profile multi-turn agents with `ipw run`
- Datasets Overview -- browse available benchmark datasets
- Platform Support -- check what metrics are available on your hardware
Architecture Overview¶
```
Dataset --> InferenceClient --> Response + TelemetryWindow --> ProfilingRecord --> Arrow dataset
                                                                       |
                                       EvaluationHandler (LLM judge) --> scored dataset
                                                                       |
                                       AnalysisProvider --> IPJ/IPW metrics + JSON report
```
All components (clients, datasets, agents, evaluators, analyses, visualizations) are registered via a decorator-based registry pattern. The CLI resolves them by string key, so extending IPW requires only implementing an ABC and adding a @Registry.register("id") decorator.
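The pattern looks roughly like this (a self-contained sketch of the decorator-registry idea; the class and method names are illustrative, not IPW's exact signatures):

```python
from abc import ABC, abstractmethod

class Registry:
    """Minimal decorator-based registry keyed by string ID."""
    _items: dict[str, type] = {}

    @classmethod
    def register(cls, key: str):
        """Decorator: register a class under `key` and return it unchanged."""
        def deco(target: type) -> type:
            cls._items[key] = target
            return target
        return deco

    @classmethod
    def get(cls, key: str) -> type:
        """Resolve a registered class by its string key (as the CLI does)."""
        return cls._items[key]

class Dataset(ABC):
    """The ABC a new component implements."""
    @abstractmethod
    def load(self) -> str: ...

@Registry.register("mmlu_pro")
class MMLUPro(Dataset):
    def load(self) -> str:
        return "mmlu_pro rows"

# The CLI-side lookup: string key -> class -> instance
dataset = Registry.get("mmlu_pro")()
```

Adding a new dataset, client, or evaluator is then just a new subclass plus one decorator line; no central dispatch table needs editing.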
Output Structure¶
Each profiling run produces a self-contained results directory: