Intelligence Per Watt

A benchmarking suite for LLM inference systems that measures accuracy alongside energy consumption, power, memory, temperature, and latency.

Intelligence Per Watt (IPW) introduces two key metrics for evaluating the efficiency of LLM inference:

  • Intelligence Per Joule (IPJ) = accuracy / average energy per query (joules)
  • Intelligence Per Watt (IPW) = accuracy / average power per query (watts)

These metrics let you compare models and hardware configurations on a single axis that captures both correctness and energy cost.
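The two definitions above can be sketched as plain functions (the function names here are illustrative, not part of the IPW API):

```python
def intelligence_per_joule(accuracy: float, avg_energy_joules: float) -> float:
    """IPJ = accuracy / average energy per query (joules)."""
    return accuracy / avg_energy_joules

def intelligence_per_watt(accuracy: float, avg_power_watts: float) -> float:
    """IPW = accuracy / average power per query (watts)."""
    return accuracy / avg_power_watts

# Example: 80% accuracy at an average of 120 J and 40 W per query.
ipj = intelligence_per_joule(0.80, 120.0)
ipw = intelligence_per_watt(0.80, 40.0)
```

Higher is better for both: a model that answers more accurately for the same energy budget, or at the same accuracy with less power, scores higher.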

Key Features

Single-Turn Profiling

Profile any OpenAI-compatible inference server. IPW sends workloads from curated datasets, captures per-query energy telemetry via a Rust gRPC service, and computes efficiency metrics automatically.

ipw profile --client ollama --model llama3.2:1b --client-base-url http://localhost:11434

Agentic Profiling

Profile multi-turn agent workloads with tool use. Three agent harnesses are built in -- ReAct (via Agno), OpenHands, and Terminus -- each with per-step trace collection and energy attribution.

ipw run --agent react --model gpt-4o --dataset gaia --max-queries 10

10+ Benchmark Datasets

From single-turn knowledge tests (MMLU-Pro, SuperGPQA, GPQA) to agentic benchmarks (GAIA, FRAMES, HLE, SimpleQA, TerminalBench) and coding tasks (SWE-bench, SWEfficiency).

Cross-Platform Energy Telemetry

A standalone Rust gRPC service streams telemetry at 50ms intervals. Auto-detects the platform collector:

Platform           Collector      GPU Metrics                                 CPU Metrics
NVIDIA             NVML           Power, energy, temp, utilization, memory    Via RAPL
AMD                ROCm SMI       Power, energy, temp, utilization, memory    Via RAPL
Apple Silicon      powermetrics   GPU power/energy                            CPU + ANE power/energy
Linux (CPU-only)   RAPL           --                                          Package + core energy
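Per-query energy can be recovered from a fixed-rate power stream by numerical integration. The sketch below uses a simple rectangle rule over the 50 ms sampling interval; how IPW actually integrates the telemetry internally is an assumption here:

```python
def energy_from_power_samples(power_watts: list[float], interval_s: float = 0.05) -> float:
    """Approximate energy (joules) by integrating power samples taken
    at a fixed interval (the telemetry service streams every 50 ms)."""
    return sum(power_watts) * interval_s

# Four samples over 200 ms at a steady 40 W -> about 8 J.
energy = energy_from_power_samples([40.0, 40.0, 40.0, 40.0])
```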

Cost Tracking

Built-in pricing tables for OpenAI, Anthropic, and Google Gemini APIs. Per-query and per-turn cost is tracked automatically for cloud-hosted models.
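A pricing-table lookup reduces to a per-token rate multiplication. The table and rates below are purely illustrative (API prices change; consult the provider), and the function name is not part of the IPW API:

```python
# Illustrative rates in USD per 1M tokens -- NOT current provider pricing.
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one query given a (input, output) token count."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

cost = query_cost("gpt-4o", input_tokens=1_200, output_tokens=400)
```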

FLOPs Estimation

Estimate computational cost using known model parameter counts (2PT formula) or the optional calflops library for HuggingFace models.
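The 2PT rule of thumb estimates forward-pass cost as roughly 2 FLOPs per parameter per token (FLOPs ≈ 2 × P × T). A minimal sketch:

```python
def estimate_flops(num_params: float, num_tokens: int) -> float:
    """2PT estimate: ~2 FLOPs per parameter for each token processed."""
    return 2.0 * num_params * num_tokens

# A 1B-parameter model processing 500 tokens costs on the order of 1e12 FLOPs.
flops = estimate_flops(1e9, 500)
```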

Architecture Overview

Dataset --> InferenceClient --> Response + TelemetryWindow --> ProfilingRecord --> Arrow dataset
                                                                                       |
                                                                                       v
                                                EvaluationHandler (LLM judge) --> scored dataset
                                                                                       |
                                                                                       v
                                         AnalysisProvider --> IPJ/IPW metrics + JSON report

All components (clients, datasets, agents, evaluators, analyses, visualizations) are registered via a decorator-based registry pattern. The CLI resolves them by string key, so extending IPW requires only subclassing the relevant ABC and registering the subclass with a @Registry.register("id") decorator.
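A minimal sketch of that registry pattern (class and method names beyond Registry.register are illustrative; the real IPW classes may differ):

```python
from abc import ABC, abstractmethod

class Registry:
    """Decorator-based registry mapping string keys to component classes."""
    _entries: dict[str, type] = {}

    @classmethod
    def register(cls, key: str):
        def decorator(component: type) -> type:
            cls._entries[key] = component
            return component
        return decorator

    @classmethod
    def get(cls, key: str) -> type:
        return cls._entries[key]

class InferenceClient(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

@Registry.register("ollama")
class OllamaClient(InferenceClient):
    def generate(self, prompt: str) -> str:
        return f"(response to {prompt!r})"

# The CLI resolves a component by its string key at runtime:
client = Registry.get("ollama")()
```

Because lookup happens by string key, adding a new client, dataset, or agent never requires touching the CLI dispatch code.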

Output Structure

Each profiling run produces a self-contained results directory:

runs/profile_<hardware>_<model>_<dataset>/
    data-*.arrow        # Per-query metrics (HuggingFace dataset format)
    summary.json        # Run metadata and configuration
    analysis/           # JSON reports (accuracy, regression)
    plots/              # Visualizations (scatter, KDE)
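The directory name can be built mechanically from the run parameters. This helper is a sketch; the exact sanitization IPW applies to model names (e.g. the colon in llama3.2:1b) is an assumption:

```python
from pathlib import Path

def run_dir(base: Path, hardware: str, model: str, dataset: str) -> Path:
    """Build a results path following profile_<hardware>_<model>_<dataset>.
    Sanitizing ':' and '/' out of model names is an assumption, not IPW's
    documented behavior."""
    safe_model = model.replace(":", "_").replace("/", "_")
    return base / f"profile_{hardware}_{safe_model}_{dataset}"

path = run_dir(Path("runs"), "nvidia", "llama3.2:1b", "mmlu-pro")
```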