Intelligence Per Watt

Benchmarking Intelligence Efficiency of LM Inference.



  • Profile


    Single-turn and agentic inference profiling with per-query telemetry across any OpenAI-compatible endpoint.

    Profiling guide

  • Measure


    Real-time energy, power, temperature, and memory telemetry via a Rust gRPC service sampling every 50 ms.

    Benchmarking overview

  • Analyze


    Intelligence Per Joule and Intelligence Per Watt metrics with accuracy scoring, regression analysis, and plots.

    Analysis guide

  • Extend


    Plug in custom inference clients, benchmark datasets, agent harnesses, and platform collectors.

    Extending IPW
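Per-query profiling boils down to timing each inference call and attaching the result to a telemetry record. A minimal sketch of that pattern, assuming a hypothetical `QueryRecord` schema and a client that is any callable (a real client would POST to an OpenAI-compatible `/v1/chat/completions` endpoint):

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class QueryRecord:
    """Per-query telemetry for one inference call (hypothetical schema)."""
    prompt: str
    response: str
    latency_s: float

def profile_query(client: Callable[[str], str], prompt: str) -> QueryRecord:
    """Time a single call to any OpenAI-compatible client function."""
    start = time.perf_counter()
    response = client(prompt)
    latency = time.perf_counter() - start
    return QueryRecord(prompt=prompt, response=response, latency_s=latency)

# Usage with a stand-in client; swap in a real HTTP client for live profiling.
record = profile_query(lambda p: p.upper(), "hello")
```

Keeping the client a plain callable is what makes the "Extend" point above cheap: custom inference clients only need to match this one-function interface.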


Key Metrics

  • Intelligence Per Joule (IPJ) = accuracy / average energy per query (joules)
  • Intelligence Per Watt (IPW) = accuracy / average power per query (watts)
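The two metrics above can be computed directly from an accuracy score and per-query energy/latency measurements. A sketch under the assumption that per-query power is each query's energy divided by its latency, averaged across queries (function names are illustrative):

```python
def intelligence_per_joule(accuracy: float, energies_j: list[float]) -> float:
    """IPJ = accuracy divided by mean energy per query, in joules."""
    return accuracy / (sum(energies_j) / len(energies_j))

def intelligence_per_watt(accuracy: float, energies_j: list[float],
                          latencies_s: list[float]) -> float:
    """IPW = accuracy divided by mean per-query power, in watts."""
    powers_w = [e / t for e, t in zip(energies_j, latencies_s)]
    return accuracy / (sum(powers_w) / len(powers_w))

# Example: 80% accuracy; two queries drawing 100 J over 2 s and 300 J over 3 s.
ipj = intelligence_per_joule(0.8, [100.0, 300.0])            # 0.8 / 200 J
ipw = intelligence_per_watt(0.8, [100.0, 300.0], [2.0, 3.0])  # 0.8 / 75 W
```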

What's Included

| Component | Options |
| --- | --- |
| Clients | Ollama, vLLM, OpenAI-compatible (OpenAI, OpenRouter, Gemini, local servers) |
| Agents | ReAct (Agno), OpenHands, Terminus |
| Datasets | MMLU-Pro, GPQA, SuperGPQA, MATH-500, GAIA, SimpleQA, FRAMES, HLE, TerminalBench, SWE-bench, SWEfficiency |
| Telemetry | NVIDIA (NVML), AMD (ROCm), Apple Silicon (powermetrics), Linux (RAPL) |
| Evaluation | LLM judge, MCQ exact match, task-specific scoring |
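Whatever the telemetry backend (NVML, ROCm, powermetrics, RAPL), a 50 ms power-sample stream becomes per-query energy by integrating power over time. A minimal sketch using the trapezoidal rule, with synthetic samples standing in for a real collector:

```python
def energy_from_samples(timestamps_s: list[float], power_w: list[float]) -> float:
    """Integrate sampled power over time (trapezoidal rule) to get joules."""
    return sum(
        0.5 * (power_w[i] + power_w[i + 1]) * (timestamps_s[i + 1] - timestamps_s[i])
        for i in range(len(timestamps_s) - 1)
    )

# One second of samples at a 50 ms period, constant 100 W draw -> ~100 J.
ts = [i * 0.05 for i in range(21)]
ps = [100.0] * 21
energy = energy_from_samples(ts, ps)
```

Trapezoidal integration is a common choice here because it tolerates the slightly uneven sample spacing real collectors produce.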

About

Built by Stanford Hazy Research and the Scaling Intelligence Lab.

Paper: arXiv:2511.07885

Acknowledgements

Stanford HAI, IBM Research, Ollama