Analysis & Visualization¶
Analyzing Results¶
The ipw analyze command runs post-profiling analysis on results directories.
IPW ships with two analysis providers:
- accuracy (default) -- computes accuracy, IPJ, IPW, and energy/power statistics.
- regression -- fits models for energy, power, and latency vs. token counts.
You can run multiple analyses on the same results directory.
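For example (a sketch: the results path is a placeholder, and the flag name for selecting a provider is an assumption here; check `ipw analyze --help` for the exact option):

```shell
# Run the default accuracy analysis, then a regression analysis,
# on the same results directory (provider flag name assumed).
ipw analyze runs/my_run
ipw analyze runs/my_run --analysis regression
```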
Each analysis writes to a separate file in the analysis/ subdirectory without overwriting the others.
Key Metrics¶
| Metric | Formula | Description |
|---|---|---|
| Accuracy | correct / total_scored | Fraction of correctly answered queries |
| IPJ | accuracy / avg_energy_per_query | Intelligence Per Joule -- higher is better |
| IPW | accuracy / avg_power_per_query | Intelligence Per Watt -- higher is better |
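As a toy illustration of the formulas above (the numbers are made up, not from a real run):

```python
# Toy numbers, not from a real run: 80 of 100 scored queries correct.
correct, total_scored = 80, 100
avg_energy_per_query = 50.0   # joules
avg_power_per_query = 25.0    # watts

accuracy = correct / total_scored      # fraction of correct answers
ipj = accuracy / avg_energy_per_query  # Intelligence Per Joule
ipw = accuracy / avg_power_per_query   # Intelligence Per Watt
```

Higher IPJ and IPW mean more correct answers per unit of energy or power spent.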
Energy imputation: When per-query energy readings are zero or negative (common on platforms without hardware energy counters), energy is imputed from power and latency, i.e. energy in joules = average power in watts × latency in seconds.
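A minimal sketch of that imputation rule (the function and variable names here are illustrative, not IPW's actual API):

```python
def impute_energy_joules(measured_joules: float,
                         avg_power_watts: float,
                         latency_s: float) -> float:
    """Fall back to power * latency when the measured energy is unusable."""
    if measured_joules > 0:
        return measured_joules  # trust the hardware counter when it is valid
    return avg_power_watts * latency_s

# A counter that returned 0 J is imputed from 30 W sustained over 2.5 s.
impute_energy_joules(0.0, 30.0, 2.5)  # 75.0 J
```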
Regression Analysis¶
The regression provider fits statistical models to understand how energy, power, and latency scale with input/output length.
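Invoking it might look like this (a sketch: the results path is a placeholder and the provider-selection flag is assumed; consult `ipw analyze --help`):

```shell
# Fit the regression models on an existing results directory (flag name assumed).
ipw analyze runs/my_run --analysis regression
```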
This produces analysis/regression.json with coefficients for:
- Energy (joules) vs. input token count
- Energy (joules) vs. output token count
- Power (watts) vs. input token count
- Latency (seconds) vs. output token count
Visualization¶
The ipw plot command generates charts from profiling results.
Built-in plots:
- Regression scatter plots -- Energy, power, and latency vs. input/output tokens with fitted regression lines.
- Output KDE -- Kernel density estimate of the output token length distribution.
Output files are PNG images saved to <results_dir>/plots/ by default, or to a custom directory via --output.
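For instance (the results path is a placeholder; `--output` is the custom-directory option described above):

```shell
# Save plots to a custom directory instead of <results_dir>/plots/.
ipw plot runs/my_run --output figures/
```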
Custom Visualizations¶
Subclass VisualizationProvider and register with the decorator:
```python
from ipw.core.registry import VisualizationRegistry

@VisualizationRegistry.register("my-plot")
class MyPlot(VisualizationProvider):
    ...
```
Output Formats¶
Arrow Dataset¶
The primary output format. Location: data-00000-of-00001.arrow inside the results directory.
Each row represents one query. Key fields:
| Field | Description |
|---|---|
| problem | The input query text |
| answer | The ground-truth answer |
| subject | Dataset category or subject |
| model_answers | Model-generated responses |
| model_metrics | Nested struct with energy_metrics, latency_metrics, power_metrics, token_metrics, cost |
Loading an Arrow dataset:
```python
from datasets import load_from_disk

ds = load_from_disk("runs/profile_nvidia_llama3.2_1b_ipw/")
energy = ds[0]["model_metrics"]["model"]["energy_metrics"]["per_query_joules"]
```
JSONL Traces (Agentic Runs)¶
Agentic runs produce traces.jsonl -- one QueryTrace per line.
- QueryTrace: query_id, turns[], total_wall_clock_s, completed
- TurnTrace: turn_index, tokens, tools_called, energy, power, cost
Loading traces:
```python
from pathlib import Path

from ipw.execution.traces import QueryTrace

traces = QueryTrace.load_jsonl(Path("runs/run_react_gpt4o_gaia/traces.jsonl"))
```
Summary & Reports¶
Each results directory contains:
| File | Contents |
|---|---|
| summary.json | Dataset, model, client, hardware info, timing |
| analysis/accuracy.json | Accuracy, IPJ, IPW, energy/power statistics |
| analysis/regression.json | Regression coefficients (after regression analysis) |