Analysis & Visualization¶
Analyzing Results¶
The ipw analyze command runs post-profiling analysis on results directories.
IPW ships with two analysis providers:
- accuracy (default) -- computes accuracy, IPJ, IPW, and energy/power statistics.
- regression -- fits models for energy, power, and latency vs. token counts.
You can run multiple analyses on the same results directory.
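For example (a sketch: the results path is a placeholder, and the flag name for selecting a provider is an assumption here; check `ipw analyze --help` for the exact option):

```shell
# Run the default accuracy analysis, then a regression analysis,
# on the same results directory (provider flag name assumed).
ipw analyze runs/my_run
ipw analyze runs/my_run --analysis regression
```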
Each analysis writes to a separate file in the analysis/ subdirectory without overwriting the others.
Key Metrics¶
| Metric | Formula | Description |
|---|---|---|
| Accuracy | correct / total_scored | Fraction of correctly answered queries |
| IPJ | accuracy / avg_energy_per_query | Intelligence Per Joule -- higher is better |
| IPW | accuracy / avg_power_per_query | Intelligence Per Watt -- higher is better |
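As a toy illustration of the formulas above (the numbers are made up, not from a real run):

```python
# Toy numbers, not from a real run: 80 of 100 scored queries correct.
correct, total_scored = 80, 100
avg_energy_per_query = 50.0   # joules
avg_power_per_query = 25.0    # watts

accuracy = correct / total_scored      # fraction of correct answers
ipj = accuracy / avg_energy_per_query  # Intelligence Per Joule
ipw = accuracy / avg_power_per_query   # Intelligence Per Watt
```

Higher IPJ and IPW mean more correct answers per unit of energy or power spent.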
Energy imputation: When per-query energy readings are zero or negative (common on platforms without hardware energy counters), energy is imputed from power and latency, i.e. energy in joules = average power in watts × latency in seconds.
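A minimal sketch of that imputation rule (the function and variable names here are illustrative, not IPW's actual API):

```python
def impute_energy_joules(measured_joules: float,
                         avg_power_watts: float,
                         latency_s: float) -> float:
    """Fall back to power * latency when the measured energy is unusable."""
    if measured_joules > 0:
        return measured_joules  # trust the hardware counter when it is valid
    return avg_power_watts * latency_s

# A counter that returned 0 J is imputed from 30 W sustained over 2.5 s.
impute_energy_joules(0.0, 30.0, 2.5)  # 75.0 J
```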
Regression Analysis¶
The regression provider fits statistical models to understand how energy, power, and latency scale with input/output length.
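Invoking it might look like this (a sketch: the results path is a placeholder and the provider-selection flag is assumed; consult `ipw analyze --help`):

```shell
# Fit the regression models on an existing results directory (flag name assumed).
ipw analyze runs/my_run --analysis regression
```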
This produces analysis/regression.json with coefficients for:
- Energy (joules) vs. input token count
- Energy (joules) vs. output token count
- Power (watts) vs. input token count
- Latency (seconds) vs. output token count
Visualization¶
The ipw plot command generates charts from profiling results.
Built-in plots:
- Regression scatter plots -- Energy, power, and latency vs. input/output tokens with fitted regression lines.
- Output KDE -- Kernel density estimate of the output token length distribution.
Output files are PNG images saved to <results_dir>/plots/ by default, or to a custom directory via --output.
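For instance (the results path is a placeholder; `--output` is the custom-directory option described above):

```shell
# Save plots to a custom directory instead of <results_dir>/plots/.
ipw plot runs/my_run --output figures/
```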
Custom Visualizations¶
Subclass VisualizationProvider and register with the decorator:
```python
from ipw.core.registry import VisualizationRegistry

@VisualizationRegistry.register("my-plot")
class MyPlot(VisualizationProvider):
    ...
```
Output Formats¶
Arrow Dataset¶
The primary output format. Location: data-00000-of-00001.arrow inside the results directory.
Each row represents one query. Key fields:
| Field | Description |
|---|---|
| problem | The input query text |
| answer | The ground-truth answer |
| subject | Dataset category or subject |
| model_answers | Model-generated responses |
| model_metrics | Nested struct with energy_metrics, latency_metrics, power_metrics, token_metrics, cost |
Loading an Arrow dataset:
```python
from datasets import load_from_disk

ds = load_from_disk("runs/profile_nvidia_llama3.2_1b_ipw/")
energy = ds[0]["model_metrics"]["model"]["energy_metrics"]["per_query_joules"]
```
JSONL Traces (Agentic Runs)¶
Agentic runs produce traces.jsonl -- one QueryTrace per line.
- QueryTrace: query_id, turns[], total_wall_clock_s, completed
- TurnTrace: turn_index, tokens, tools_called, energy, power, cost
Loading traces:
```python
from pathlib import Path

from ipw.execution.traces import QueryTrace

traces = QueryTrace.load_jsonl(Path("runs/run_react_gpt4o_gaia/traces.jsonl"))
```
Summary & Reports¶
Each results directory contains:
| File | Contents |
|---|---|
| summary.json | Dataset, model, client, hardware info, timing |
| analysis/accuracy.json | Accuracy, IPJ, IPW, energy/power statistics |
| analysis/regression.json | Regression coefficients (after regression analysis) |