Metrics Reference

Complete catalog of all metrics collected by IPW, organized by category.

Energy Metrics

Metric	Unit	Type	Description
`power_watts`	W	float	Instantaneous GPU power draw
`energy_joules`	J	float	Accumulated GPU energy since baseline
`cpu_power_watts`	W	float	Instantaneous CPU power draw
`cpu_energy_joules`	J	float	Accumulated CPU energy since baseline
`ane_power_watts`	W	float	Apple Neural Engine power (macOS only)
`ane_energy_joules`	J	float	Accumulated ANE energy (macOS only)

Metric	Unit	Description
`energy_metrics.per_query_joules`	J	GPU energy consumed for this query
`energy_metrics.total_joules`	J	Cumulative GPU energy
`energy_metrics.cpu_per_query_joules`	J	CPU energy consumed for this query
`energy_metrics.cpu_total_joules`	J	Cumulative CPU energy
`energy_metrics.ane_per_query_joules`	J	ANE energy for this query (macOS)
`energy_metrics.ane_total_joules`	J	Cumulative ANE energy
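
The relationship between the accumulated counters and the per-query values can be sketched as a simple difference of two readings. This is a minimal illustration, assuming that a per-query figure is the accumulated energy at the end of the query minus the accumulated energy at its start; the function name `per_query_joules` is hypothetical and not part of IPW.

```python
def per_query_joules(energy_start_j: float, energy_end_j: float) -> float:
    """Energy attributed to one query.

    Assumption: both arguments are readings of an accumulated counter such as
    `energy_joules`, taken just before and just after the query.
    """
    return energy_end_j - energy_start_j


# Example: the counter read 100.0 J before the query and 142.5 J after it.
query_energy = per_query_joules(100.0, 142.5)
```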

Metric	NVIDIA	AMD	Apple	RAPL	Null
GPU power/energy	yes	yes	yes	--	--
CPU power/energy	RAPL	RAPL	yes	yes	--
ANE power/energy	--	--	yes	--	--

Metric	Unit	Type	Description
`temperature_celsius`	C	float	GPU die temperature

Metric	Unit	Description
`temperature_metrics.avg`	C	Average GPU temp during query
`temperature_metrics.max`	C	Peak GPU temp during query
`temperature_metrics.median`	C	Median GPU temp during query
`temperature_metrics.min`	C	Minimum GPU temp during query

Available on: NVIDIA, AMD. Returns -1 on Apple Silicon and with the null collector.
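
The avg/max/median/min summaries used for temperature (and for the memory and power statistics elsewhere in this reference) can be reproduced from raw samples roughly as follows. This is a sketch, not IPW's implementation: the helper name `summarize` is hypothetical, and dropping negative readings is an assumption made here so that the -1 sentinel does not skew the statistics.

```python
import statistics
from typing import Dict, List, Optional


def summarize(samples: List[float]) -> Optional[Dict[str, float]]:
    """Return avg/max/median/min over the valid samples, or None if there are none.

    Assumption: negative readings (such as the -1 sentinel) mean "unavailable"
    and are excluded before aggregating.
    """
    valid = [s for s in samples if s >= 0]
    if not valid:
        return None
    return {
        "avg": statistics.fmean(valid),
        "max": max(valid),
        "median": statistics.median(valid),
        "min": min(valid),
    }


# Three valid temperature samples and one sentinel reading.
stats = summarize([60.0, 70.0, -1.0, 80.0])
```

Returning `None` when no valid sample exists keeps "no data" distinct from a real reading of zero.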

Metric	Unit	Type	Description
`gpu_memory_usage_mb`	MB	float	Current GPU memory used
`gpu_memory_total_mb`	MB	float	Total GPU memory
`cpu_memory_usage_mb`	MB	float	Current system memory used

Metric	Description
`memory_metrics.gpu_mb.{avg,max,median,min}`	GPU memory stats during query
`memory_metrics.cpu_mb.{avg,max,median,min}`	CPU memory stats during query

Metric	Unit	Description
`latency_metrics.per_token_ms`	ms	Average time per output token
`latency_metrics.throughput_tokens_per_sec`	tok/s	Output token throughput
`latency_metrics.time_to_first_token_seconds`	s	Time from request to first token
`latency_metrics.total_query_seconds`	s	Total wall-clock time for the query

All latency metrics are computed from Python-side timestamps, not from the energy monitor.
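
As an illustration of how these four values could be derived from three Python-side timestamps, consider the sketch below. It assumes that `per_token_ms` and throughput are averaged over the total query time; IPW may instead use the decode phase only. The function name `compute_latency` and its parameters are hypothetical.

```python
from typing import Dict


def compute_latency(
    t_request: float,
    t_first_token: float,
    t_done: float,
    output_tokens: int,
) -> Dict[str, float]:
    """Derive latency metrics from timestamps in seconds (e.g. time.perf_counter())."""
    total_s = t_done - t_request
    ttft_s = t_first_token - t_request
    if output_tokens > 0 and total_s > 0:
        per_token_ms = total_s * 1000.0 / output_tokens
        throughput = output_tokens / total_s
    else:
        # No tokens or no elapsed time: the per-token figures are undefined.
        per_token_ms = float("nan")
        throughput = float("nan")
    return {
        "per_token_ms": per_token_ms,
        "throughput_tokens_per_sec": throughput,
        "time_to_first_token_seconds": ttft_s,
        "total_query_seconds": total_s,
    }


# Request at t=10.0 s, first token at 10.5 s, done at 12.0 s, 100 output tokens.
latency = compute_latency(10.0, 10.5, 12.0, 100)
```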

Metric	Unit	Description
`token_metrics.input`	tokens	Number of input/prompt tokens
`token_metrics.output`	tokens	Number of output/completion tokens
`token_metrics.total`	tokens	Total tokens (input + output)

Token counts come from the inference client's response (ChatUsage).

Metric	Description
`power_metrics.gpu.per_query_watts.{avg,max,median,min}`	GPU power stats during query
`power_metrics.gpu.total_watts.{avg,max,median,min}`	Cumulative GPU power stats
`power_metrics.cpu.per_query_watts.{avg,max,median,min}`	CPU power stats during query
`power_metrics.cpu.total_watts.{avg,max,median,min}`	Cumulative CPU power stats

Metric	Unit	Description
`gpu_compute_utilization_pct`	%	SM/compute core utilization
`gpu_memory_bandwidth_utilization_pct`	%	Memory controller utilization
`gpu_tensor_core_utilization_pct`	%	Tensor core utilization (Ampere+)

Metric	Description
`hardware_utilization.gpu.compute_utilization_pct`	Avg compute util during query
`hardware_utilization.gpu.memory_bandwidth_utilization_pct`	Avg memory BW util
`hardware_utilization.gpu.tensor_core_utilization_pct`	Avg tensor core util
`hardware_utilization.gpu.memory_used_gb`	GPU memory used (GB)
`hardware_utilization.gpu.memory_total_gb`	GPU memory total (GB)

Metric	Description
`hardware_utilization.derived.mfu`	Model FLOPs Utilization
`hardware_utilization.derived.mbu`	Model Bandwidth Utilization
`hardware_utilization.derived.arithmetic_intensity`	FLOPs per byte ratio
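
For orientation, the derived metrics are commonly defined as ratios of achieved to peak hardware rates. The sketch below uses those common definitions, which may differ in detail from what IPW computes; the function name `derived_utilization` and all of its parameters (including the peak figures, which would come from the hardware's specification) are assumptions.

```python
from typing import Dict


def derived_utilization(
    total_flops: float,
    bytes_moved: float,
    seconds: float,
    peak_flops_per_sec: float,
    peak_bytes_per_sec: float,
) -> Dict[str, float]:
    """Common textbook definitions of MFU, MBU, and arithmetic intensity."""
    achieved_flops_per_sec = total_flops / seconds
    achieved_bytes_per_sec = bytes_moved / seconds
    return {
        # Fraction of peak compute throughput actually achieved.
        "mfu": achieved_flops_per_sec / peak_flops_per_sec,
        # Fraction of peak memory bandwidth actually achieved.
        "mbu": achieved_bytes_per_sec / peak_bytes_per_sec,
        # FLOPs performed per byte moved to or from memory.
        "arithmetic_intensity": total_flops / bytes_moved,
    }


# Illustrative numbers only, not measurements.
derived = derived_utilization(
    total_flops=1e12,
    bytes_moved=1e10,
    seconds=2.0,
    peak_flops_per_sec=1e12,
    peak_bytes_per_sec=1e10,
)
```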

Phase metrics break down energy and latency into prefill (processing input) and decode (generating output) phases:

Metric	Unit	Description
`phase_metrics.prefill_energy_j`	J	Energy during prefill
`phase_metrics.decode_energy_j`	J	Energy during decode
`phase_metrics.prefill_duration_ms`	ms	Prefill wall-clock time
`phase_metrics.decode_duration_ms`	ms	Decode wall-clock time
`phase_metrics.prefill_power_avg_w`	W	Average power during prefill
`phase_metrics.decode_power_avg_w`	W	Average power during decode
`phase_metrics.prefill_energy_per_input_token_j`	J/tok	Prefill energy efficiency
`phase_metrics.decode_energy_per_output_token_j`	J/tok	Decode energy efficiency

Phase separation requires detecting the time-to-first-token boundary.
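
One way to picture the phase split is to read an accumulated energy counter at three moments (request, first token, completion) and difference the readings on either side of the time-to-first-token boundary. The sketch below follows that idea; it is an assumption about the approach rather than IPW's code, and `split_phases` and its parameters are hypothetical names.

```python
from typing import Dict


def split_phases(
    t_request: float, t_first_token: float, t_done: float,
    e_request_j: float, e_first_token_j: float, e_done_j: float,
    input_tokens: int, output_tokens: int,
) -> Dict[str, float]:
    """Split energy and time into prefill and decode at the first-token boundary.

    Timestamps are in seconds; energies are accumulated-counter readings in joules.
    Assumes both phases have nonzero duration and nonzero token counts.
    """
    prefill_s = t_first_token - t_request
    decode_s = t_done - t_first_token
    prefill_j = e_first_token_j - e_request_j
    decode_j = e_done_j - e_first_token_j
    return {
        "prefill_energy_j": prefill_j,
        "decode_energy_j": decode_j,
        "prefill_duration_ms": prefill_s * 1000.0,
        "decode_duration_ms": decode_s * 1000.0,
        "prefill_power_avg_w": prefill_j / prefill_s,
        "decode_power_avg_w": decode_j / decode_s,
        "prefill_energy_per_input_token_j": prefill_j / input_tokens,
        "decode_energy_per_output_token_j": decode_j / output_tokens,
    }


# Illustrative values: 0.5 s prefill using 50 J, 2.0 s decode using 200 J.
phases = split_phases(0.0, 0.5, 2.5, 0.0, 50.0, 250.0,
                      input_tokens=100, output_tokens=200)
```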

Metric	Unit	Description
`compute_metrics.total_flops`	FLOPs	Estimated total FLOPs for the query
`compute_metrics.flops_per_token`	FLOPs/tok	Estimated FLOPs per token
`compute_metrics.flops_per_request`	FLOPs	Same as total_flops

FLOPs are estimated using the 2PT formula (FLOPs ≈ 2 × P × T, where P is the number of model parameters and T is the number of tokens) or, optionally, the calflops library.
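
A worked instance of the 2PT estimate is shown below. The function name `estimate_flops` is hypothetical, and which token count to pass as T (input, output, or total) is left to the caller, since this reference does not specify it.

```python
from typing import Dict


def estimate_flops(num_parameters: int, num_tokens: int) -> Dict[str, int]:
    """Estimate FLOPs with the 2PT rule: roughly 2 FLOPs per parameter per token."""
    flops_per_token = 2 * num_parameters
    total_flops = flops_per_token * num_tokens
    return {
        "total_flops": total_flops,
        "flops_per_token": flops_per_token,
        "flops_per_request": total_flops,  # same value as total_flops
    }


# A 7-billion-parameter model processing 100 tokens.
flops = estimate_flops(7_000_000_000, 100)
```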

Cost is computed from the pricing tables in ipw/cost/pricing.py based on provider, model, and token counts.

The following efficiency metrics are computed during accuracy analysis, not captured per query:

Metric	Formula	Description
Intelligence Per Joule (IPJ)	accuracy / avg_energy_per_query	Accuracy per joule of energy
Intelligence Per Watt (IPW)	accuracy / avg_power_per_query	Accuracy per watt of power
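
The two formulas can be sketched as follows. This is an illustration under stated assumptions, not IPW's analysis code: the function names are hypothetical, and excluding non-finite or non-positive per-query values before averaging is a choice made here.

```python
import math
from statistics import fmean
from typing import List, Optional


def _accuracy_per_unit(accuracy: float, per_query_values: List[float]) -> Optional[float]:
    """Divide accuracy by the mean of the valid per-query values."""
    valid = [v for v in per_query_values if math.isfinite(v) and v > 0]
    if not valid:
        return None  # no usable measurements, so the ratio is undefined
    return accuracy / fmean(valid)


def intelligence_per_joule(accuracy: float, energies_j: List[float]) -> Optional[float]:
    """IPJ = accuracy / avg_energy_per_query."""
    return _accuracy_per_unit(accuracy, energies_j)


def intelligence_per_watt(accuracy: float, powers_w: List[float]) -> Optional[float]:
    """IPW = accuracy / avg_power_per_query."""
    return _accuracy_per_unit(accuracy, powers_w)


# Illustrative values: 80% accuracy, with one unusable reading in each list.
ipj = intelligence_per_joule(0.8, [40.0, 60.0, -1.0])
ipw = intelligence_per_watt(0.8, [200.0, float("nan")])
```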

The energy monitor uses these sentinel values for unavailable metrics:

Always validate with math.isfinite(value) and value > 0 before using energy/power values in efficiency calculations.
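
The check above can be wrapped in a small helper so that it is applied consistently. The name `is_usable` is hypothetical; the type guard for `None` and other non-numeric values is an extra precaution added here, since `math.isfinite` raises `TypeError` on them.

```python
import math
from typing import Any


def is_usable(value: Any) -> bool:
    """True only for finite, strictly positive numeric readings."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value) and value > 0


def safe_ratio(numerator: float, denominator: Any) -> float:
    """Divide only when the denominator is a usable reading, else return NaN."""
    return numerator / denominator if is_usable(denominator) else float("nan")
```

Returning NaN from `safe_ratio` keeps unusable readings visible downstream instead of silently turning them into zero or raising mid-analysis.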