Metrics Reference
Complete catalog of all metrics collected by IPW, organized by category.
Energy Metrics
| Metric | Unit | Type | Description |
| --- | --- | --- | --- |
| power_watts | W | float | Instantaneous GPU power draw |
| energy_joules | J | float | Accumulated GPU energy since baseline |
| cpu_power_watts | W | float | Instantaneous CPU power draw |
| cpu_energy_joules | J | float | Accumulated CPU energy since baseline |
| ane_power_watts | W | float | Apple Neural Engine power (macOS only) |
| ane_energy_joules | J | float | Accumulated ANE energy (macOS only) |
Per-Query Energy (in ProfilingRecord)
| Metric | Unit | Description |
| --- | --- | --- |
| energy_metrics.per_query_joules | J | GPU energy consumed for this query |
| energy_metrics.total_joules | J | Cumulative GPU energy |
| energy_metrics.cpu_per_query_joules | J | CPU energy consumed for this query |
| energy_metrics.cpu_total_joules | J | Cumulative CPU energy |
| energy_metrics.ane_per_query_joules | J | ANE energy for this query (macOS) |
| energy_metrics.ane_total_joules | J | Cumulative ANE energy |
| Metric | NVIDIA | AMD | Apple | RAPL | Null |
| --- | --- | --- | --- | --- | --- |
| GPU power/energy | yes | yes | yes | -- | -- |
| CPU power/energy | RAPL | RAPL | yes | yes | -- |
| ANE power/energy | -- | -- | yes | -- | -- |
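One plausible reading of how the accumulated counters relate to the per-query values is a difference between two readings taken around the query. The sketch below assumes exactly that; `per_query_joules` is a hypothetical helper written for illustration, not a function from IPW, and it treats `-1` and `None` as "unavailable" in line with the Sentinel Values section.

```python
from typing import Optional


def per_query_joules(start_j: Optional[float], end_j: Optional[float]) -> Optional[float]:
    """Energy attributed to one query as end-minus-start of an accumulated counter.

    Hypothetical helper: assumes both readings come from the same
    accumulated counter (for example energy_joules).
    """
    if start_j is None or end_j is None:
        return None  # reading was not captured
    if start_j < 0 or end_j < 0:
        return None  # -1 sentinel: metric not available on this platform
    delta = end_j - start_j
    # A negative delta would suggest the counter was reset between readings.
    return delta if delta >= 0 else None


print(per_query_joules(100.0, 142.5))
print(per_query_joules(-1, -1))
```

Returning `None` rather than `0` keeps "no data" distinguishable from a genuine zero reading.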
Temperature Metrics
| Metric | Unit | Type | Description |
| --- | --- | --- | --- |
| temperature_celsius | °C | float | GPU die temperature |
Per-Query Temperature (in ProfilingRecord)
| Metric | Unit | Description |
| --- | --- | --- |
| temperature_metrics.avg | °C | Average GPU temp during query |
| temperature_metrics.max | °C | Peak GPU temp during query |
| temperature_metrics.median | °C | Median GPU temp during query |
| temperature_metrics.min | °C | Minimum GPU temp during query |
Available on: NVIDIA, AMD. Returns -1 on Apple Silicon and with the null collector.
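The `avg`, `max`, `median`, and `min` fields that recur in the per-query tables can be pictured as a summary over the samples collected while the query ran. The following is a minimal sketch under that assumption; `summarize_samples` is a hypothetical name, and dropping `-1` readings before aggregating is a choice made here, not documented IPW behavior.

```python
import statistics
from typing import Dict, Iterable, Optional


def summarize_samples(samples: Iterable[float]) -> Optional[Dict[str, float]]:
    """Summarize per-query samples, ignoring the -1 "unavailable" sentinel."""
    valid = [s for s in samples if s != -1]
    if not valid:
        return None  # nothing usable, e.g. Apple Silicon or the null collector
    return {
        "avg": statistics.fmean(valid),
        "max": max(valid),
        "median": statistics.median(valid),
        "min": min(valid),
    }


print(summarize_samples([60.0, 62.0, -1, 70.0]))
```

Filtering the sentinel first matters: leaving a `-1` in the list would silently drag down both the average and the minimum.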
Memory Metrics
| Metric | Unit | Type | Description |
| --- | --- | --- | --- |
| gpu_memory_usage_mb | MB | float | Current GPU memory used |
| gpu_memory_total_mb | MB | float | Total GPU memory |
| cpu_memory_usage_mb | MB | float | Current system memory used |
Per-Query Memory (in ProfilingRecord)
| Metric | Description |
| --- | --- |
| memory_metrics.gpu_mb.{avg,max,median,min} | GPU memory stats during query |
| memory_metrics.cpu_mb.{avg,max,median,min} | CPU memory stats during query |
Latency Metrics
| Metric | Unit | Description |
| --- | --- | --- |
| latency_metrics.per_token_ms | ms | Average time per output token |
| latency_metrics.throughput_tokens_per_sec | tok/s | Output token throughput |
| latency_metrics.time_to_first_token_seconds | s | Time from request to first token |
| latency_metrics.total_query_seconds | s | Total wall-clock time for the query |
All latency metrics are computed from Python-side timestamps, not from the energy monitor.
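As an illustration of deriving these values from timestamps alone, the sketch below takes three timestamps (request sent, first token received, response complete) and an output token count. The function name and the choice to divide by total query time, rather than decode time only, are assumptions made for this example; IPW's exact definitions may differ.

```python
from typing import Dict, Optional


def latency_from_timestamps(
    request_ts: float,
    first_token_ts: float,
    end_ts: float,
    output_tokens: int,
) -> Dict[str, Optional[float]]:
    """Derive latency metrics from timestamps in seconds (hypothetical helper)."""
    total_s = end_ts - request_ts
    ttft_s = first_token_ts - request_ts
    # Guard against division by zero when no tokens or no elapsed time were recorded.
    per_token_ms = total_s * 1000.0 / output_tokens if output_tokens > 0 else None
    throughput = output_tokens / total_s if total_s > 0 else None
    return {
        "per_token_ms": per_token_ms,
        "throughput_tokens_per_sec": throughput,
        "time_to_first_token_seconds": ttft_s,
        "total_query_seconds": total_s,
    }


print(latency_from_timestamps(0.0, 0.5, 2.5, 100))
```

In practice the timestamps would come from a monotonic clock such as `time.perf_counter()`, so that system clock adjustments do not distort the durations.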
Token Metrics
| Metric | Unit | Description |
| --- | --- | --- |
| token_metrics.input | tokens | Number of input/prompt tokens |
| token_metrics.output | tokens | Number of output/completion tokens |
| token_metrics.total | tokens | Total tokens (input + output) |
Token counts come from the inference client's response (ChatUsage).
Power Metrics
| Metric | Description |
| --- | --- |
| power_metrics.gpu.per_query_watts.{avg,max,median,min} | GPU power stats during query |
| power_metrics.gpu.total_watts.{avg,max,median,min} | Cumulative GPU power stats |
| power_metrics.cpu.per_query_watts.{avg,max,median,min} | CPU power stats during query |
| power_metrics.cpu.total_watts.{avg,max,median,min} | Cumulative CPU power stats |
GPU Utilization Metrics
| Metric | Unit | Description |
| --- | --- | --- |
| gpu_compute_utilization_pct | % | SM/compute core utilization |
| gpu_memory_bandwidth_utilization_pct | % | Memory controller utilization |
| gpu_tensor_core_utilization_pct | % | Tensor core utilization (Ampere+) |
Per-Query Utilization (in ProfilingRecord)
| Metric | Description |
| --- | --- |
| hardware_utilization.gpu.compute_utilization_pct | Avg compute util during query |
| hardware_utilization.gpu.memory_bandwidth_utilization_pct | Avg memory BW util |
| hardware_utilization.gpu.tensor_core_utilization_pct | Avg tensor core util |
| hardware_utilization.gpu.memory_used_gb | GPU memory used (GB) |
| hardware_utilization.gpu.memory_total_gb | GPU memory total (GB) |
Derived Utilization
| Metric | Description |
| --- | --- |
| hardware_utilization.derived.mfu | Model FLOPs Utilization |
| hardware_utilization.derived.mbu | Model Bandwidth Utilization |
| hardware_utilization.derived.arithmetic_intensity | FLOPs per byte ratio |
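These three quantities are conventionally defined as ratios: MFU is achieved FLOPs per second over the hardware peak, MBU is achieved memory traffic per second over peak bandwidth, and arithmetic intensity is FLOPs divided by bytes moved. The sketch below applies those conventional definitions; the function name, its parameters, and the idea that the caller supplies the peak figures are assumptions for illustration and may not match how IPW computes them.

```python
from typing import Dict


def derived_utilization(
    total_flops: float,
    bytes_moved: float,
    duration_s: float,
    peak_flops_per_s: float,
    peak_bytes_per_s: float,
) -> Dict[str, float]:
    """Conventional MFU / MBU / arithmetic intensity (hypothetical helper)."""
    achieved_flops_per_s = total_flops / duration_s
    achieved_bytes_per_s = bytes_moved / duration_s
    return {
        "mfu": achieved_flops_per_s / peak_flops_per_s,
        "mbu": achieved_bytes_per_s / peak_bytes_per_s,
        "arithmetic_intensity": total_flops / bytes_moved,
    }


# Illustrative inputs only; not measurements from any real device.
print(derived_utilization(2e12, 1e11, 2.0, 1e13, 1e12))
```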
Phase Metrics
Phase metrics break down energy and latency into prefill (processing input) and decode (generating output) phases:
| Metric | Unit | Description |
| --- | --- | --- |
| phase_metrics.prefill_energy_j | J | Energy during prefill |
| phase_metrics.decode_energy_j | J | Energy during decode |
| phase_metrics.prefill_duration_ms | ms | Prefill wall-clock time |
| phase_metrics.decode_duration_ms | ms | Decode wall-clock time |
| phase_metrics.prefill_power_avg_w | W | Average power during prefill |
| phase_metrics.decode_power_avg_w | W | Average power during decode |
| phase_metrics.prefill_energy_per_input_token_j | J/tok | Prefill energy efficiency |
| phase_metrics.decode_energy_per_output_token_j | J/tok | Decode energy efficiency |
Phase separation requires detecting the time-to-first-token boundary.
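To make the role of the time-to-first-token boundary concrete, the sketch below splits timestamped power samples at that boundary and estimates each phase's energy as average power times duration. The function name, the sample format `(timestamp_s, watts)`, and the averaging approach are assumptions for illustration; the real implementation may integrate energy differently.

```python
from statistics import fmean
from typing import Dict, List, Optional, Tuple


def phase_breakdown(
    samples: List[Tuple[float, float]],  # (timestamp_s, power_watts), hypothetical format
    request_ts: float,
    first_token_ts: float,
    end_ts: float,
    input_tokens: int,
    output_tokens: int,
) -> Dict[str, Optional[float]]:
    def window(t0: float, t1: float):
        """Return (energy_j, duration_ms, avg_power_w) for samples in [t0, t1)."""
        duration_s = t1 - t0
        powers = [w for t, w in samples if t0 <= t < t1]
        if not powers or duration_s <= 0:
            return None, duration_s * 1000.0, None
        avg_w = fmean(powers)
        return avg_w * duration_s, duration_s * 1000.0, avg_w

    def per_token(energy_j: Optional[float], tokens: int) -> Optional[float]:
        return energy_j / tokens if energy_j is not None and tokens > 0 else None

    # Prefill runs from the request to the first token; decode covers the rest.
    pre_e, pre_ms, pre_w = window(request_ts, first_token_ts)
    dec_e, dec_ms, dec_w = window(first_token_ts, end_ts)
    return {
        "prefill_energy_j": pre_e,
        "decode_energy_j": dec_e,
        "prefill_duration_ms": pre_ms,
        "decode_duration_ms": dec_ms,
        "prefill_power_avg_w": pre_w,
        "decode_power_avg_w": dec_w,
        "prefill_energy_per_input_token_j": per_token(pre_e, input_tokens),
        "decode_energy_per_output_token_j": per_token(dec_e, output_tokens),
    }
```

If no sample falls inside a phase (for example, a prefill shorter than the sampling interval), the sketch reports `None` for that phase's energy and power rather than guessing.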
Compute Metrics
| Metric | Unit | Description |
| --- | --- | --- |
| compute_metrics.total_flops | FLOPs | Estimated total FLOPs for the query |
| compute_metrics.flops_per_token | FLOPs/tok | Estimated FLOPs per token |
| compute_metrics.flops_per_request | FLOPs | Same as total_flops |
FLOPs are estimated either with the 2PT formula (FLOPs ≈ 2 × P × T, where P is the number of model parameters and T is the number of tokens) or with the optional calflops library.
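A minimal sketch of the 2PT estimate follows. The function name is hypothetical, and whether T counts input tokens, output tokens, or both is not stated here, so the example simply takes a token count from the caller.

```python
from typing import Dict


def estimate_flops_2pt(num_parameters: int, num_tokens: int) -> Dict[str, int]:
    """Estimate FLOPs as 2 * P * T (hypothetical helper)."""
    total = 2 * num_parameters * num_tokens
    # Dividing 2*P*T by T leaves 2*P, so the per-token figure is independent of T.
    per_token = 2 * num_parameters if num_tokens > 0 else 0
    return {
        "total_flops": total,
        "flops_per_token": per_token,
        "flops_per_request": total,  # same as total_flops
    }


# A 7-billion-parameter model processing 1,000 tokens.
print(estimate_flops_2pt(7_000_000_000, 1_000))
```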
Cost Metrics
| Metric | Unit | Description |
| --- | --- | --- |
| cost.input_cost_usd | USD | Cost of input tokens |
| cost.output_cost_usd | USD | Cost of output tokens |
| cost.tool_cost_usd | USD | Cost of tool calls (e.g., Tavily) |
| cost.total_cost_usd | USD | Total API cost |
Cost is computed from the pricing tables in ipw/cost/pricing.py based on provider, model, and token counts.
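The general shape of such a lookup can be sketched as follows. The layout of `ipw/cost/pricing.py` is not shown here, so the `PRICING` table, its per-million-token keys, the provider and model names, and the prices are all placeholders invented for this example.

```python
from typing import Dict, Tuple

# Hypothetical pricing table: USD per one million tokens. Placeholder values only.
PRICING: Dict[Tuple[str, str], Dict[str, float]] = {
    ("example-provider", "example-model"): {"input_per_mtok": 1.0, "output_per_mtok": 2.0},
}


def compute_cost(
    provider: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
    tool_cost_usd: float = 0.0,
) -> Dict[str, float]:
    """Compute the four cost fields from token counts (hypothetical helper)."""
    rates = PRICING[(provider, model)]  # raises KeyError for unknown models
    input_cost = input_tokens / 1_000_000 * rates["input_per_mtok"]
    output_cost = output_tokens / 1_000_000 * rates["output_per_mtok"]
    return {
        "input_cost_usd": input_cost,
        "output_cost_usd": output_cost,
        "tool_cost_usd": tool_cost_usd,
        "total_cost_usd": input_cost + output_cost + tool_cost_usd,
    }


print(compute_cost("example-provider", "example-model", 500_000, 250_000, tool_cost_usd=0.25))
```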
Efficiency Metrics (Computed)
These are computed during accuracy analysis, not captured per-query:
| Metric | Formula | Description |
| --- | --- | --- |
| Intelligence Per Joule (IPJ) | accuracy / avg_energy_per_query | Accuracy per joule of energy |
| Intelligence Per Watt (IPW) | accuracy / avg_power_per_query | Accuracy per watt of power |
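Both formulas are a single division, but the denominator needs guarding because energy and power readings can be missing or zero. The sketch below is one way to write them; the function names are hypothetical, and returning `None` for an unusable denominator is a choice made for this example.

```python
import math
from typing import Optional


def _safe_ratio(accuracy: float, denominator: Optional[float]) -> Optional[float]:
    """Divide only when the denominator is a finite, positive number."""
    if denominator is None or not math.isfinite(denominator) or denominator <= 0:
        return None
    return accuracy / denominator


def intelligence_per_joule(accuracy: float, avg_energy_per_query_j: Optional[float]) -> Optional[float]:
    return _safe_ratio(accuracy, avg_energy_per_query_j)


def intelligence_per_watt(accuracy: float, avg_power_per_query_w: Optional[float]) -> Optional[float]:
    return _safe_ratio(accuracy, avg_power_per_query_w)


# Illustrative inputs: 80% accuracy, 40 J and 200 W average per query.
print(intelligence_per_joule(0.8, 40.0))
print(intelligence_per_watt(0.8, 200.0))
```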
Sentinel Values
The energy monitor uses these sentinel values for unavailable metrics:
- -1 (Rust/proto): The metric is not available on this platform
- None (Python): The metric was not captured or is not applicable
- 0: The metric is available but the reading was zero
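Before using energy or power values in efficiency calculations, check that the value is not None, then validate it with math.isfinite(value) and value > 0. The None check must come first because math.isfinite(None) raises a TypeError; the value > 0 test rejects both the -1 sentinel and zero readings, which would otherwise produce negative results or a division by zero.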
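The validation rule can be wrapped in a small predicate. `is_usable` is a hypothetical name chosen for this sketch; the only addition to the rule as stated is the explicit `None` check.

```python
import math
from typing import Optional


def is_usable(value: Optional[float]) -> bool:
    """True only for finite, strictly positive readings."""
    # None is checked first because math.isfinite(None) raises TypeError.
    return value is not None and math.isfinite(value) and value > 0


for reading in (12.5, 0, -1, None, float("nan"), float("inf")):
    print(reading, is_usable(reading))
```

Only the first reading passes: zero, the `-1` sentinel, `None`, NaN, and infinity are all rejected.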