Energy Monitor¶
The energy monitor is a standalone Rust gRPC service that streams hardware telemetry at 50ms intervals. It runs as a subprocess alongside the Python profiler and provides real-time power, energy, temperature, memory, and utilization readings.
Architecture¶
Python (ProfilerRunner)                    Rust (energy-monitor)
        |                                           |
        |--- Launch subprocess -------------------->|
        |                                           | Start gRPC server
        |                                           | Auto-detect collector
        |                                           |
        |--- StreamTelemetry (server stream) ------>|
        |<-- TelemetryReading every 50ms -----------|
        |                                           |
        |--- Health (gRPC unary) ------------------>|
        |<-- HealthResponse ------------------------|
Components¶
Rust binary (energy-monitor/): The compiled binary is located at ipw/telemetry/bin/<platform>/energy-monitor. Platform variants:
- linux-x86_64/energy-monitor
- macos-arm64/energy-monitor
Python launcher (ipw/telemetry/launcher.py): Manages the subprocess lifecycle -- starting, monitoring, and stopping the energy monitor process.
Python collector (ipw/telemetry/collector.py): gRPC client that connects to the energy monitor's streaming endpoint and converts protobuf messages into Python TelemetryReading objects.
Proto Definition¶
The service is defined in energy-monitor/proto/energy.proto:
service EnergyMonitor {
rpc Health(HealthRequest) returns (HealthResponse);
rpc StreamTelemetry(StreamRequest) returns (stream TelemetryReading);
}
message TelemetryReading {
double power_watts = 1;
double energy_joules = 2;
double temperature_celsius = 3;
double gpu_memory_usage_mb = 4;
double cpu_memory_usage_mb = 5;
double cpu_power_watts = 10;
double cpu_energy_joules = 11;
double ane_power_watts = 12;
double ane_energy_joules = 13;
double gpu_compute_utilization_pct = 14;
double gpu_memory_bandwidth_utilization_pct = 15;
double gpu_tensor_core_utilization_pct = 16;
double gpu_memory_total_mb = 17;
string platform = 6;
int64 timestamp_nanos = 7;
SystemInfo system_info = 8;
GpuInfo gpu_info = 9;
}
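On the Python side, the collector converts each protobuf message into a TelemetryReading object. A minimal sketch of that mapping, using a dataclass whose fields mirror a subset of the proto message (the stand-in message object and field subset are illustrative, not the real collector.py code):

```python
from dataclasses import dataclass, fields
from types import SimpleNamespace

@dataclass
class TelemetryReading:
    """Python-side mirror of a subset of the TelemetryReading proto fields."""
    power_watts: float
    energy_joules: float
    temperature_celsius: float
    platform: str
    timestamp_nanos: int

def reading_from_proto(msg) -> TelemetryReading:
    # Copy each dataclass field from the protobuf message by name.
    return TelemetryReading(**{f.name: getattr(msg, f.name)
                               for f in fields(TelemetryReading)})

# Stand-in for a decoded protobuf message (illustration only):
msg = SimpleNamespace(power_watts=12.5, energy_joules=0.625,
                      temperature_celsius=61.0, platform="nvidia",
                      timestamp_nanos=1_700_000_000_000_000_000)
reading = reading_from_proto(msg)
```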
Unavailable Metrics¶
Fields that the current platform cannot provide are set to -1. Python code should treat negative values as unavailable (e.g., check value >= 0) before using them in calculations; note that math.isfinite() does not catch this sentinel, since -1 is a finite value.
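A minimal sketch of sentinel-aware aggregation (helper names are illustrative, not part of the library):

```python
def is_available(value: float) -> bool:
    """True if the metric was actually measured; the service reports -1 otherwise."""
    return value >= 0.0

def total_power(power_fields: dict) -> float:
    """Sum only the power fields the current platform can provide."""
    return sum(v for v in power_fields.values() if is_available(v))

# e.g. a CPU-only platform: GPU and ANE power are the -1 sentinel
watts = total_power({"cpu_power_watts": 18.0,
                     "power_watts": -1.0,
                     "ane_power_watts": -1.0})
```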
Collector Auto-Detection¶
The Rust service auto-detects the appropriate collector at startup, in the following order:
- macOS (cfg(target_os = "macos")): Uses powermetrics for CPU/GPU/ANE power readings. Requires sudo.
- Linux/Windows with NVIDIA GPU: Uses NVML. Falls through to the next option if NVML initialization fails.
- Linux with AMD GPU: Uses ROCm SMI. Falls through if unavailable.
- Linux with RAPL: Uses Intel RAPL for CPU energy counters. Available on Intel and some AMD CPUs.
- Null collector: Fallback that reports only CPU memory usage. All power/energy/temperature fields return -1.
The selected platform is reported in each TelemetryReading.platform field (e.g., "nvidia", "macos", "amd", "rapl", "null").
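The fall-through order above can be illustrated as a simple probe chain. This is a Python sketch of the selection logic, not the actual Rust detection code; the probe functions are hypothetical stand-ins for NVML/ROCm/RAPL initialization:

```python
from typing import Callable, List, Optional

def select_collector(probes: List[Callable[[], Optional[str]]]) -> str:
    """Try each collector probe in priority order; a probe returns its
    platform name on success or None if its backend is unavailable."""
    for probe in probes:
        name = probe()
        if name is not None:
            return name
    return "null"  # final fallback: reports only CPU memory usage

# Example: NVML init fails, ROCm SMI absent, RAPL counters found
probes = [lambda: None,     # NVML initialization failed
          lambda: None,     # ROCm SMI unavailable
          lambda: "rapl"]   # RAPL present
selected = select_collector(probes)
```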
Building¶
The build step runs cargo build --release in the energy-monitor/ directory and copies the resulting binary to the appropriate platform subdirectory under ipw/telemetry/bin/.
Testing¶
The test harness starts the energy monitor, connects via gRPC, and prints telemetry readings at the specified interval (default: 1 second). Use it to verify that the binary works correctly on your platform.
Streaming Interval¶
The service streams telemetry at a fixed 50ms interval. This provides sufficient time resolution to:
- Attribute energy to individual inference queries (typically 0.5-30 seconds)
- Capture power spikes during GPU-intensive prefill phases
- Track memory usage changes during batch processing
The interval is not configurable -- it is hardcoded in the StreamRequest handler to ensure consistent measurement across deployments.
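One way to attribute energy to a query window from 50ms power samples is to integrate power over the window with the trapezoidal rule. A hedged sketch (the sample format and function name are assumptions, not the library's API):

```python
def attribute_energy(samples, start: float, end: float) -> float:
    """Integrate (timestamp_s, power_watts) samples over [start, end]
    using the trapezoidal rule. Assumes samples are time-ordered."""
    window = [(t, p) for t, p in samples if start <= t <= end]
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(window, window[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)  # joules = watts * seconds
    return energy

# 50 ms samples at a constant 10 W for 1 s -> ~10 J
samples = [(i * 0.05, 10.0) for i in range(21)]
joules = attribute_energy(samples, 0.0, 1.0)
```

An alternative, since the readings also carry cumulative energy_joules counters, is to difference the counter values at the window boundaries; integration of power is useful when you also want to see the spike shape inside the window.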
Python Integration¶
TelemetrySession¶
The TelemetrySession (ipw/execution/telemetry_session.py) wraps the gRPC client in a threaded sampling loop:
- Maintains a rolling buffer of TelemetryReading objects
- Provides get_window(start_time, end_time) to extract readings for a specific time range
- Thread-safe for concurrent access from the profiling runner
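The rolling-buffer pattern behind get_window can be sketched as follows. This is an illustrative skeleton, not the real TelemetrySession; buffer size and internal names are assumptions:

```python
import threading
from collections import deque

class TelemetrySessionSketch:
    """Minimal sketch: a bounded, lock-protected buffer of timestamped readings."""

    def __init__(self, maxlen: int = 10_000):
        self._buffer = deque(maxlen=maxlen)  # oldest readings drop off automatically
        self._lock = threading.Lock()        # sampling thread vs. profiler thread

    def record(self, timestamp_s: float, reading) -> None:
        with self._lock:
            self._buffer.append((timestamp_s, reading))

    def get_window(self, start_time: float, end_time: float) -> list:
        """Return readings whose timestamps fall inside [start_time, end_time]."""
        with self._lock:
            return [r for t, r in self._buffer if start_time <= t <= end_time]

session = TelemetrySessionSketch()
for i in range(5):
    session.record(float(i), {"power_watts": 10.0 + i})
window = session.get_window(1.0, 3.0)
```

A deque with maxlen keeps memory bounded during long profiling runs, and taking the lock around both append and the window scan keeps the buffer consistent under concurrent access.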
Launcher¶
The Launcher (ipw/telemetry/launcher.py) manages the energy monitor process:
- Finds the correct binary for the current platform
- Starts the process and waits for the gRPC server to become healthy
- Stops the process when profiling is complete
- Handles process crashes and cleanup
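The start / health-wait / stop lifecycle can be sketched like this. It is not the real Launcher: the real implementation issues a gRPC Health call, whereas here the health check is injected as a plain callable, and the class name is hypothetical:

```python
import subprocess
import sys
import time

class LauncherSketch:
    """Illustrative process lifecycle: start, wait for health, stop."""

    def __init__(self, cmd, health_check, timeout_s: float = 5.0):
        self._cmd = cmd                      # binary path + args
        self._health_check = health_check    # stand-in for the gRPC Health RPC
        self._timeout_s = timeout_s
        self._proc = None

    def start(self) -> None:
        self._proc = subprocess.Popen(self._cmd)
        deadline = time.monotonic() + self._timeout_s
        while time.monotonic() < deadline:
            if self._proc.poll() is not None:
                raise RuntimeError("energy monitor exited during startup")
            if self._health_check():
                return                       # server is healthy; streaming can begin
            time.sleep(0.05)
        self.stop()
        raise TimeoutError("gRPC server never became healthy")

    def stop(self) -> None:
        if self._proc and self._proc.poll() is None:
            self._proc.terminate()
            try:
                self._proc.wait(timeout=2)
            except subprocess.TimeoutExpired:
                self._proc.kill()            # escalate if terminate is ignored
                self._proc.wait()

# Demo with a stand-in long-running process and an always-healthy check:
launcher = LauncherSketch([sys.executable, "-c", "import time; time.sleep(30)"],
                          health_check=lambda: True)
launcher.start()
running = launcher._proc.poll() is None
launcher.stop()
stopped = launcher._proc.poll() is not None
```

Polling the process between health checks matters: it distinguishes "server not up yet" from "binary crashed on startup", so a crash fails fast instead of burning the whole timeout.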