`ipw.agents.mcp`

MCP (Model Context Protocol) server implementations.

This module provides unified interfaces for:

- Local models (via Ollama, vLLM)
- Cloud APIs (OpenAI, Anthropic, Gemini, OpenRouter)
- Tools (calculator, web search, code interpreter)
- Retrieval (BM25, dense, grep, hybrid)

All servers record latency on every call. Energy and power samples are captured when a `telemetry_collector` is supplied, and the cloud API servers also report cost.

`BaseMCPServer`

Bases: ABC

Base class for all MCP servers with automatic telemetry capture.

All subclasses must implement `_execute_impl()`, which performs the actual tool invocation. The base class's `execute()` wraps that call with telemetry collection.

Example

class MyTool(BaseMCPServer):
    def _execute_impl(self, prompt: str, **params) -> MCPToolResult:
        response = self.api.call(prompt)
        return MCPToolResult(
            content=response.text,
            usage={"prompt_tokens": 100, "completion_tokens": 50},
            cost_usd=0.001
        )
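The wrapping described above is a template-method pattern: the public `execute()` owns the timing, and the subclass supplies only `_execute_impl()`. The following is a minimal, self-contained sketch of that idea. `SketchResult`, `SketchServer`, and `EchoTool` are hypothetical stand-ins written for this illustration, not part of `ipw`, and the sketch uses `time.perf_counter()` where the listed source uses `time.time()`.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchResult:
    """Hypothetical stand-in for MCPToolResult (only the fields used here)."""
    content: str
    usage: Dict[str, int] = field(default_factory=dict)
    latency_seconds: float = 0.0


class SketchServer(ABC):
    """Hypothetical stand-in for BaseMCPServer: times the subclass hook."""

    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, prompt: str, **params: Any) -> SketchResult:
        start = time.perf_counter()
        result = self._execute_impl(prompt, **params)
        # The wrapper, not the subclass, fills in the latency field.
        result.latency_seconds = time.perf_counter() - start
        return result

    @abstractmethod
    def _execute_impl(self, prompt: str, **params: Any) -> SketchResult:
        """Subclasses provide the actual tool logic."""


class EchoTool(SketchServer):
    def _execute_impl(self, prompt: str, **params: Any) -> SketchResult:
        tokens = len(prompt.split())
        return SketchResult(content=prompt.upper(), usage={"prompt_tokens": tokens})


result = EchoTool("echo").execute("hello world")
print(result.content, result.usage)
```

Because `_execute_impl()` is abstract, instantiating the base class directly raises `TypeError`, so a subclass cannot forget the hook without failing at construction time.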

Source code in intelligence-per-watt/src/ipw/agents/mcp/base.py

class BaseMCPServer(ABC):
    """Base class for all MCP servers with automatic telemetry capture.

    All subclasses must implement _execute_impl() which performs the actual
    tool invocation. The base class wraps this with telemetry collection.

    Example:
        class MyTool(BaseMCPServer):
            def _execute_impl(self, prompt: str, **params) -> MCPToolResult:
                response = self.api.call(prompt)
                return MCPToolResult(
                    content=response.text,
                    usage={"prompt_tokens": 100, "completion_tokens": 50},
                    cost_usd=0.001
                )
    """

    def __init__(
        self,
        name: str,
        telemetry_collector: Optional[Any] = None,
        event_recorder: Optional[Any] = None,
    ):
        """Initialize MCP server.

        Args:
            name: Tool name for logging/tracking
            telemetry_collector: Energy monitor collector. If None, runs without telemetry.
            event_recorder: EventRecorder for per-action tracking. If None, no events recorded.
        """
        self.name = name
        self.telemetry_collector = telemetry_collector
        self.event_recorder = event_recorder

    def execute(self, prompt: str, **params: Any) -> MCPToolResult:
        """Execute tool with automatic telemetry capture.

        Args:
            prompt: Input prompt/query for tool
            **params: Additional tool-specific parameters

        Returns:
            MCPToolResult with content, usage, cost, and telemetry samples
        """
        start_time = time.time()

        # Get model info for event recording (subclasses can override these)
        model_id = getattr(self, "model_path", None) or getattr(self, "model_name", self.name)
        model_alias = getattr(self, "model_name", self.name)
        backend = self._get_backend()

        # Record start event
        if self.event_recorder is not None:
            self.event_recorder.record(
                "submodel_call_start",
                model_id=model_id,
                model_alias=model_alias,
                backend=backend,
                tool_name=self.name,
            )

        # Execute with telemetry if available
        if HAS_TELEMETRY and self.telemetry_collector is not None and TelemetrySession is not None:
            with TelemetrySession(self.telemetry_collector) as session:
                result = self._execute_impl(prompt, **params)
                end_time = time.time()
                result.telemetry_samples = list(session.window(start_time, end_time))
                result.latency_seconds = end_time - start_time
        else:
            # Execute without telemetry
            result = self._execute_impl(prompt, **params)
            end_time = time.time()
            result.latency_seconds = end_time - start_time

        # Record end event
        if self.event_recorder is not None:
            self.event_recorder.record(
                "submodel_call_end",
                model_id=model_id,
                model_alias=model_alias,
                backend=backend,
                tool_name=self.name,
                total_tokens=result.usage.get("total_tokens", 0),
                prompt_tokens=result.usage.get("prompt_tokens", 0),
                completion_tokens=result.usage.get("completion_tokens", 0),
                cost_usd=result.cost_usd,
                latency_seconds=result.latency_seconds,
            )

        return result

    def _get_backend(self) -> str:
        """Get the backend type for this server.

        Returns:
            Backend identifier (e.g., 'vllm', 'ollama', 'openai')
        """
        # Default: extract from name if it contains a prefix like "vllm:" or "ollama:"
        if ":" in self.name:
            return self.name.split(":")[0]
        return "unknown"

    @abstractmethod
    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Implement tool execution logic.

        Subclasses must override this to provide actual tool functionality.

        Args:
            prompt: Input prompt/query
            **params: Tool-specific parameters

        Returns:
            MCPToolResult (telemetry_samples and latency_seconds will be added by base class)
        """
        raise NotImplementedError

    def health_check(self) -> bool:
        """Check if tool is available and healthy.

        Returns:
            True if tool is operational, False otherwise
        """
        try:
            result = self.execute("test", timeout=5)
            return result.content is not None
        except Exception:
            return False

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(name={self.name!r})"

`__init__(name, telemetry_collector=None, event_recorder=None)`

Initialize MCP server.

Parameters:

Name	Type	Description	Default
`name`	`str`	Tool name for logging/tracking	required
`telemetry_collector`	`Optional[Any]`	Energy monitor collector. If None, runs without telemetry.	`None`
`event_recorder`	`Optional[Any]`	EventRecorder for per-action tracking. If None, no events recorded.	`None`

Source code in intelligence-per-watt/src/ipw/agents/mcp/base.py

def __init__(
    self,
    name: str,
    telemetry_collector: Optional[Any] = None,
    event_recorder: Optional[Any] = None,
):
    """Initialize MCP server.

    Args:
        name: Tool name for logging/tracking
        telemetry_collector: Energy monitor collector. If None, runs without telemetry.
        event_recorder: EventRecorder for per-action tracking. If None, no events recorded.
    """
    self.name = name
    self.telemetry_collector = telemetry_collector
    self.event_recorder = event_recorder

`execute(prompt, **params)`

Execute tool with automatic telemetry capture.

Parameters:

Name	Type	Description	Default
`prompt`	`str`	Input prompt/query for tool	required
`**params`	`Any`	Additional tool-specific parameters	`{}`

Returns:

Type	Description
`MCPToolResult`	MCPToolResult with content, usage, cost, and telemetry samples
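To make the event flow around a call concrete, here is a small sketch of the start/end pair that `execute()` records. It assumes only what the source listing shows: a recorder object with a `record(event_type, **fields)` method. `ListRecorder`, `backend_from_name`, and `run_with_events` are hypothetical names introduced for this illustration, and the sketch records a subset of the fields.

```python
from typing import Any, Dict, List, Tuple


class ListRecorder:
    """Hypothetical recorder: only record(event_type, **fields) is assumed."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Dict[str, Any]]] = []

    def record(self, event_type: str, **fields: Any) -> None:
        self.events.append((event_type, fields))


def backend_from_name(name: str) -> str:
    """Mirror of the default _get_backend(): the prefix before the first colon."""
    return name.split(":")[0] if ":" in name else "unknown"


def run_with_events(recorder: ListRecorder, name: str, usage: Dict[str, int]) -> None:
    """Emit the same start/end event names that execute() uses around a call."""
    backend = backend_from_name(name)
    recorder.record("submodel_call_start", backend=backend, tool_name=name)
    # ... the real method calls _execute_impl() at this point ...
    recorder.record(
        "submodel_call_end",
        backend=backend,
        tool_name=name,
        total_tokens=usage.get("total_tokens", 0),
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
    )


recorder = ListRecorder()
run_with_events(recorder, "openai:gpt-4o", {"prompt_tokens": 3, "completion_tokens": 5})
for event_type, fields in recorder.events:
    print(event_type, fields)
```

Note that the end event reads `total_tokens` with `.get("total_tokens", 0)` rather than summing the other two counts, so a subclass that omits `total_tokens` from `usage` is recorded with a total of 0.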

Source code in intelligence-per-watt/src/ipw/agents/mcp/base.py

def execute(self, prompt: str, **params: Any) -> MCPToolResult:
    """Execute tool with automatic telemetry capture.

    Args:
        prompt: Input prompt/query for tool
        **params: Additional tool-specific parameters

    Returns:
        MCPToolResult with content, usage, cost, and telemetry samples
    """
    start_time = time.time()

    # Get model info for event recording (subclasses can override these)
    model_id = getattr(self, "model_path", None) or getattr(self, "model_name", self.name)
    model_alias = getattr(self, "model_name", self.name)
    backend = self._get_backend()

    # Record start event
    if self.event_recorder is not None:
        self.event_recorder.record(
            "submodel_call_start",
            model_id=model_id,
            model_alias=model_alias,
            backend=backend,
            tool_name=self.name,
        )

    # Execute with telemetry if available
    if HAS_TELEMETRY and self.telemetry_collector is not None and TelemetrySession is not None:
        with TelemetrySession(self.telemetry_collector) as session:
            result = self._execute_impl(prompt, **params)
            end_time = time.time()
            result.telemetry_samples = list(session.window(start_time, end_time))
            result.latency_seconds = end_time - start_time
    else:
        # Execute without telemetry
        result = self._execute_impl(prompt, **params)
        end_time = time.time()
        result.latency_seconds = end_time - start_time

    # Record end event
    if self.event_recorder is not None:
        self.event_recorder.record(
            "submodel_call_end",
            model_id=model_id,
            model_alias=model_alias,
            backend=backend,
            tool_name=self.name,
            total_tokens=result.usage.get("total_tokens", 0),
            prompt_tokens=result.usage.get("prompt_tokens", 0),
            completion_tokens=result.usage.get("completion_tokens", 0),
            cost_usd=result.cost_usd,
            latency_seconds=result.latency_seconds,
        )

    return result

`health_check()`

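Check whether the tool is available and healthy by running `execute("test", timeout=5)`. Any exception raised by that call is reported as unhealthy.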

Returns:

Type	Description
`bool`	True if tool is operational, False otherwise
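The control flow of the default health check can be sketched in isolation. `Probe` below is a hypothetical stand-in whose `execute()` either returns a result or raises; only the `try`/`except` shape is taken from the source listing.

```python
from types import SimpleNamespace


class Probe:
    """Hypothetical stand-in showing the default health_check() control flow."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail

    def execute(self, prompt: str, **params):
        if self.fail:
            raise ConnectionError("backend unreachable")
        return SimpleNamespace(content="ok")

    def health_check(self) -> bool:
        try:
            result = self.execute("test", timeout=5)
            return result.content is not None
        except Exception:
            # Every failure mode collapses to False; the cause is not surfaced.
            return False


healthy = Probe(fail=False).health_check()
unhealthy = Probe(fail=True).health_check()
print(healthy, unhealthy)
```

Since the probe goes through `execute()`, it is a real tool invocation: it passes `timeout=5` to `_execute_impl()` as a keyword parameter and is timed and recorded like any other call.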

Source code in intelligence-per-watt/src/ipw/agents/mcp/base.py

def health_check(self) -> bool:
    """Check if tool is available and healthy.

    Returns:
        True if tool is operational, False otherwise
    """
    try:
        result = self.execute("test", timeout=5)
        return result.content is not None
    except Exception:
        return False

`MCPToolResult` `dataclass`

Result from MCP tool execution with telemetry.

Source code in intelligence-per-watt/src/ipw/agents/mcp/base.py

@dataclass
class MCPToolResult:
    """Result from MCP tool execution with telemetry."""

    content: str
    """Response text from tool/model"""

    usage: Dict[str, int] = field(default_factory=dict)
    """Token counts: prompt_tokens, completion_tokens, total_tokens"""

    cost_usd: Optional[float] = None
    """API cost in USD (for cloud APIs)"""

    telemetry_samples: List[Any] = field(default_factory=list)
    """Energy/power/memory readings during execution"""

    latency_seconds: float = 0.0
    """Wall-clock execution time"""

    ttft_seconds: Optional[float] = None
    """Time to first token (for streaming APIs)"""

    metadata: Dict[str, Any] = field(default_factory=dict)
    """Additional tool-specific metadata"""

`content` `instance-attribute`

Response text from tool/model

`usage = field(default_factory=dict)` `class-attribute` `instance-attribute`

Token counts: prompt_tokens, completion_tokens, total_tokens

`cost_usd = None` `class-attribute` `instance-attribute`

API cost in USD (for cloud APIs)

`telemetry_samples = field(default_factory=list)` `class-attribute` `instance-attribute`

Energy/power/memory readings during execution

`latency_seconds = 0.0` `class-attribute` `instance-attribute`

Wall-clock execution time

`ttft_seconds = None` `class-attribute` `instance-attribute`

Time to first token (for streaming APIs)

`metadata = field(default_factory=dict)` `class-attribute` `instance-attribute`

Additional tool-specific metadata
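As an illustration of how these fields combine, the sketch below redefines the dataclass locally so that it runs without `ipw` installed. `ResultMirror` copies the field list above, and `completion_tokens_per_second` is a hypothetical helper that is not part of the library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ResultMirror:
    """Field-for-field mirror of MCPToolResult, redefined so the sketch runs alone."""
    content: str
    usage: Dict[str, int] = field(default_factory=dict)
    cost_usd: Optional[float] = None
    telemetry_samples: List[Any] = field(default_factory=list)
    latency_seconds: float = 0.0
    ttft_seconds: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def completion_tokens_per_second(result: ResultMirror) -> Optional[float]:
    """Hypothetical helper: output throughput, or None if latency is unset."""
    if result.latency_seconds <= 0:
        return None
    return result.usage.get("completion_tokens", 0) / result.latency_seconds


a = ResultMirror(content="first")
b = ResultMirror(content="second", usage={"completion_tokens": 50}, latency_seconds=2.0)
a.metadata["note"] = "only on a"

print(b.metadata)  # default_factory gives each instance its own dict
print(completion_tokens_per_second(b))
```

Only `content` is required. `cost_usd` and `ttft_seconds` default to `None` rather than `0.0`, so a consumer can tell "not measured" apart from "measured as zero".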

`OpenAIMCPServer`

Bases: BaseMCPServer

MCP server for OpenAI models with automatic cost tracking.

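Costs are computed from token usage and a per-model pricing table (`OPENAI_PRICING`). When a model has no entry in that table, the server prints a warning and uses `gpt-4o` rates for its `pricing` attribute.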

Example

server = OpenAIMCPServer(
    model_name="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY")
)

result = server.execute("Explain quantum computing")
print(result.content)
print(f"Cost: ${result.cost_usd:.4f}")
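The body of `calculate_cost` is not shown on this page. The sketch below therefore assumes the usual linear formula implied by the `pricing_input_per_1m` and `pricing_output_per_1m` metadata keys: tokens multiplied by a per-million rate. `EXAMPLE_PRICING` and `estimate_cost_usd` are hypothetical, and the rates are placeholders rather than real OpenAI prices.

```python
from typing import Dict

# Illustrative placeholder rates in USD per 1M tokens; not real OpenAI prices.
EXAMPLE_PRICING: Dict[str, Dict[str, float]] = {
    "example-model": {"input": 2.0, "output": 8.0},
}


def estimate_cost_usd(
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    fallback: str = "example-model",
) -> float:
    """Assumed formula: (tokens * rate per 1M) / 1,000,000, summed over both sides."""
    # Unknown models fall back to a known entry, as the server does with gpt-4o.
    rates = EXAMPLE_PRICING.get(model) or EXAMPLE_PRICING[fallback]
    total = prompt_tokens * rates["input"] + completion_tokens * rates["output"]
    return total / 1_000_000


cost = estimate_cost_usd("example-model", prompt_tokens=1_000, completion_tokens=500)
print(f"Cost: ${cost:.4f}")
```

With these placeholder rates, 1,000 prompt tokens and 500 completion tokens come to (1,000 × 2.0 + 500 × 8.0) / 1,000,000 = 0.006 USD.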

Source code in intelligence-per-watt/src/ipw/agents/mcp/openai_server.py

class OpenAIMCPServer(BaseMCPServer):
    """MCP server for OpenAI models with automatic cost tracking.

    Tracks API costs based on token usage and current pricing.

    Example:
        server = OpenAIMCPServer(
            model_name="gpt-4o",
            api_key=os.getenv("OPENAI_API_KEY")
        )

        result = server.execute("Explain quantum computing")
        print(result.content)
        print(f"Cost: ${result.cost_usd:.4f}")
    """

    def __init__(
        self,
        model_name: str,
        api_key: Optional[str] = None,
        telemetry_collector: Optional[Any] = None,
        event_recorder: Optional[Any] = None,
        **openai_params: Any,
    ):
        """Initialize OpenAI MCP server.

        Args:
            model_name: OpenAI model name (e.g., "gpt-4o", "gpt-5-mini-2025-08-07")
            api_key: OpenAI API key (or set OPENAI_API_KEY env var)
            telemetry_collector: Energy monitor collector
            event_recorder: EventRecorder for per-action tracking
            **openai_params: Additional OpenAI parameters (temperature, max_tokens, etc.)
        """
        super().__init__(
            name=f"openai:{model_name}",
            telemetry_collector=telemetry_collector,
            event_recorder=event_recorder,
        )

        self.model_name = model_name
        self.openai_params = openai_params

        # Lazy import: openai is optional
        try:
            from openai import OpenAI
        except ImportError:
            raise ImportError(
                "openai package is required for OpenAIMCPServer. "
                "Install with: pip install openai"
            )

        # Initialize OpenAI client
        if api_key is None:
            api_key = os.getenv("OPENAI_API_KEY")
        self._client = OpenAI(api_key=api_key)

        # Get pricing for this model
        self.pricing = OPENAI_PRICING.get(model_name)
        if not self.pricing:
            print(f"Warning: No pricing info for {model_name}, using gpt-4o rates")
            self.pricing = OPENAI_PRICING["gpt-4o"]

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Execute OpenAI API call with cost tracking."""
        from openai import OpenAIError

        # Merge default params with request params
        payload = {**self.openai_params, **params}
        payload["model"] = self.model_name
        payload["messages"] = [{"role": "user", "content": prompt}]
        payload["stream"] = True

        # GPT-5+ models use max_completion_tokens instead of max_tokens
        if "max_tokens" in payload and self.model_name.startswith("gpt-5"):
            payload["max_completion_tokens"] = payload.pop("max_tokens")

        # Call OpenAI API
        start = time.perf_counter()
        try:
            stream = self._client.chat.completions.create(**payload)
        except OpenAIError as exc:
            raise RuntimeError(f"OpenAI error for {self.model_name}: {exc}") from exc

        # Consume stream and collect response
        content_chunks: list[str] = []
        ttft_ms: Optional[float] = None
        usage = None

        for chunk in stream:
            if chunk.choices and len(chunk.choices) > 0:
                delta = chunk.choices[0].delta
                if delta.content:
                    if ttft_ms is None:
                        ttft_ms = (time.perf_counter() - start) * 1000
                    content_chunks.append(delta.content)

            # Last chunk contains usage
            if hasattr(chunk, "usage") and chunk.usage:
                usage = chunk.usage

        content = "".join(content_chunks)

        # Extract token counts
        if usage:
            prompt_tokens = usage.prompt_tokens
            completion_tokens = usage.completion_tokens
            total_tokens = usage.total_tokens
        else:
            prompt_tokens = len(prompt.split())
            completion_tokens = len(content.split())
            total_tokens = prompt_tokens + completion_tokens

        # Calculate cost
        cost_usd = calculate_cost("openai", self.model_name, prompt_tokens, completion_tokens)

        return MCPToolResult(
            content=content,
            usage={
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": total_tokens,
            },
            cost_usd=cost_usd,
            ttft_seconds=(ttft_ms / 1000.0) if ttft_ms else None,
            metadata={
                "model": self.model_name,
                "backend": "openai",
                "pricing_input_per_1m": self.pricing["input"],
                "pricing_output_per_1m": self.pricing["output"],
            },
        )

    def health_check(self) -> bool:
        """Check if OpenAI API is accessible."""
        try:
            response = self._client.chat.completions.create(
                model=self.model_name,
                messages=[{"role": "user", "content": "test"}],
                max_tokens=1,
            )
            return response is not None
        except Exception:
            return False

    def list_available_models(self) -> list[str]:
        """List all available OpenAI models."""
        from openai import OpenAIError
        try:
            models = self._client.models.list()
            return [model.id for model in models.data if model.id.startswith("gpt")]
        except OpenAIError as exc:
            raise RuntimeError(f"Failed to list OpenAI models: {exc}") from exc

`__init__(model_name, api_key=None, telemetry_collector=None, event_recorder=None, **openai_params)`

Initialize OpenAI MCP server.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	OpenAI model name (e.g., "gpt-4o", "gpt-5-mini-2025-08-07")	required
`api_key`	`Optional[str]`	OpenAI API key (or set OPENAI_API_KEY env var)	`None`
`telemetry_collector`	`Optional[Any]`	Energy monitor collector	`None`
`event_recorder`	`Optional[Any]`	EventRecorder for per-action tracking	`None`
`**openai_params`	`Any`	Additional OpenAI parameters (temperature, max_tokens, etc.)	`{}`
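The way `**openai_params` interacts with per-call parameters can be sketched without touching the network. `build_payload` below is a hypothetical function that mirrors the payload assembly in `_execute_impl()` from the class listing and stops before the API call.

```python
from typing import Any, Dict


def build_payload(
    model_name: str, defaults: Dict[str, Any], prompt: str, **params: Any
) -> Dict[str, Any]:
    """Mirror of the payload assembly in _execute_impl(), without the API call."""
    payload = {**defaults, **params}  # per-call params win over constructor defaults
    payload["model"] = model_name
    payload["messages"] = [{"role": "user", "content": prompt}]
    payload["stream"] = True
    # Models whose name starts with "gpt-5" get max_completion_tokens instead.
    if "max_tokens" in payload and model_name.startswith("gpt-5"):
        payload["max_completion_tokens"] = payload.pop("max_tokens")
    return payload


defaults = {"temperature": 0.2, "max_tokens": 256}
old = build_payload("gpt-4o", defaults, "hi", temperature=0.7)
new = build_payload("gpt-5-mini-2025-08-07", defaults, "hi")

print(old["temperature"], "max_tokens" in old)
print(new.get("max_completion_tokens"), "max_tokens" in new)
```

Because constructor keyword arguments are forwarded unchanged, passing `stream_options={"include_usage": True}` is one way to ask the OpenAI streaming API to attach a usage object to the final chunk. When no usage object arrives, the listed code falls back to whitespace-split word counts, which only approximate token counts.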

Source code in intelligence-per-watt/src/ipw/agents/mcp/openai_server.py

def __init__(
    self,
    model_name: str,
    api_key: Optional[str] = None,
    telemetry_collector: Optional[Any] = None,
    event_recorder: Optional[Any] = None,
    **openai_params: Any,
):
    """Initialize OpenAI MCP server.

    Args:
        model_name: OpenAI model name (e.g., "gpt-4o", "gpt-5-mini-2025-08-07")
        api_key: OpenAI API key (or set OPENAI_API_KEY env var)
        telemetry_collector: Energy monitor collector
        event_recorder: EventRecorder for per-action tracking
        **openai_params: Additional OpenAI parameters (temperature, max_tokens, etc.)
    """
    super().__init__(
        name=f"openai:{model_name}",
        telemetry_collector=telemetry_collector,
        event_recorder=event_recorder,
    )

    self.model_name = model_name
    self.openai_params = openai_params

    # Lazy import: openai is optional
    try:
        from openai import OpenAI
    except ImportError:
        raise ImportError(
            "openai package is required for OpenAIMCPServer. "
            "Install with: pip install openai"
        )

    # Initialize OpenAI client
    if api_key is None:
        api_key = os.getenv("OPENAI_API_KEY")
    self._client = OpenAI(api_key=api_key)

    # Get pricing for this model
    self.pricing = OPENAI_PRICING.get(model_name)
    if not self.pricing:
        print(f"Warning: No pricing info for {model_name}, using gpt-4o rates")
        self.pricing = OPENAI_PRICING["gpt-4o"]

`health_check()`

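Check whether the OpenAI API is accessible by sending a minimal chat completion request with `max_tokens=1`. Returns `False` on any exception.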

Source code in intelligence-per-watt/src/ipw/agents/mcp/openai_server.py

def health_check(self) -> bool:
    """Check if OpenAI API is accessible."""
    try:
        response = self._client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": "test"}],
            max_tokens=1,
        )
        return response is not None
    except Exception:
        return False

`list_available_models()`

List the available OpenAI models whose IDs start with `gpt`.

Source code in intelligence-per-watt/src/ipw/agents/mcp/openai_server.py

def list_available_models(self) -> list[str]:
    """List all available OpenAI models."""
    from openai import OpenAIError
    try:
        models = self._client.models.list()
        return [model.id for model in models.data if model.id.startswith("gpt")]
    except OpenAIError as exc:
        raise RuntimeError(f"Failed to list OpenAI models: {exc}") from exc

`AnthropicMCPServer`

Bases: BaseMCPServer

MCP server for Anthropic Claude models with cost tracking.

Example

server = AnthropicMCPServer(
    model_name="claude-sonnet-4-5-20250929",
    api_key=os.getenv("ANTHROPIC_API_KEY")
)

result = server.execute("Write a haiku about AI")
print(result.content)
print(f"Cost: ${result.cost_usd:.4f}")
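Internally, the server consumes a stream of typed events and reads text and token counts from different event types. The sketch below replays that accumulation loop over hand-built fake events, so it runs without the `anthropic` package. `consume` and `fake_stream` are hypothetical, and the event shapes are simplified to the attributes the source listing reads.

```python
from types import SimpleNamespace as NS
from typing import Iterable, List, Tuple


def consume(events: Iterable[NS]) -> Tuple[str, int, int]:
    """Accumulate text and token counts the way _execute_impl() does."""
    chunks: List[str] = []
    input_tokens = 0
    output_tokens = 0
    for event in events:
        if event.type == "content_block_delta":
            if hasattr(event.delta, "text"):
                chunks.append(event.delta.text)
        elif event.type == "message_start":
            if hasattr(event.message, "usage"):
                input_tokens = event.message.usage.input_tokens
        elif event.type == "message_delta":
            if hasattr(event, "usage"):
                output_tokens = event.usage.output_tokens
    return "".join(chunks), input_tokens, output_tokens


# Simplified fake events carrying only the attributes read above.
fake_stream = [
    NS(type="message_start", message=NS(usage=NS(input_tokens=12))),
    NS(type="content_block_delta", delta=NS(text="Hello")),
    NS(type="content_block_delta", delta=NS(text=", world")),
    NS(type="message_delta", usage=NS(output_tokens=4)),
    NS(type="message_stop"),
]

text, tokens_in, tokens_out = consume(fake_stream)
print(text, tokens_in, tokens_out)
```

Input tokens come from the opening `message_start` event and output tokens from `message_delta`, so a stream that ends early can report prompt tokens while leaving the completion count at 0.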

Source code in intelligence-per-watt/src/ipw/agents/mcp/anthropic_server.py

class AnthropicMCPServer(BaseMCPServer):
    """MCP server for Anthropic Claude models with cost tracking.

    Example:
        server = AnthropicMCPServer(
            model_name="claude-sonnet-4-5-20250929",
            api_key=os.getenv("ANTHROPIC_API_KEY")
        )

        result = server.execute("Write a haiku about AI")
        print(result.content)
        print(f"Cost: ${result.cost_usd:.4f}")
    """

    def __init__(
        self,
        model_name: str,
        api_key: Optional[str] = None,
        telemetry_collector: Optional[Any] = None,
        **anthropic_params: Any,
    ):
        """Initialize Anthropic MCP server.

        Args:
            model_name: Claude model name
            api_key: Anthropic API key (or set ANTHROPIC_API_KEY env var)
            telemetry_collector: Energy monitor collector
            **anthropic_params: Additional params (temperature, max_tokens, etc.)
        """
        super().__init__(
            name=f"anthropic:{model_name}",
            telemetry_collector=telemetry_collector,
        )

        self.model_name = model_name
        self.anthropic_params = anthropic_params

        # Lazy import: anthropic is optional
        try:
            from anthropic import Anthropic
        except ImportError:
            raise ImportError(
                "anthropic package is required for AnthropicMCPServer. "
                "Install with: pip install anthropic"
            )

        # Initialize Anthropic client
        self._client = Anthropic(api_key=api_key)

        # Get pricing for this model
        self.pricing = ANTHROPIC_PRICING.get(model_name)
        if not self.pricing:
            print(f"Warning: No pricing info for {model_name}, using Sonnet 4.5 rates")
            self.pricing = ANTHROPIC_PRICING["claude-sonnet-4-5-20250929"]

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Execute Anthropic API call with cost tracking."""
        from anthropic import AnthropicError

        # Merge default params with request params
        payload = {**self.anthropic_params, **params}
        payload["model"] = self.model_name
        payload["messages"] = [{"role": "user", "content": prompt}]
        payload["max_tokens"] = payload.get("max_tokens", 4096)
        payload["stream"] = True

        # Call Anthropic API
        start = time.perf_counter()
        try:
            stream = self._client.messages.create(**payload)
        except AnthropicError as exc:
            raise RuntimeError(f"Anthropic error for {self.model_name}: {exc}") from exc

        # Consume stream and collect response
        content_chunks: list[str] = []
        ttft_ms: Optional[float] = None
        input_tokens = 0
        output_tokens = 0

        with stream as event_stream:
            for event in event_stream:
                if event.type == "content_block_delta":
                    if hasattr(event.delta, "text"):
                        if ttft_ms is None:
                            ttft_ms = (time.perf_counter() - start) * 1000
                        content_chunks.append(event.delta.text)

                elif event.type == "message_start":
                    if hasattr(event.message, "usage"):
                        input_tokens = event.message.usage.input_tokens

                elif event.type == "message_delta":
                    if hasattr(event, "usage"):
                        output_tokens = event.usage.output_tokens

        content = "".join(content_chunks)

        # Calculate cost
        cost_usd = calculate_cost("anthropic", self.model_name, input_tokens, output_tokens)

        return MCPToolResult(
            content=content,
            usage={
                "prompt_tokens": input_tokens,
                "completion_tokens": output_tokens,
                "total_tokens": input_tokens + output_tokens,
            },
            cost_usd=cost_usd,
            ttft_seconds=(ttft_ms / 1000.0) if ttft_ms else None,
            metadata={
                "model": self.model_name,
                "backend": "anthropic",
                "pricing_input_per_1m": self.pricing["input"],
                "pricing_output_per_1m": self.pricing["output"],
            },
        )

    def health_check(self) -> bool:
        """Check if Anthropic API is accessible."""
        try:
            response = self._client.messages.create(
                model=self.model_name,
                messages=[{"role": "user", "content": "test"}],
                max_tokens=1,
            )
            return response is not None
        except Exception:
            return False

`__init__(model_name, api_key=None, telemetry_collector=None, **anthropic_params)`

Initialize Anthropic MCP server.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Claude model name	required
`api_key`	`Optional[str]`	Anthropic API key (or set ANTHROPIC_API_KEY env var)	`None`
`telemetry_collector`	`Optional[Any]`	Energy monitor collector	`None`
`**anthropic_params`	`Any`	Additional params (temperature, max_tokens, etc.)	`{}`

Source code in intelligence-per-watt/src/ipw/agents/mcp/anthropic_server.py

def __init__(
    self,
    model_name: str,
    api_key: Optional[str] = None,
    telemetry_collector: Optional[Any] = None,
    **anthropic_params: Any,
):
    """Initialize Anthropic MCP server.

    Args:
        model_name: Claude model name
        api_key: Anthropic API key (or set ANTHROPIC_API_KEY env var)
        telemetry_collector: Energy monitor collector
        **anthropic_params: Additional params (temperature, max_tokens, etc.)
    """
    super().__init__(
        name=f"anthropic:{model_name}",
        telemetry_collector=telemetry_collector,
    )

    self.model_name = model_name
    self.anthropic_params = anthropic_params

    # Lazy import: anthropic is optional
    try:
        from anthropic import Anthropic
    except ImportError:
        raise ImportError(
            "anthropic package is required for AnthropicMCPServer. "
            "Install with: pip install anthropic"
        )

    # Initialize Anthropic client
    self._client = Anthropic(api_key=api_key)

    # Get pricing for this model
    self.pricing = ANTHROPIC_PRICING.get(model_name)
    if not self.pricing:
        print(f"Warning: No pricing info for {model_name}, using Sonnet 4.5 rates")
        self.pricing = ANTHROPIC_PRICING["claude-sonnet-4-5-20250929"]

`health_check()`

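Check whether the Anthropic API is accessible by sending a minimal messages request with `max_tokens=1`. Returns `False` on any exception.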

Source code in intelligence-per-watt/src/ipw/agents/mcp/anthropic_server.py

def health_check(self) -> bool:
    """Check if Anthropic API is accessible."""
    try:
        response = self._client.messages.create(
            model=self.model_name,
            messages=[{"role": "user", "content": "test"}],
            max_tokens=1,
        )
        return response is not None
    except Exception:
        return False

`OpenRouterMCPServer`

Bases: BaseMCPServer

MCP server for OpenRouter with automatic cost tracking.

OpenRouter provides a unified API for accessing many LLM providers. This server talks to it through the OpenAI-compatible API format.

Example

server = OpenRouterMCPServer(
    model_name="google/gemini-2.5-flash",
    api_key=os.getenv("OPENROUTER_API_KEY")
)

result = server.execute("Explain quantum computing")
print(result.content)
print(f"Cost: ${result.cost_usd:.6f}")
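Pricing for this server is resolved by an exact-match lookup on the model identifier, with a default for unknown models. The sketch below reproduces that lookup using two entries and the default copied from the class listing. `resolve_pricing` is a hypothetical helper name.

```python
from typing import Dict, Tuple

# Two entries and the default, copied from the class listing (USD per 1M tokens).
PRICING: Dict[str, Dict[str, float]] = {
    "google/gemini-2.5-flash": {"input": 0.30, "output": 2.50},
    "meta-llama/llama-3.1-8b-instruct": {"input": 0.05, "output": 0.05},
}
DEFAULT_PRICING: Dict[str, float] = {"input": 1.00, "output": 3.00}


def resolve_pricing(model_name: str) -> Tuple[Dict[str, float], bool]:
    """Return (rates, known); unknown models get DEFAULT_PRICING."""
    known = model_name in PRICING
    return PRICING.get(model_name, DEFAULT_PRICING), known


rates, known = resolve_pricing("google/gemini-2.5-flash")
fallback_rates, fallback_known = resolve_pricing("Google/Gemini-2.5-Flash")

print(rates, known)
print(fallback_rates, fallback_known)
```

The lookup is case-sensitive and has no prefix matching, so a model ID that differs from a table key in any way, including a version suffix, gets the default rates and the warning.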

Source code in intelligence-per-watt/src/ipw/agents/mcp/openrouter_server.py

class OpenRouterMCPServer(BaseMCPServer):
    """MCP server for OpenRouter with automatic cost tracking.

    OpenRouter provides a unified API for accessing many LLM providers.
    Uses OpenAI-compatible API format.

    Example:
        server = OpenRouterMCPServer(
            model_name="google/gemini-2.5-flash",
            api_key=os.getenv("OPENROUTER_API_KEY")
        )

        result = server.execute("Explain quantum computing")
        print(result.content)
        print(f"Cost: ${result.cost_usd:.6f}")
    """

    # Approximate pricing per 1M tokens (varies by model)
    # Source: https://openrouter.ai/models
    PRICING = {
        # Meta Llama
        "meta-llama/llama-3.3-70b-instruct": {"input": 0.40, "output": 0.40},
        "meta-llama/llama-3.1-405b-instruct": {"input": 2.00, "output": 2.00},
        "meta-llama/llama-3.1-70b-instruct": {"input": 0.40, "output": 0.40},
        "meta-llama/llama-3.1-8b-instruct": {"input": 0.05, "output": 0.05},
        # Qwen - General
        "qwen/qwen-2.5-72b-instruct": {"input": 0.35, "output": 0.40},
        "qwen/qwen-2.5-32b-instruct": {"input": 0.15, "output": 0.15},
        "qwen/qwq-32b": {"input": 0.15, "output": 0.15},
        "qwen/qwen-2.5-coder-32b-instruct": {"input": 0.15, "output": 0.15},
        # Qwen3
        "qwen/qwen3-32b": {"input": 0.08, "output": 0.24},
        "qwen/qwen3-coder-next": {"input": 0.20, "output": 1.50},
        "qwen/qwen3-coder-plus": {"input": 1.00, "output": 5.00},
        "qwen/qwen3-max": {"input": 1.20, "output": 6.00},
        "qwen/qwen3-next-80b-a3b-instruct": {"input": 0.09, "output": 1.10},
        # Math specialist
        "z-ai/glm-4.7": {"input": 0.40, "output": 1.50},
        "qwen/qwen2.5-math-72b-instruct": {"input": 0.40, "output": 0.40},
        "qwen/qwen2.5-coder-32b-instruct": {"input": 0.15, "output": 0.15},
        # DeepSeek
        "deepseek/deepseek-r1": {"input": 0.70, "output": 2.50},
        "deepseek/deepseek-r1-0528": {"input": 0.40, "output": 1.75},
        "deepseek/deepseek-v3.2": {"input": 0.25, "output": 0.38},
        "deepseek/deepseek-v3.1-terminus": {"input": 0.21, "output": 0.79},
        "deepseek/deepseek-chat-v3-0324": {"input": 0.14, "output": 0.28},
        "deepseek/deepseek-chat": {"input": 0.14, "output": 0.28},
        "deepseek/deepseek-coder-v2": {"input": 0.14, "output": 0.28},
        # Mistral
        "mistralai/mistral-large-2411": {"input": 2.00, "output": 6.00},
        "mistralai/mistral-small-2501": {"input": 0.10, "output": 0.30},
        "mistralai/codestral-2501": {"input": 0.30, "output": 0.90},
        # Google Gemini (via OpenRouter)
        "google/gemini-2.5-flash": {"input": 0.30, "output": 2.50},
        "google/gemini-2.5-pro": {"input": 1.25, "output": 5.00},
        "google/gemini-3-flash-preview": {"input": 0.50, "output": 3.00},
        "z-ai/glm-4.7-flash": {"input": 0.07, "output": 0.40},
        # Small/cheap models
        "openai/gpt-oss-20b": {"input": 0.02, "output": 0.10},
        "deepseek/deepseek-r1-distill-qwen-1.5b": {"input": 0.02, "output": 0.05},
        "google/gemma-3-4b-it": {"input": 0.02, "output": 0.07},
    }

    # Default fallback pricing
    DEFAULT_PRICING = {"input": 1.00, "output": 3.00}

    def __init__(
        self,
        model_name: str,
        api_key: Optional[str] = None,
        telemetry_collector: Optional[Any] = None,
        site_url: Optional[str] = None,
        app_name: Optional[str] = None,
        **openai_params: Any,
    ):
        """Initialize OpenRouter MCP server.

        Args:
            model_name: Model identifier (e.g., "google/gemini-2.5-flash")
            api_key: OpenRouter API key (or set OPENROUTER_API_KEY env var)
            telemetry_collector: Energy monitor collector
            site_url: Your site URL for OpenRouter rankings
            app_name: Your app name for OpenRouter rankings
            **openai_params: Additional parameters (temperature, max_tokens, etc.)
        """
        super().__init__(
            name=f"openrouter:{model_name}",
            telemetry_collector=telemetry_collector,
        )

        self.model_name = model_name
        self.openai_params = openai_params

        # Get API key
        api_key = api_key or os.environ.get("OPENROUTER_API_KEY")
        if not api_key:
            raise ValueError(
                "OpenRouter API key required. Set OPENROUTER_API_KEY env var "
                "or pass api_key parameter."
            )

        # Lazy import: openai is optional
        try:
            from openai import OpenAI
        except ImportError:
            raise ImportError(
                "openai package is required for OpenRouterMCPServer. "
                "Install with: pip install openai"
            )

        # Initialize OpenAI-compatible client with OpenRouter base URL
        self._client = OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=api_key,
            default_headers={
                "HTTP-Referer": site_url or "https://github.com/ipw",
                "X-Title": app_name or "IPW Orchestrator",
            },
        )

        # Get pricing for this model
        self.pricing = self.PRICING.get(model_name, self.DEFAULT_PRICING)
        if model_name not in self.PRICING:
            print(f"Warning: No pricing info for {model_name}, using default rates")

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Execute OpenRouter API call with cost tracking."""
        from openai import OpenAIError

        # Merge default params with request params
        payload = {**self.openai_params, **params}
        payload["model"] = self.model_name
        payload["messages"] = [{"role": "user", "content": prompt}]
        payload["stream"] = True

        # Call OpenRouter API
        start = time.perf_counter()
        try:
            stream = self._client.chat.completions.create(**payload)
        except OpenAIError as exc:
            raise RuntimeError(f"OpenRouter error for {self.model_name}: {exc}") from exc

        # Consume stream and collect response
        content_chunks: list[str] = []
        ttft_ms: Optional[float] = None
        usage = None

        for chunk in stream:
            if chunk.choices and len(chunk.choices) > 0:
                delta = chunk.choices[0].delta
                if delta.content:
                    if ttft_ms is None:
                        ttft_ms = (time.perf_counter() - start) * 1000
                    content_chunks.append(delta.content)

            # Last chunk may contain usage
            if hasattr(chunk, "usage") and chunk.usage:
                usage = chunk.usage

        content = "".join(content_chunks)

        # Extract token counts
        if usage:
            prompt_tokens = usage.prompt_tokens
            completion_tokens = usage.completion_tokens
            total_tokens = usage.total_tokens
        else:
            prompt_tokens = int(len(prompt.split()) * 1.3)
            completion_tokens = int(len(content.split()) * 1.3)
            total_tokens = prompt_tokens + completion_tokens

        # Calculate cost based on token usage
        cost_usd = self._calculate_cost(prompt_tokens, completion_tokens)

        return MCPToolResult(
            content=content,
            usage={
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": total_tokens,
            },
            cost_usd=cost_usd,
            ttft_seconds=(ttft_ms / 1000.0) if ttft_ms is not None else None,
            metadata={
                "model": self.model_name,
                "backend": "openrouter",
                "pricing_input_per_1m": self.pricing["input"],
                "pricing_output_per_1m": self.pricing["output"],
            },
        )

    def _calculate_cost(self, prompt_tokens: int, completion_tokens: int) -> float:
        """Calculate API cost in USD."""
        input_cost = (prompt_tokens / 1_000_000) * self.pricing["input"]
        output_cost = (completion_tokens / 1_000_000) * self.pricing["output"]
        return input_cost + output_cost

    def health_check(self) -> bool:
        """Check if OpenRouter API is accessible."""
        try:
            response = self._client.chat.completions.create(
                model=self.model_name,
                messages=[{"role": "user", "content": "test"}],
                max_tokens=1,
            )
            return response is not None
        except Exception:
            return False

    @classmethod
    def list_popular_models(cls) -> list[str]:
        """List popular models available on OpenRouter."""
        return list(cls.PRICING.keys())
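The streaming loop in `_execute_impl` does three things at once: it concatenates text deltas, records time-to-first-token on the first non-empty delta, and falls back to a word-count estimate when no usage block arrives. The sketch below isolates that logic so it can be read and tested without network access. `consume_stream` and `fake_chunk` are hypothetical names introduced for illustration; the fake chunks only imitate the attributes the loop reads (`choices[0].delta.content` and `usage`).

```python
import time
from types import SimpleNamespace
from typing import Iterable, Optional


def consume_stream(stream: Iterable, prompt: str, start: float) -> dict:
    """Collect streamed text, time-to-first-token, and token counts."""
    chunks: list[str] = []
    ttft_ms: Optional[float] = None
    usage = None

    for chunk in stream:
        if chunk.choices:
            text = chunk.choices[0].delta.content
            if text:
                if ttft_ms is None:
                    # The first non-empty delta marks time-to-first-token.
                    ttft_ms = (time.perf_counter() - start) * 1000
                chunks.append(text)
        # A usage block, when the backend reports one, arrives on a late chunk.
        if getattr(chunk, "usage", None):
            usage = chunk.usage

    content = "".join(chunks)
    if usage is not None:
        prompt_tokens = usage.prompt_tokens
        completion_tokens = usage.completion_tokens
    else:
        # Fallback heuristic from the source: about 1.3 tokens per
        # whitespace-separated word. This is an estimate, not a tokenizer.
        prompt_tokens = int(len(prompt.split()) * 1.3)
        completion_tokens = int(len(content.split()) * 1.3)

    return {
        "content": content,
        "ttft_ms": ttft_ms,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


def fake_chunk(text: Optional[str] = None, usage=None) -> SimpleNamespace:
    """Hypothetical stand-in for a streaming chunk (illustration only)."""
    choices = []
    if text is not None:
        choices = [SimpleNamespace(delta=SimpleNamespace(content=text))]
    return SimpleNamespace(choices=choices, usage=usage)


estimated = consume_stream(
    [fake_chunk("Hello"), fake_chunk(" world"), fake_chunk()],
    prompt="Say hello to the world",
    start=time.perf_counter(),
)
reported = consume_stream(
    [fake_chunk("Hi"), fake_chunk(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=3))],
    prompt="Say hi",
    start=time.perf_counter(),
)
```

When the usage block is missing, the reported cost is therefore only as accurate as the 1.3 tokens-per-word approximation.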

`__init__(model_name, api_key=None, telemetry_collector=None, site_url=None, app_name=None, **openai_params)` ¶

Initialize OpenRouter MCP server.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Model identifier (e.g., "google/gemini-2.5-flash")	required
`api_key`	`Optional[str]`	OpenRouter API key (or set OPENROUTER_API_KEY env var)	`None`
`telemetry_collector`	`Optional[Any]`	Energy monitor collector	`None`
`site_url`	`Optional[str]`	Your site URL for OpenRouter rankings	`None`
`app_name`	`Optional[str]`	Your app name for OpenRouter rankings	`None`
`**openai_params`	`Any`	Additional parameters (temperature, max_tokens, etc.)	`{}`
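The `api_key` parameter and the `OPENROUTER_API_KEY` environment variable are resolved in a fixed order: an explicit argument wins, the environment is the fallback, and a missing key raises `ValueError`. A minimal sketch of that precedence follows. `resolve_api_key` is a hypothetical helper, and the `env` argument exists only so the behaviour can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit key if given, else the environment value."""
    # `or` treats an empty string like None, matching the source's
    # `api_key or os.environ.get(...)` expression.
    key = explicit or env.get("OPENROUTER_API_KEY")
    if not key:
        raise ValueError(
            "OpenRouter API key required. Set OPENROUTER_API_KEY env var "
            "or pass api_key parameter."
        )
    return key


from_argument = resolve_api_key("sk-explicit", {"OPENROUTER_API_KEY": "sk-env"})
from_environment = resolve_api_key(None, {"OPENROUTER_API_KEY": "sk-env"})
```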

Source code in intelligence-per-watt/src/ipw/agents/mcp/openrouter_server.py

def __init__(
    self,
    model_name: str,
    api_key: Optional[str] = None,
    telemetry_collector: Optional[Any] = None,
    site_url: Optional[str] = None,
    app_name: Optional[str] = None,
    **openai_params: Any,
):
    """Initialize OpenRouter MCP server.

    Args:
        model_name: Model identifier (e.g., "google/gemini-2.5-flash")
        api_key: OpenRouter API key (or set OPENROUTER_API_KEY env var)
        telemetry_collector: Energy monitor collector
        site_url: Your site URL for OpenRouter rankings
        app_name: Your app name for OpenRouter rankings
        **openai_params: Additional parameters (temperature, max_tokens, etc.)
    """
    super().__init__(
        name=f"openrouter:{model_name}",
        telemetry_collector=telemetry_collector,
    )

    self.model_name = model_name
    self.openai_params = openai_params

    # Get API key
    api_key = api_key or os.environ.get("OPENROUTER_API_KEY")
    if not api_key:
        raise ValueError(
            "OpenRouter API key required. Set OPENROUTER_API_KEY env var "
            "or pass api_key parameter."
        )

    # Lazy import: openai is optional
    try:
        from openai import OpenAI
    except ImportError:
        raise ImportError(
            "openai package is required for OpenRouterMCPServer. "
            "Install with: pip install openai"
        )

    # Initialize OpenAI-compatible client with OpenRouter base URL
    self._client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=api_key,
        default_headers={
            "HTTP-Referer": site_url or "https://github.com/ipw",
            "X-Title": app_name or "IPW Orchestrator",
        },
    )

    # Get pricing for this model
    self.pricing = self.PRICING.get(model_name, self.DEFAULT_PRICING)
    if model_name not in self.PRICING:
        print(f"Warning: No pricing info for {model_name}, using default rates")

`health_check()` ¶

Check if OpenRouter API is accessible.

Source code in intelligence-per-watt/src/ipw/agents/mcp/openrouter_server.py

def health_check(self) -> bool:
    """Check if OpenRouter API is accessible."""
    try:
        response = self._client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": "test"}],
            max_tokens=1,
        )
        return response is not None
    except Exception:
        return False

`list_popular_models()` `classmethod` ¶

List popular models available on OpenRouter.

Source code in intelligence-per-watt/src/ipw/agents/mcp/openrouter_server.py

@classmethod
def list_popular_models(cls) -> list[str]:
    """List popular models available on OpenRouter."""
    return list(cls.PRICING.keys())
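Models returned by `list_popular_models()` are exactly the keys of `PRICING`; any other model name is billed at `DEFAULT_PRICING`. The sketch below reproduces the lookup and the per-million-token arithmetic from `_calculate_cost` as a standalone function. `estimate_cost` is a hypothetical name, and only two table entries are copied from the source to keep the example short.

```python
# Prices are USD per 1M tokens, copied from the PRICING table above.
PRICING = {
    "google/gemini-2.5-flash": {"input": 0.30, "output": 2.50},
    "deepseek/deepseek-chat": {"input": 0.14, "output": 0.28},
}
DEFAULT_PRICING = {"input": 1.00, "output": 3.00}


def estimate_cost(model_name: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD, using default rates for models missing from PRICING."""
    pricing = PRICING.get(model_name, DEFAULT_PRICING)
    input_cost = (prompt_tokens / 1_000_000) * pricing["input"]
    output_cost = (completion_tokens / 1_000_000) * pricing["output"]
    return input_cost + output_cost


# 1,000 prompt tokens and 500 completion tokens on a listed model:
# 0.001 * 0.30 + 0.0005 * 2.50 = 0.00155 USD
known = estimate_cost("google/gemini-2.5-flash", 1000, 500)

# The same request on an unlisted model uses the default rates:
# 0.001 * 1.00 + 0.0005 * 3.00 = 0.0025 USD
unknown = estimate_cost("some-org/unlisted-model", 1000, 500)
```

Because the default rates are higher than most listed entries, costs for unlisted models are likely to be overestimates rather than underestimates.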

`VLLMMCPServer` ¶

Bases: BaseMCPServer

MCP server for vLLM-served models.

vLLM provides an OpenAI-compatible API for serving large open-source models with optimizations like PagedAttention, continuous batching, and tensor parallelism.

Supported model categories:

- General: Qwen3-32B, Qwen3-8B, Llama-3.3-70B-Instruct
- Math specialist: Qwen2.5-Math-72B, Qwen2.5-Math-7B
- Code specialist: Qwen2.5-Coder-32B, DeepSeek-Coder-V2

Example

    # Start vLLM server externally:
    # vllm serve Qwen/Qwen3-32B --tensor-parallel-size 4 --port 8000

    server = VLLMMCPServer(model_name="qwen3-32b")
    result = server.execute("Explain quantum computing")

Source code in intelligence-per-watt/src/ipw/agents/mcp/vllm_server.py

class VLLMMCPServer(BaseMCPServer):
    """MCP server for vLLM-served models.

    vLLM provides an OpenAI-compatible API for serving large open-source
    models with optimizations like PagedAttention, continuous batching,
    and tensor parallelism.

    Supported model categories:
    - General: Qwen3-32B, Qwen3-8B, Llama-3.3-70B-Instruct
    - Math specialist: Qwen2.5-Math-72B, Qwen2.5-Math-7B
    - Code specialist: Qwen2.5-Coder-32B, DeepSeek-Coder-V2

    Example:
        # Start vLLM server externally:
        # vllm serve Qwen/Qwen3-32B --tensor-parallel-size 4 --port 8000

        server = VLLMMCPServer(model_name="qwen3-32b")
        result = server.execute("Explain quantum computing")
    """

    # Model name aliases to full HuggingFace paths
    SUPPORTED_MODELS: Dict[str, str] = {
        # General purpose
        "qwen3-32b": "Qwen/Qwen3-32B",
        "qwen3-8b": "Qwen/Qwen3-8B",
        "llama-70b": "meta-llama/Llama-3.3-70B-Instruct",
        "llama-8b": "meta-llama/Llama-3.1-8B-Instruct",
        # Math specialists
        "glm-4.7": "THUDM/glm-4-9b-chat",
        "qwen-math-7b": "Qwen/Qwen2.5-Math-7B-Instruct",
        "qwen-math-1.5b": "Qwen/Qwen2.5-Math-1.5B-Instruct",
        # Code specialists
        "qwen3-coder-plus": "Qwen/Qwen3-Coder-Plus",
        "qwen-coder-7b": "Qwen/Qwen2.5-Coder-7B-Instruct",
        # MoE models
        "glm-4.7-flash": "zai-org/GLM-4.7-Flash",
    }

    # Estimated costs per 1M tokens (local compute, GPU rental approximation)
    MODEL_COSTS: Dict[str, Dict[str, float]] = {
        "qwen3-32b": {"prompt": 0.50, "completion": 0.50},
        "qwen3-8b": {"prompt": 0.10, "completion": 0.10},
        "llama-70b": {"prompt": 1.00, "completion": 1.00},
        "llama-8b": {"prompt": 0.10, "completion": 0.10},
        "glm-4.7": {"prompt": 1.00, "completion": 1.00},
        "qwen-math-7b": {"prompt": 0.10, "completion": 0.10},
        "qwen-math-1.5b": {"prompt": 0.02, "completion": 0.02},
        "qwen3-coder-plus": {"prompt": 0.50, "completion": 0.50},
        "qwen-coder-7b": {"prompt": 0.10, "completion": 0.10},
        "glm-4.7-flash": {"prompt": 0.30, "completion": 0.30},
    }

    def __init__(
        self,
        model_name: str,
        vllm_url: str = "http://localhost:8000",
        api_key: Optional[str] = None,
        telemetry_collector: Optional[Any] = None,
        event_recorder: Optional[Any] = None,
        **vllm_params: Any,
    ):
        """Initialize vLLM server connection.

        Args:
            model_name: Model alias (e.g., 'qwen3-32b') or full HF path
            vllm_url: URL of the vLLM server (default: localhost:8000)
            api_key: Optional API key for authenticated endpoints
            telemetry_collector: Energy monitor collector
            event_recorder: EventRecorder for per-action tracking
            **vllm_params: Default parameters (max_tokens, temperature, top_p, etc.)
        """
        super().__init__(
            name=f"vllm:{model_name}",
            telemetry_collector=telemetry_collector,
            event_recorder=event_recorder,
        )

        self.model_name = model_name
        self.model_path = self.SUPPORTED_MODELS.get(model_name, model_name)
        self.vllm_url = vllm_url.rstrip("/")
        self.api_key = api_key or os.environ.get("VLLM_API_KEY")
        self.vllm_params = vllm_params

        # Cost estimation (per 1M tokens)
        self.cost_per_1m = self.MODEL_COSTS.get(
            model_name,
            {"prompt": 0.0, "completion": 0.0}
        )

        # Query server's actual max_model_len to handle validation
        self._server_max_model_len: Optional[int] = None

    def _get_server_max_model_len(self) -> Optional[int]:
        """Query server's actual max_model_len from /v1/models endpoint."""
        if self._server_max_model_len is not None:
            return self._server_max_model_len

        try:
            with httpx.Client(timeout=5.0) as client:
                response = client.get(f"{self.vllm_url}/v1/models")
                if response.status_code == 200:
                    models = response.json().get("data", [])
                    for model in models:
                        if model.get("id") == self.model_path or model.get("id") == self.model_name:
                            self._server_max_model_len = model.get("max_model_len")
                            return self._server_max_model_len
        except Exception:
            pass
        return None

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Execute inference via vLLM's OpenAI-compatible API."""
        global _retry_warn_count

        # Merge default params with per-request params
        merged_params = {**self.vllm_params, **params}

        max_tokens = merged_params.get("max_tokens", 8192)
        temperature = merged_params.get("temperature", 0.7)
        top_p = merged_params.get("top_p", 0.9)
        system_prompt = merged_params.get("system_prompt")

        # Build messages
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})

        # Build request
        headers = {"Content-Type": "application/json"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"

        original_max_tokens = max_tokens

        payload = {
            "model": self.model_path,
            "messages": messages,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
        }

        try:
            with httpx.Client(timeout=240.0) as client:
                response = client.post(
                    f"{self.vllm_url}/v1/chat/completions",
                    headers=headers,
                    json=payload,
                )
                response.raise_for_status()
                data = response.json()

            # Extract response
            content = data["choices"][0]["message"]["content"]
            usage = data.get("usage", {})

            prompt_tokens = usage.get("prompt_tokens", 0)
            completion_tokens = usage.get("completion_tokens", 0)

            # Calculate cost
            cost_usd = (
                (prompt_tokens / 1_000_000) * self.cost_per_1m["prompt"] +
                (completion_tokens / 1_000_000) * self.cost_per_1m["completion"]
            )

            return MCPToolResult(
                content=content,
                usage={
                    "prompt_tokens": prompt_tokens,
                    "completion_tokens": completion_tokens,
                    "total_tokens": prompt_tokens + completion_tokens,
                },
                cost_usd=cost_usd,
                metadata={
                    "model": self.model_path,
                    "server": "vllm",
                    "finish_reason": data["choices"][0].get("finish_reason"),
                },
            )

        except httpx.ConnectError:
            return MCPToolResult(
                content=f"Error: Cannot connect to vLLM server at {self.vllm_url}. "
                        f"Please ensure vLLM is running with the model {self.model_path}.",
                usage={},
                cost_usd=0.0,
                metadata={"error": "connection_failed"},
            )
        except httpx.ReadTimeout:
            return MCPToolResult(
                content=f"Error: vLLM request timed out after 240s. "
                        f"The model may be overloaded or the response is too long.",
                usage={},
                cost_usd=0.0,
                metadata={"error": "timeout"},
            )
        except httpx.HTTPStatusError as e:
            # Handle max_tokens validation error - try with capped value
            error_text = e.response.text
            error_text_lower = error_text.lower()

            if e.response.status_code == 400 and ("max_tokens" in error_text_lower or "max_completion_tokens" in error_text_lower) and "too large" in error_text_lower:
                match = re.search(r"maximum context length is (\d+)", error_text_lower)
                input_match = re.search(r"your request has (\d+) input", error_text_lower)

                if match:
                    validation_limit = int(match.group(1))
                    actual_input_tokens = int(input_match.group(1)) if input_match else None

                    if actual_input_tokens:
                        capped_max_tokens = max(1, int(validation_limit - actual_input_tokens - 100))
                    else:
                        estimated_prompt_tokens = len(prompt.split()) * 1.3
                        capped_max_tokens = max(1, int(validation_limit - estimated_prompt_tokens - 100))

                    if capped_max_tokens < max_tokens and capped_max_tokens > 0:
                        _retry_warn_count += 1
                        if _retry_warn_count == 1 or _retry_warn_count % 100 == 0:
                            warnings.warn(
                                f"vLLM validation rejected max_tokens={max_tokens} "
                                f"(validation uses model config limit: {validation_limit}). "
                                f"Retrying with max_tokens={capped_max_tokens}. "
                                f"(Total retries: {_retry_warn_count})",
                                UserWarning,
                                stacklevel=2,
                            )

                        payload["max_tokens"] = capped_max_tokens
                        try:
                            with httpx.Client(timeout=240.0) as client:
                                retry_response = client.post(
                                    f"{self.vllm_url}/v1/chat/completions",
                                    headers=headers,
                                    json=payload,
                                )
                                retry_response.raise_for_status()
                                retry_data = retry_response.json()

                            content = retry_data["choices"][0]["message"]["content"]
                            usage = retry_data.get("usage", {})
                            prompt_tokens = usage.get("prompt_tokens", 0)
                            completion_tokens = usage.get("completion_tokens", 0)
                            cost_usd = (
                                (prompt_tokens / 1_000_000) * self.cost_per_1m["prompt"] +
                                (completion_tokens / 1_000_000) * self.cost_per_1m["completion"]
                            )
                            return MCPToolResult(
                                content=content,
                                usage={
                                    "prompt_tokens": prompt_tokens,
                                    "completion_tokens": completion_tokens,
                                    "total_tokens": prompt_tokens + completion_tokens,
                                },
                                cost_usd=cost_usd,
                                metadata={
                                    "model": self.model_path,
                                    "server": "vllm",
                                    "finish_reason": retry_data["choices"][0].get("finish_reason"),
                                    "max_tokens_capped": True,
                                    "original_max_tokens": original_max_tokens,
                                },
                            )
                        except httpx.HTTPStatusError as retry_e:
                            return MCPToolResult(
                                content=f"Error: vLLM server returned {retry_e.response.status_code}: {retry_e.response.text}",
                                usage={},
                                cost_usd=0.0,
                                metadata={"error": f"http_{retry_e.response.status_code}", "retry_failed": True},
                            )
                        except Exception as retry_e:
                            return MCPToolResult(
                                content=f"Error during retry: {type(retry_e).__name__}: {retry_e}",
                                usage={},
                                cost_usd=0.0,
                                metadata={"error": "retry_exception"},
                            )

            return MCPToolResult(
                content=f"Error: vLLM server returned {e.response.status_code}: {e.response.text}",
                usage={},
                cost_usd=0.0,
                metadata={"error": f"http_{e.response.status_code}"},
            )
        except Exception as e:
            return MCPToolResult(
                content=f"Error: {type(e).__name__}: {e}",
                usage={},
                cost_usd=0.0,
                metadata={"error": str(e)},
            )

    def health_check(self) -> bool:
        """Check if vLLM server is running and model is loaded."""
        try:
            with httpx.Client(timeout=5.0) as client:
                response = client.get(f"{self.vllm_url}/v1/models")
                if response.status_code == 200:
                    models = response.json().get("data", [])
                    model_ids = [m.get("id", "") for m in models]
                    return self.model_path in model_ids or self.model_name in model_ids
            return False
        except Exception:
            return False

    @classmethod
    def list_supported_models(cls) -> Dict[str, str]:
        """Return mapping of model aliases to HuggingFace paths."""
        return cls.SUPPORTED_MODELS.copy()
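The retry branch in `_execute_impl` is the least obvious part of this class: on an HTTP 400 that complains about `max_tokens`, it parses the context limit (and, if present, the input token count) out of the error text, subtracts a 100-token margin, and retries once with the smaller value. The sketch below isolates just the capping arithmetic. `cap_max_tokens` is a hypothetical helper, and the sample error string is an illustration shaped to match the regular expressions in the source, not a verbatim vLLM message.

```python
import re
from typing import Optional

SAFETY_MARGIN = 100  # tokens held back, as in the source


def cap_max_tokens(error_text: str, prompt: str, requested: int) -> Optional[int]:
    """Return a smaller max_tokens to retry with, or None if no retry applies."""
    text = error_text.lower()
    limit_match = re.search(r"maximum context length is (\d+)", text)
    if not limit_match:
        return None
    limit = int(limit_match.group(1))

    input_match = re.search(r"your request has (\d+) input", text)
    input_tokens = int(input_match.group(1)) if input_match else None
    if input_tokens:
        used = input_tokens
    else:
        # No count in the error: estimate ~1.3 tokens per word, as the source does.
        used = len(prompt.split()) * 1.3

    capped = max(1, int(limit - used - SAFETY_MARGIN))
    # Only retry when the cap actually lowers the request.
    return capped if capped < requested else None


# Illustrative error text (assumed wording, shaped to fit the regexes above).
sample_error = (
    "max_tokens is too large: this model's maximum context length is 4096 "
    "tokens, but your request has 1200 input tokens."
)
retry_value = cap_max_tokens(sample_error, prompt="ignored here", requested=8192)
```

With a 4096-token limit and 1200 input tokens, the retry asks for 4096 - 1200 - 100 = 2796 completion tokens. The source also records `max_tokens_capped` and `original_max_tokens` in the result metadata so that truncated generations can be identified later.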

`__init__(model_name, vllm_url='http://localhost:8000', api_key=None, telemetry_collector=None, event_recorder=None, **vllm_params)` ¶

Initialize vLLM server connection.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Model alias (e.g., 'qwen3-32b') or full HF path	required
`vllm_url`	`str`	URL of the vLLM server (default: localhost:8000)	`'http://localhost:8000'`
`api_key`	`Optional[str]`	Optional API key for authenticated endpoints	`None`
`telemetry_collector`	`Optional[Any]`	Energy monitor collector	`None`
`event_recorder`	`Optional[Any]`	EventRecorder for per-action tracking	`None`
`**vllm_params`	`Any`	Default parameters (max_tokens, temperature, top_p, etc.)	`{}`
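Values passed as `**vllm_params` act as per-instance defaults, and keyword arguments given at call time override them. Anything still unset falls back to the hard-coded values in `_execute_impl` (`max_tokens=8192`, `temperature=0.7`, `top_p=0.9`). The following sketch shows that layering and how `system_prompt` becomes a leading system message. `build_payload` is a hypothetical function that mirrors the request construction in the source.

```python
from typing import Any, Dict, List


def build_payload(
    model_path: str,
    prompt: str,
    defaults: Dict[str, Any],
    **params: Any,
) -> Dict[str, Any]:
    """Merge instance defaults with per-request params into a chat payload."""
    # Later keys win, so per-request params override instance defaults.
    merged = {**defaults, **params}

    messages: List[Dict[str, str]] = []
    system_prompt = merged.get("system_prompt")
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})

    return {
        "model": model_path,
        "messages": messages,
        "max_tokens": merged.get("max_tokens", 8192),
        "temperature": merged.get("temperature", 0.7),
        "top_p": merged.get("top_p", 0.9),
    }


payload = build_payload(
    "Qwen/Qwen3-8B",
    "Summarize PagedAttention in one sentence.",
    {"temperature": 0.2, "max_tokens": 512},  # instance defaults
    max_tokens=64,                            # per-request override
    system_prompt="Be brief.",
)
```

Note that only these three sampling parameters and `system_prompt` are read from the merged dictionary; other keys passed through `**vllm_params` are not forwarded to the server by the code shown above.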

Source code in intelligence-per-watt/src/ipw/agents/mcp/vllm_server.py

def __init__(
    self,
    model_name: str,
    vllm_url: str = "http://localhost:8000",
    api_key: Optional[str] = None,
    telemetry_collector: Optional[Any] = None,
    event_recorder: Optional[Any] = None,
    **vllm_params: Any,
):
    """Initialize vLLM server connection.

    Args:
        model_name: Model alias (e.g., 'qwen3-32b') or full HF path
        vllm_url: URL of the vLLM server (default: localhost:8000)
        api_key: Optional API key for authenticated endpoints
        telemetry_collector: Energy monitor collector
        event_recorder: EventRecorder for per-action tracking
        **vllm_params: Default parameters (max_tokens, temperature, top_p, etc.)
    """
    super().__init__(
        name=f"vllm:{model_name}",
        telemetry_collector=telemetry_collector,
        event_recorder=event_recorder,
    )

    self.model_name = model_name
    self.model_path = self.SUPPORTED_MODELS.get(model_name, model_name)
    self.vllm_url = vllm_url.rstrip("/")
    self.api_key = api_key or os.environ.get("VLLM_API_KEY")
    self.vllm_params = vllm_params

    # Cost estimation (per 1M tokens)
    self.cost_per_1m = self.MODEL_COSTS.get(
        model_name,
        {"prompt": 0.0, "completion": 0.0}
    )

    # Query server's actual max_model_len to handle validation
    self._server_max_model_len: Optional[int] = None

`health_check()` ¶

Check if vLLM server is running and model is loaded.

Source code in intelligence-per-watt/src/ipw/agents/mcp/vllm_server.py

def health_check(self) -> bool:
    """Check if vLLM server is running and model is loaded."""
    try:
        with httpx.Client(timeout=5.0) as client:
            response = client.get(f"{self.vllm_url}/v1/models")
            if response.status_code == 200:
                models = response.json().get("data", [])
                model_ids = [m.get("id", "") for m in models]
                return self.model_path in model_ids or self.model_name in model_ids
        return False
    except Exception:
        return False
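The health check passes only when the server answers `/v1/models` with HTTP 200 and the model list contains either the resolved path or the alias. The matching step can be sketched as a pure function over the decoded JSON body. `model_is_served` is a hypothetical name, and the sample response is an assumed minimal shape containing only the `data` and `id` fields that the source reads.

```python
from typing import Any, Dict


def model_is_served(models_response: Dict[str, Any], model_path: str, model_name: str) -> bool:
    """Check whether the path or alias appears among the served model ids."""
    model_ids = [m.get("id", "") for m in models_response.get("data", [])]
    return model_path in model_ids or model_name in model_ids


# Assumed minimal response body; a real server may include more fields.
sample_response = {"data": [{"id": "Qwen/Qwen3-32B"}]}

served = model_is_served(sample_response, "Qwen/Qwen3-32B", "qwen3-32b")
not_served = model_is_served(sample_response, "Qwen/Qwen3-8B", "qwen3-8b")
```

Checking both identifiers matters when the server was started under a custom served name that happens to equal the alias rather than the Hugging Face path.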

`list_supported_models()` `classmethod` ¶

Return mapping of model aliases to HuggingFace paths.

Source code in intelligence-per-watt/src/ipw/agents/mcp/vllm_server.py

@classmethod
def list_supported_models(cls) -> Dict[str, str]:
    """Return mapping of model aliases to HuggingFace paths."""
    return cls.SUPPORTED_MODELS.copy()
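Two behaviours follow from how `SUPPORTED_MODELS` is used: a name that is not an alias is passed through unchanged as the model path, and `list_supported_models()` hands back a copy, so callers cannot alter the class-level table by accident. The sketch below demonstrates both with a two-entry subset of the table. The module-level functions are hypothetical stand-ins for the constructor lookup and the classmethod.

```python
from typing import Dict

# Two entries copied from SUPPORTED_MODELS above.
SUPPORTED_MODELS: Dict[str, str] = {
    "qwen3-32b": "Qwen/Qwen3-32B",
    "llama-8b": "meta-llama/Llama-3.1-8B-Instruct",
}


def resolve_model_path(model_name: str) -> str:
    """Map an alias to its Hugging Face path; unknown names pass through."""
    return SUPPORTED_MODELS.get(model_name, model_name)


def list_supported_models() -> Dict[str, str]:
    """Return a shallow copy so callers cannot mutate the shared table."""
    return SUPPORTED_MODELS.copy()


alias_path = resolve_model_path("qwen3-32b")
custom_path = resolve_model_path("my-org/custom-model")

listing = list_supported_models()
listing["scratch"] = "not/a-real-model"  # affects only the caller's copy
```

Per the constructor above, a name that is missing from `MODEL_COSTS` gets a cost of 0.0 per million tokens, so results for custom model paths report `cost_usd` as zero.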

`CalculatorServer` ¶

Bases: BaseMCPServer

MCP server for mathematical calculations.

Safely evaluates mathematical expressions using AST.

Example

    calc = CalculatorServer()
    result = calc.execute("2 + 2 * 3")
    print(result.content)  # "8"
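Besides bare expressions, the calculator accepts short natural-language prompts: `_extract_expression` looks for a leading verb such as "calculate" or "what is", keeps the remainder, and strips trailing punctuation. A standalone sketch of that step follows. `extract_expression` is a hypothetical module-level copy of the method, using the same four patterns as the source.

```python
import re

PATTERNS = [
    r"calculate\s+(.+)",
    r"compute\s+(.+)",
    r"evaluate\s+(.+)",
    r"what\s+is\s+(.+)",
]


def extract_expression(prompt: str) -> str:
    """Pull the math expression out of a short natural-language prompt."""
    for pattern in PATTERNS:
        # Matching runs on the lowercased prompt, so the result is lowercase too.
        match = re.search(pattern, prompt.lower())
        if match:
            return match.group(1).strip().rstrip("?!.,;")
    # No keyword found: treat the whole prompt as the expression.
    return prompt.strip().rstrip("?!.,;")


question = extract_expression("What is 2 + 2 * 3?")
command = extract_expression("Calculate sqrt(16).")
bare = extract_expression("2 ^ 10")
```

Everything after the keyword is kept, so trailing words (for example "calculate 3 + 4 please") end up in the expression and will fail to parse later.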

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_server.py

class CalculatorServer(BaseMCPServer):
    """MCP server for mathematical calculations.

    Safely evaluates mathematical expressions using AST.

    Example:
        calc = CalculatorServer()
        result = calc.execute("2 + 2 * 3")
        print(result.content)  # "8"
    """

    # Safe operators for math evaluation
    OPERATORS = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Pow: operator.pow,
        ast.Mod: operator.mod,
        ast.FloorDiv: operator.floordiv,
        ast.UAdd: operator.pos,
        ast.USub: operator.neg,
    }

    FUNCTIONS = {
        "sqrt": math.sqrt,
        "sin": math.sin,
        "cos": math.cos,
        "tan": math.tan,
        "log": math.log,
        "exp": math.exp,
        "abs": abs,
        "round": round,
    }

    def __init__(self, telemetry_collector: Optional[Any] = None):
        super().__init__(
            name="calculator",
            telemetry_collector=telemetry_collector,
        )

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Execute mathematical calculation.

        Args:
            prompt: Mathematical expression to evaluate

        Returns:
            MCPToolResult with calculated result
        """
        # Extract expression from prompt
        expression = self._extract_expression(prompt)

        try:
            result = self._safe_eval(expression)
            content = str(result)
            error = None
        except Exception as e:
            content = f"Error: {e}"
            error = str(e)

        return MCPToolResult(
            content=content,
            usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
            cost_usd=0.0,
            metadata={
                "tool": "calculator",
                "expression": expression,
                "error": error,
            },
        )

    def _extract_expression(self, prompt: str) -> str:
        """Extract mathematical expression from prompt."""
        # Look for expression in common formats
        patterns = [
            r"calculate\s+(.+)",
            r"compute\s+(.+)",
            r"evaluate\s+(.+)",
            r"what\s+is\s+(.+)",
        ]

        for pattern in patterns:
            match = re.search(pattern, prompt.lower())
            if match:
                expr = match.group(1).strip()
                # Strip trailing punctuation
                expr = expr.rstrip("?!.,;")
                return expr

        # If no pattern matches, assume entire prompt is the expression
        return prompt.strip().rstrip("?!.,;")

    def _safe_eval(self, expression: str) -> float:
        """Safely evaluate mathematical expression using AST.

        Args:
            expression: Mathematical expression string

        Returns:
            Evaluated result

        Raises:
            ValueError: If expression contains unsafe operations
        """
        # Pre-process: Convert common math notation to Python
        # Replace ^ with ** for exponentiation (careful not to replace ^^)
        expression = re.sub(r'\^(?!\^)', '**', expression)

        # Parse expression
        try:
            node = ast.parse(expression, mode="eval").body
        except SyntaxError as e:
            raise ValueError(f"Invalid expression: {e}")

        # Evaluate recursively
        return self._eval_node(node)

    def _eval_node(self, node: ast.AST) -> float:
        """Recursively evaluate AST node."""
        if isinstance(node, ast.Constant):
            return node.value
        elif isinstance(node, ast.BinOp):
            left = self._eval_node(node.left)
            right = self._eval_node(node.right)
            op = self.OPERATORS.get(type(node.op))
            if not op:
                raise ValueError(f"Unsupported operator: {node.op}")
            return op(left, right)
        elif isinstance(node, ast.UnaryOp):
            operand = self._eval_node(node.operand)
            op = self.OPERATORS.get(type(node.op))
            if not op:
                raise ValueError(f"Unsupported operator: {node.op}")
            return op(operand)
        elif isinstance(node, ast.Call):
            func_name = node.func.id if isinstance(node.func, ast.Name) else None
            if func_name not in self.FUNCTIONS:
                raise ValueError(f"Unsupported function: {func_name}")
            func = self.FUNCTIONS[func_name]
            args = [self._eval_node(arg) for arg in node.args]
            return func(*args)
        else:
            raise ValueError(f"Unsupported expression type: {type(node)}")
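The evaluation strategy is a whitelist walk over the parsed AST: only known operator and function nodes are executed, and everything else raises `ValueError`. The condensed sketch below illustrates the technique with a reduced operator and function table. It is not the class's implementation: `safe_eval` and `_eval` are hypothetical names, and restricting constants to `int` and `float` is a stricter check assumed by this sketch.

```python
import ast
import math
import operator
import re

OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}
FUNCTIONS = {"sqrt": math.sqrt, "abs": abs}


def safe_eval(expression: str):
    """Evaluate arithmetic without eval(), using an AST whitelist."""
    # Accept ^ as exponentiation, as the source does.
    expression = re.sub(r"\^(?!\^)", "**", expression)
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"Invalid expression: {exc}") from exc
    return _eval(tree.body)


def _eval(node: ast.AST):
    # Assumption of this sketch: only numeric literals are allowed.
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](_eval(node.operand))
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in FUNCTIONS
    ):
        return FUNCTIONS[node.func.id](*[_eval(arg) for arg in node.args])
    # Names, attribute access, comprehensions, etc. all end up here.
    raise ValueError(f"Unsupported expression: {ast.dump(node)}")


precedence = safe_eval("2 + 2 * 3")
power = safe_eval("2^10")
root = safe_eval("sqrt(16)")
```

A whitelist blocks code execution, but it does not bound the cost of a computation: an expression with a very large exponent can still consume substantial CPU and memory, so callers that accept untrusted input may want an additional limit.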

`WebSearchServer` ¶

Bases: BaseMCPServer

MCP server for web search via Tavily API.

Tavily provides high-quality, AI-optimized search results designed for LLM consumption with structured, relevant content.

Example

    search = WebSearchServer(api_key="tvly-xxx")
    result = search.execute("latest AI news")

Cost: ~$0.01 per search (Tavily free tier: 1000 searches/month)

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_server.py

class WebSearchServer(BaseMCPServer):
    """MCP server for web search via Tavily API.

    Tavily provides high-quality, AI-optimized search results designed
    for LLM consumption with structured, relevant content.

    Example:
        search = WebSearchServer(api_key="tvly-xxx")
        result = search.execute("latest AI news")

    Cost: ~$0.01 per search (Tavily free tier: 1000 searches/month)
    """

    # Cost per search in USD
    COST_PER_SEARCH = 0.01

    def __init__(
        self,
        api_key: Optional[str] = None,
        telemetry_collector: Optional[Any] = None,
    ):
        super().__init__(
            name="web_search",
            telemetry_collector=telemetry_collector,
        )
        self.api_key = api_key or os.environ.get("TAVILY_API_KEY")
        self._client = None

    def _get_client(self):
        """Lazily initialize Tavily client."""
        if self._client is None:
            if not self.api_key:
                raise ValueError(
                    "TAVILY_API_KEY not set. Get a free API key at https://tavily.com"
                )
            from tavily import TavilyClient
            self._client = TavilyClient(api_key=self.api_key)

        return self._client

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Execute web search via Tavily API.

        Args:
            prompt: Search query
            **params: Additional parameters:
                - max_results: Number of results (default: 5)
                - search_depth: 'basic' or 'advanced' (default: 'basic')
                - include_answer: Include AI-generated answer (default: True)

        Returns:
            MCPToolResult with formatted search results
        """
        max_results = params.get("max_results", 5)
        search_depth = params.get("search_depth", "basic")
        include_answer = params.get("include_answer", True)

        # If no API key, return helpful message
        if not self.api_key:
            content = (
                f"[Web search for: {prompt}]\n\n"
                "Web search requires TAVILY_API_KEY environment variable.\n"
                "Get a free API key at: https://tavily.com\n"
                "Then set: export TAVILY_API_KEY='your-key'"
            )
            return MCPToolResult(
                content=content,
                usage={},
                cost_usd=0.0,
                metadata={"tool": "web_search", "error": "no_api_key"},
            )

        try:
            client = self._get_client()
            response = client.search(
                query=prompt,
                max_results=max_results,
                search_depth=search_depth,
                include_answer=include_answer,
            )

            # Format results
            lines = [f"Web search results for: {prompt}\n"]

            # Include AI-generated answer if available
            if include_answer and response.get("answer"):
                lines.append(f"Summary: {response['answer']}\n")

            # Format individual results
            results = response.get("results", [])
            for i, result in enumerate(results, 1):
                title = result.get("title", "No title")
                url = result.get("url", "")
                content_snippet = result.get("content", "")
                lines.append(f"{i}. {title}")
                lines.append(f"   URL: {url}")
                lines.append(f"   {content_snippet}")
                lines.append("")

            content = "\n".join(lines)

            return MCPToolResult(
                content=content,
                usage={},
                cost_usd=self.COST_PER_SEARCH,
                metadata={
                    "tool": "web_search",
                    "query": prompt,
                    "num_results": len(results),
                    "search_depth": search_depth,
                },
            )

        except ImportError as e:
            return MCPToolResult(
                content=f"Error: {e}",
                usage={},
                cost_usd=0.0,
                metadata={"tool": "web_search", "error": "import_error"},
            )
        except Exception as e:
            return MCPToolResult(
                content=f"Search error: {type(e).__name__}: {e}",
                usage={},
                cost_usd=0.0,
                metadata={"tool": "web_search", "error": str(e)},
            )
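
The result formatting can be illustrated without calling the Tavily API. This is a sketch under assumptions: `format_search_results` is a hypothetical helper, and `sample` is a made-up response containing only the fields the listing above reads (`answer`, `results`, `title`, `url`, `content`).

```python
from typing import Any, Dict, List


def format_search_results(query: str, response: Dict[str, Any]) -> str:
    """Render a Tavily-style response dict as numbered plain text."""
    lines: List[str] = [f"Web search results for: {query}\n"]
    if response.get("answer"):
        lines.append(f"Summary: {response['answer']}\n")
    for i, result in enumerate(response.get("results", []), 1):
        lines.append(f"{i}. {result.get('title', 'No title')}")
        lines.append(f"   URL: {result.get('url', '')}")
        lines.append(f"   {result.get('content', '')}")
        lines.append("")  # blank line between results
    return "\n".join(lines)


# Hypothetical response, not real search output.
sample = {
    "answer": "Example summary.",
    "results": [
        {"title": "First hit", "url": "https://example.com/a", "content": "Snippet A"},
        {"url": "https://example.com/b", "content": "Snippet B"},  # title missing
    ],
}
text = format_search_results("example query", sample)
print(text)
```

Missing keys fall back to defaults, so a result without a title is still listed (as "No title") rather than raising a `KeyError`.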

`CodeInterpreterServer` ¶

Bases: BaseMCPServer

MCP server for Python code execution with optional sandbox isolation.

Executes Python code in a subprocess with timeout protection. Supports bubblewrap (bwrap) for filesystem isolation on Linux.

Isolation modes

None: Direct subprocess execution (default, for compatibility)
"bubblewrap": Linux namespace isolation with read-only root fs
"auto": Use bubblewrap if available, fall back to direct execution

Example

interpreter = CodeInterpreterServer(isolation="auto")
result = interpreter.execute("print([x**2 for x in range(10)])")

Cost: ~$0.0000083 per second of compute (based on cloud GPU rates)
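
To put the quoted rate in perspective, here is a small worked calculation. It assumes, as the source listing for this class does, that the billed duration is the timeout budget; `estimated_cost_usd` is a hypothetical helper name.

```python
COST_PER_SECOND = 0.0000083  # USD per second, the rate quoted above


def estimated_cost_usd(seconds: float) -> float:
    """Cost estimate for a given number of billed seconds."""
    return seconds * COST_PER_SECOND


default_run = estimated_cost_usd(30)   # default 30 s timeout budget
per_hour = estimated_cost_usd(3600)    # equivalent hourly rate
print(f"{default_run:.6f} USD per 30 s, {per_hour:.4f} USD per hour")
```

At this rate a run with the default timeout is estimated at roughly $0.00025, and the rate corresponds to about $0.03 per hour of compute.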

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_server.py

class CodeInterpreterServer(BaseMCPServer):
    """MCP server for Python code execution with optional sandbox isolation.

    Executes Python code in a subprocess with timeout protection.
    Supports bubblewrap (bwrap) for filesystem isolation on Linux.

    Isolation modes:
        - None: Direct subprocess execution (default, for compatibility)
        - "bubblewrap": Linux namespace isolation with read-only root fs
        - "auto": Use bubblewrap if available, fall back to direct execution

    Example:
        interpreter = CodeInterpreterServer(isolation="auto")
        result = interpreter.execute("print([x**2 for x in range(10)])")

    Cost: ~$0.0000083 per second of compute (based on cloud GPU rates)
    """

    # Approximate cost per second of compute (GPU instance rate)
    COST_PER_SECOND = 0.0000083

    # Blocked imports for safety (used in non-isolated mode)
    BLOCKED_IMPORTS = {
        "os.system", "subprocess", "shutil.rmtree", "pathlib.Path.rmdir",
        "eval", "exec", "__import__", "importlib",
    }

    def __init__(
        self,
        timeout: int = 30,
        max_output_length: int = 10000,
        telemetry_collector: Optional[Any] = None,
        isolation: Optional[str] = None,
        allowed_paths: Optional[List[str]] = None,
    ):
        """Initialize code interpreter.

        Args:
            timeout: Maximum execution time in seconds (default: 30)
            max_output_length: Maximum characters to return (default: 10000)
            telemetry_collector: Energy monitor collector
            isolation: Isolation mode - None, "bubblewrap", or "auto"
            allowed_paths: Additional paths to mount read-only in sandbox
        """
        super().__init__(
            name="code_interpreter",
            telemetry_collector=telemetry_collector,
        )
        self.timeout = timeout
        self.max_output_length = max_output_length
        self.allowed_paths = allowed_paths or []

        # Determine isolation mode
        self._use_bubblewrap = False
        if isolation == "bubblewrap":
            if not _check_bubblewrap_available():
                raise RuntimeError(
                    "Bubblewrap (bwrap) not found. Install with: "
                    "apt-get install bubblewrap (Debian/Ubuntu) or "
                    "dnf install bubblewrap (Fedora/RHEL)"
                )
            self._use_bubblewrap = True
        elif isolation == "auto":
            self._use_bubblewrap = _check_bubblewrap_available()

        # Cache Python paths for bubblewrap
        if self._use_bubblewrap:
            self._python_paths = _get_python_lib_paths()

        self.code_extractor = re.compile(r"```[^\n]*\n([\s\S]*?)```", re.DOTALL)

    def _build_bwrap_command(self, script_path: str, sandbox_dir: str) -> List[str]:
        """Build the bubblewrap command for isolated execution."""
        cmd = [
            "bwrap",
            "--unshare-all",
            "--share-net",
            "--die-with-parent",
            "--new-session",
        ]

        # Mount system paths read-only
        for path in self._python_paths:
            if os.path.isdir(path) or os.path.isfile(path):
                cmd.extend(["--ro-bind", path, path])

        # Mount additional allowed paths read-only
        for path in self.allowed_paths:
            if os.path.exists(path):
                real_path = os.path.realpath(path)
                cmd.extend(["--ro-bind", real_path, real_path])

        # Essential virtual filesystems
        cmd.extend([
            "--proc", "/proc",
            "--dev", "/dev",
        ])

        # Create isolated /tmp with our sandbox directory
        cmd.extend([
            "--tmpfs", "/tmp",
            "--bind", sandbox_dir, "/sandbox",
            "--chdir", "/sandbox",
        ])

        # Set environment
        cmd.extend([
            "--setenv", "HOME", "/sandbox",
            "--setenv", "TMPDIR", "/tmp",
            "--setenv", "PATH", "/usr/bin:/bin:/usr/local/bin",
            "--setenv", "PYTHONDONTWRITEBYTECODE", "1",
            "--setenv", "PYTHONUNBUFFERED", "1",
        ])

        # Preserve PYTHONPATH for package access
        if os.environ.get("PYTHONPATH"):
            cmd.extend(["--setenv", "PYTHONPATH", os.environ["PYTHONPATH"]])

        # Add the Python command
        cmd.extend([sys.executable, script_path])

        return cmd

    def _preprocess_code(self, code: str) -> str:
        """Preprocess code to extract from markdown blocks."""
        match = self.code_extractor.search(code)
        if match:
            code = match.group(1).strip()
        return code

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Execute Python code in subprocess with optional isolation."""
        timeout = params.get("timeout", self.timeout)
        code = prompt.strip()

        # Basic safety check - warn about potentially dangerous operations
        dangerous_patterns = [
            "os.system", "subprocess.run", "subprocess.call", "subprocess.Popen",
            "shutil.rmtree", "os.remove", "os.rmdir", "__import__",
            "open(", "eval(", "exec(",
        ]
        warnings: List[str] = []

        code = self._preprocess_code(code)

        # Only warn in non-isolated mode (these are safe in sandbox)
        if not self._use_bubblewrap:
            for pattern in dangerous_patterns:
                if pattern in code:
                    warnings.append(f"Warning: Code contains '{pattern}'")

        try:
            if self._use_bubblewrap:
                return self._execute_sandboxed(code, timeout, warnings, params)
            else:
                return self._execute_direct(code, timeout, warnings, params)

        except subprocess.TimeoutExpired:
            return MCPToolResult(
                content=f"Error: Code execution timed out after {timeout} seconds",
                usage={},
                cost_usd=timeout * self.COST_PER_SECOND,
                metadata={"tool": "code_interpreter", "error": "timeout"},
            )
        except Exception as e:
            return MCPToolResult(
                content=f"Execution error: {type(e).__name__}: {e}",
                usage={},
                cost_usd=0.0,
                metadata={"tool": "code_interpreter", "error": str(e)},
            )

    def _execute_direct(
        self,
        code: str,
        timeout: int,
        warnings: List[str],
        params: dict,
    ) -> MCPToolResult:
        """Execute code directly in subprocess (no isolation)."""
        with tempfile.NamedTemporaryFile(
            mode="w",
            suffix=".py",
            delete=False
        ) as f:
            f.write(code)
            temp_file = f.name

        try:
            result = subprocess.run(
                [sys.executable, temp_file],
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=params.get("working_dir"),
            )
            return self._format_result(result, timeout, warnings, isolated=False)
        finally:
            try:
                os.unlink(temp_file)
            except OSError:
                pass

    def _execute_sandboxed(
        self,
        code: str,
        timeout: int,
        warnings: List[str],
        params: dict,
    ) -> MCPToolResult:
        """Execute code in bubblewrap sandbox with isolated filesystem."""
        with tempfile.TemporaryDirectory(prefix="ipw_sandbox_") as sandbox_dir:
            # Write script to sandbox
            script_path = os.path.join(sandbox_dir, "script.py")
            with open(script_path, "w") as f:
                f.write(code)

            # Build and run sandboxed command
            cmd = self._build_bwrap_command("/sandbox/script.py", sandbox_dir)

            # Optionally disable network
            if not params.get("network", True):
                if "--share-net" in cmd:
                    cmd.remove("--share-net")

            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=timeout,
            )

            return self._format_result(result, timeout, warnings, isolated=True)

    def _format_result(
        self,
        result: subprocess.CompletedProcess,
        timeout: int,
        warnings: List[str],
        isolated: bool,
    ) -> MCPToolResult:
        """Format subprocess result into MCPToolResult."""
        stdout = result.stdout
        stderr = result.stderr
        return_code = result.returncode

        # Truncate if too long
        if len(stdout) > self.max_output_length:
            stdout = stdout[:self.max_output_length] + "\n... (output truncated)"
        if len(stderr) > self.max_output_length:
            stderr = stderr[:self.max_output_length] + "\n... (output truncated)"

        # Format output
        lines = []
        if warnings:
            lines.extend(warnings)
            lines.append("")

        if stdout:
            lines.append("Output:")
            lines.append(stdout)

        if stderr:
            lines.append("Errors:")
            lines.append(stderr)

        if return_code != 0:
            lines.append(f"\nExit code: {return_code}")

        if not stdout and not stderr:
            lines.append("(No output)")

        content = "\n".join(lines)
        # Cost is estimated from the timeout budget, not the measured runtime
        cost_usd = timeout * self.COST_PER_SECOND

        return MCPToolResult(
            content=content,
            usage={},
            cost_usd=cost_usd,
            metadata={
                "tool": "code_interpreter",
                "return_code": return_code,
                "timeout": timeout,
                "warnings": warnings,
                "isolated": isolated,
            },
        )
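
The non-isolated path boils down to two steps: pull the code out of a Markdown fence if one is present, then run it in a fresh interpreter with a timeout. The sketch below shows those two steps on their own; `extract_code` and `run_python` are hypothetical names, and the fence marker is built programmatically only so that this example can itself sit inside a fenced block.

```python
import os
import re
import subprocess
import sys
import tempfile

FENCE = "`" * 3  # three backticks, built at runtime
CODE_BLOCK = re.compile(FENCE + r"[^\n]*\n([\s\S]*?)" + FENCE)


def extract_code(text: str) -> str:
    """Return the body of the first fenced block, or the text unchanged."""
    match = CODE_BLOCK.search(text)
    return match.group(1).strip() if match else text


def run_python(code: str, timeout: int = 30) -> subprocess.CompletedProcess:
    """Run code in a new interpreter; raises subprocess.TimeoutExpired on timeout."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        return subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    finally:
        os.unlink(path)  # always remove the temporary script


prompt = f"Here is the code:\n{FENCE}python\nprint(sum(range(5)))\n{FENCE}"
result = run_python(extract_code(prompt))
print(result.stdout.strip())
```

This sketch provides no isolation at all: the child process has the same filesystem and network access as the caller, which is why the class offers the bubblewrap mode.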

`__init__(timeout=30, max_output_length=10000, telemetry_collector=None, isolation=None, allowed_paths=None)` ¶

Initialize code interpreter.

Parameters:

Name	Type	Description	Default
`timeout`	`int`	Maximum execution time in seconds (default: 30)	`30`
`max_output_length`	`int`	Maximum characters to return (default: 10000)	`10000`
`telemetry_collector`	`Optional[Any]`	Energy monitor collector	`None`
`isolation`	`Optional[str]`	Isolation mode - None, "bubblewrap", or "auto"	`None`
`allowed_paths`	`Optional[List[str]]`	Additional paths to mount read-only in sandbox	`None`

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_server.py

def __init__(
    self,
    timeout: int = 30,
    max_output_length: int = 10000,
    telemetry_collector: Optional[Any] = None,
    isolation: Optional[str] = None,
    allowed_paths: Optional[List[str]] = None,
):
    """Initialize code interpreter.

    Args:
        timeout: Maximum execution time in seconds (default: 30)
        max_output_length: Maximum characters to return (default: 10000)
        telemetry_collector: Energy monitor collector
        isolation: Isolation mode - None, "bubblewrap", or "auto"
        allowed_paths: Additional paths to mount read-only in sandbox
    """
    super().__init__(
        name="code_interpreter",
        telemetry_collector=telemetry_collector,
    )
    self.timeout = timeout
    self.max_output_length = max_output_length
    self.allowed_paths = allowed_paths or []

    # Determine isolation mode
    self._use_bubblewrap = False
    if isolation == "bubblewrap":
        if not _check_bubblewrap_available():
            raise RuntimeError(
                "Bubblewrap (bwrap) not found. Install with: "
                "apt-get install bubblewrap (Debian/Ubuntu) or "
                "dnf install bubblewrap (Fedora/RHEL)"
            )
        self._use_bubblewrap = True
    elif isolation == "auto":
        self._use_bubblewrap = _check_bubblewrap_available()

    # Cache Python paths for bubblewrap
    if self._use_bubblewrap:
        self._python_paths = _get_python_lib_paths()

    self.code_extractor = re.compile(r"```[^\n]*\n([\s\S]*?)```", re.DOTALL)
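
The isolation-mode decision in the constructor can be summarised as a small pure function. This is a sketch under assumptions: `_check_bubblewrap_available()` is not shown in this section, so the `shutil.which("bwrap")` lookup below is a guess at what such a check might do, and `resolve_isolation` is a hypothetical name.

```python
import shutil
from typing import Optional


def bubblewrap_available() -> bool:
    """Assumed check: bwrap counts as available if it is found on PATH."""
    return shutil.which("bwrap") is not None


def resolve_isolation(isolation: Optional[str], available: Optional[bool] = None) -> bool:
    """Return True when execution should go through bubblewrap."""
    if available is None:
        available = bubblewrap_available()
    if isolation == "bubblewrap":
        if not available:
            # Explicit request that cannot be honoured: fail loudly
            raise RuntimeError("Bubblewrap (bwrap) not found")
        return True
    if isolation == "auto":
        return available  # silently fall back to direct execution
    return False  # None, or any unrecognised value, means direct execution


print(resolve_isolation("auto", available=False))
```

The `available` parameter exists only to make the decision testable without depending on the host system. Note the asymmetry: `"bubblewrap"` raises when the sandbox is missing, while `"auto"` quietly runs without isolation.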

`ThinkServer` ¶

Bases: BaseMCPServer

MCP server for internal reasoning/scratchpad.

This is a "thinking" tool that allows the model to break down complex problems step-by-step before delegating to other tools. It returns the input thought unchanged, prefixed with a `[Thinking]` header line.

Example

think = ThinkServer()
result = think.execute("Let me break this down: 1) First... 2) Then...")

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_server.py

class ThinkServer(BaseMCPServer):
    """MCP server for internal reasoning/scratchpad.

    This is a "thinking" tool that allows the model to break down
    complex problems step-by-step before delegating to other tools.
    It returns the input thought unchanged, prefixed with a
    "[Thinking]" header line.

    Example:
        think = ThinkServer()
        result = think.execute("Let me break this down: 1) First... 2) Then...")
    """

    def __init__(self, telemetry_collector: Optional[Any] = None):
        super().__init__(
            name="think",
            telemetry_collector=telemetry_collector,
        )

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Process thinking/reasoning (pass-through)."""
        return MCPToolResult(
            content=f"[Thinking]\n{prompt}",
            usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
            cost_usd=0.0,
            metadata={"tool": "think"},
        )

`FileReadServer` ¶

Bases: BaseMCPServer

MCP server for reading file contents.

Security: Only allows reading files within allowed directories.

Example

reader = FileReadServer(allowed_dirs=["/workspace"])
result = reader.execute("/workspace/file.txt")

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_server.py

class FileReadServer(BaseMCPServer):
    """MCP server for reading file contents.

    Security: Only allows reading files within allowed directories.

    Example:
        reader = FileReadServer(allowed_dirs=["/workspace"])
        result = reader.execute("/workspace/file.txt")
    """

    def __init__(
        self,
        allowed_dirs: Optional[List[str]] = None,
        telemetry_collector: Optional[Any] = None,
    ):
        super().__init__(name="file_read", telemetry_collector=telemetry_collector)
        self.allowed_dirs = allowed_dirs or [os.getcwd()]

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Read file contents.

        Args:
            prompt: File path to read
            **params: start_line (1-indexed, default 1), end_line (optional)

        Returns:
            MCPToolResult with file contents or error message
        """
        file_path = prompt.strip()
        start_line = params.get("start_line", 1)
        end_line = params.get("end_line")

        # Security: resolve path and check if within allowed dirs
        try:
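            resolved = os.path.realpath(file_path)
            # Match on directory boundaries so that "/workspace_evil"
            # is not treated as being inside "/workspace"
            allowed_roots = [os.path.realpath(d) for d in self.allowed_dirs]
            if not any(
                resolved == root
                or resolved.startswith(root.rstrip(os.sep) + os.sep)
                for root in allowed_roots
            ):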
                return MCPToolResult(
                    content="Error: Path not in allowed directories",
                    usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
                    cost_usd=0.0,
                    metadata={"tool": "file_read", "error": "permission_denied"},
                )

            with open(resolved, "r", encoding="utf-8", errors="replace") as f:
                lines = f.readlines()

            # Apply line range (1-indexed)
            start_idx = max(0, start_line - 1)
            if end_line is not None:
                lines = lines[start_idx:end_line]
            else:
                lines = lines[start_idx:]

            content = "".join(lines)
            return MCPToolResult(
                content=content,
                usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
                cost_usd=0.0,
                metadata={
                    "tool": "file_read",
                    "path": resolved,
                    "lines_read": len(lines),
                    "start_line": start_line,
                    "end_line": end_line,
                },
            )
        except FileNotFoundError:
            return MCPToolResult(
                content=f"Error: File not found: {file_path}",
                usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
                cost_usd=0.0,
                metadata={"tool": "file_read", "error": "file_not_found"},
            )
        except Exception as e:
            return MCPToolResult(
                content=f"Error: {e}",
                usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
                cost_usd=0.0,
                metadata={"tool": "file_read", "error": str(e)},
            )
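
Two mechanics carry this class: deciding whether a path lies inside an allowed directory, and slicing a 1-indexed inclusive line range. The sketch below shows both in isolation. `is_within` and `select_lines` are hypothetical helper names, and the containment test compares on directory boundaries, which is stricter than a plain string-prefix comparison.

```python
import os
import tempfile
from typing import List, Optional


def is_within(path: str, allowed_dirs: List[str]) -> bool:
    """True if path resolves to a location inside one of allowed_dirs."""
    resolved = os.path.realpath(path)
    for d in allowed_dirs:
        root = os.path.realpath(d)
        if resolved == root or resolved.startswith(root.rstrip(os.sep) + os.sep):
            return True
    return False


def select_lines(
    lines: List[str], start_line: int = 1, end_line: Optional[int] = None
) -> List[str]:
    """Return lines start_line..end_line (1-indexed, inclusive)."""
    start_idx = max(0, start_line - 1)
    return lines[start_idx:end_line] if end_line is not None else lines[start_idx:]


with tempfile.TemporaryDirectory() as workspace:
    inside = os.path.join(workspace, "notes.txt")
    sibling = workspace + "_other"  # shares the prefix, different directory
    escape = os.path.join(workspace, "..", "elsewhere.txt")  # traversal attempt
    checks = (
        is_within(inside, [workspace]),
        is_within(sibling, [workspace]),
        is_within(escape, [workspace]),
    )

picked = select_lines(["a\n", "b\n", "c\n", "d\n"], start_line=2, end_line=3)
print(checks, picked)
```

Resolving with `os.path.realpath` before comparing means that `..` segments and symlinks are collapsed first, so a traversal attempt is judged by where it actually lands.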

`FileWriteServer` ¶

Bases: BaseMCPServer

MCP server for writing file contents.

Security: Only allows writing files within allowed directories. Creates parent directories if they don't exist.

Example

writer = FileWriteServer(allowed_dirs=["/workspace"])
result = writer.execute("/workspace/output.txt", content="Hello, World!")

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_server.py

class FileWriteServer(BaseMCPServer):
    """MCP server for writing file contents.

    Security: Only allows writing files within allowed directories.
    Creates parent directories if they don't exist.

    Example:
        writer = FileWriteServer(allowed_dirs=["/workspace"])
        result = writer.execute("/workspace/output.txt", content="Hello, World!")
    """

    def __init__(
        self,
        allowed_dirs: Optional[List[str]] = None,
        telemetry_collector: Optional[Any] = None,
    ):
        super().__init__(name="file_write", telemetry_collector=telemetry_collector)
        self.allowed_dirs = allowed_dirs or [os.getcwd()]

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Write content to file.

        Args:
            prompt: File path to write
            **params: content (text to write; defaults to an empty string),
                mode ('w' for overwrite or 'a' for append; default 'w')

        Returns:
            MCPToolResult with success message or error
        """
        file_path = prompt.strip()
        content = params.get("content", "")
        mode = params.get("mode", "w")

        # Validate mode
        if mode not in ("w", "a"):
            return MCPToolResult(
                content=f"Error: Invalid mode '{mode}'. Use 'w' (write) or 'a' (append).",
                usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
                cost_usd=0.0,
                metadata={"tool": "file_write", "error": "invalid_mode"},
            )

        # Security: resolve path and check if within allowed dirs
        try:
            resolved = os.path.realpath(file_path)
            # Match on directory boundaries so that "/workspace_evil"
            # is not treated as being inside "/workspace"
            allowed_roots = [os.path.realpath(d) for d in self.allowed_dirs]
            if not any(
                resolved == root
                or resolved.startswith(root.rstrip(os.sep) + os.sep)
                for root in allowed_roots
            ):
                return MCPToolResult(
                    content="Error: Path not in allowed directories",
                    usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
                    cost_usd=0.0,
                    metadata={"tool": "file_write", "error": "permission_denied"},
                )

            # Create parent directories only after the permission check passes
            parent_dir = os.path.dirname(resolved)
            if parent_dir:
                os.makedirs(parent_dir, exist_ok=True)

            with open(resolved, mode, encoding="utf-8") as f:
                f.write(content)

            action = "appended" if mode == "a" else "wrote"
            return MCPToolResult(
                content=f"Successfully {action} {len(content)} chars to {resolved}",
                usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
                cost_usd=0.0,
                metadata={
                    "tool": "file_write",
                    "path": resolved,
                    # Count encoded bytes, not characters
                    "bytes_written": len(content.encode("utf-8")),
                    "mode": mode,
                },
            )
        except Exception as e:
            return MCPToolResult(
                content=f"Error: {e}",
                usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
                cost_usd=0.0,
                metadata={"tool": "file_write", "error": str(e)},
            )
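
The write and append modes can be demonstrated with plain file I/O. This sketch deliberately leaves out the allowed-directory check so that it stays short; `write_text` is a hypothetical helper, and it raises on an invalid mode where the class returns an error result.

```python
import os
import tempfile


def write_text(path: str, content: str, mode: str = "w") -> int:
    """Write or append text, creating parent directories; returns chars written."""
    if mode not in ("w", "a"):
        raise ValueError(f"Invalid mode '{mode}'. Use 'w' (write) or 'a' (append).")
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, mode, encoding="utf-8") as f:
        return f.write(content)


with tempfile.TemporaryDirectory() as workspace:
    target = os.path.join(workspace, "logs", "run.txt")  # parent does not exist yet
    write_text(target, "first\n")              # "w" creates or overwrites the file
    write_text(target, "second\n", mode="a")   # "a" appends to the existing file
    with open(target, encoding="utf-8") as f:
        final_text = f.read()

print(repr(final_text))
```

Restricting `mode` to `"w"` and `"a"` keeps binary and read-write modes out of reach of the caller, which narrows what a tool invocation can do to a file.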

`ToolRegistry` ¶

Unified registry for all ToolOrchestra + ADP tools.

Example

registry = ToolRegistry()
registry.discover_tools()

tools = registry.get_available_tools()
small_llms = registry.get_tools_by_category(ToolCategory.LLM_SMALL)

calc = registry.get_tool_instance("calculator")
result = calc.execute("2 + 2")
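
The registry pattern itself is simple to sketch. The block below is a hypothetical stand-in, not the real `ToolRegistry`: the `ToolSpec` fields mirror a subset of those used in the source listing, while the enum values, `MiniRegistry`, and the idea that availability depends on an environment variable being set are assumptions made for illustration.

```python
import os
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Mapping, Optional


class ToolCategory(Enum):
    UTILITY = "utility"      # enum values are assumed
    SEARCH = "search"
    LLM_SMALL = "llm_small"


@dataclass
class ToolSpec:
    name: str
    category: ToolCategory
    description: str = ""
    estimated_latency_ms: int = 0
    estimated_cost_usd: float = 0.0
    requires_api_key: Optional[str] = None
    capabilities: List[str] = field(default_factory=list)


class MiniRegistry:
    """Hypothetical stand-in for ToolRegistry: stores specs and filters them."""

    def __init__(self) -> None:
        self._specs: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._specs[spec.name] = spec

    def by_category(self, category: ToolCategory) -> List[str]:
        return [s.name for s in self._specs.values() if s.category == category]

    def usable(self, env: Optional[Mapping[str, str]] = None) -> List[str]:
        """Names whose required API key (if any) is present in env."""
        env = os.environ if env is None else env
        return [
            s.name
            for s in self._specs.values()
            if s.requires_api_key is None or env.get(s.requires_api_key)
        ]


registry = MiniRegistry()
registry.register(ToolSpec("calculator", ToolCategory.UTILITY))
registry.register(
    ToolSpec("web_search", ToolCategory.SEARCH, requires_api_key="TAVILY_API_KEY")
)
registry.register(ToolSpec("ollama:qwen2.5:0.5b", ToolCategory.LLM_SMALL))

print(registry.by_category(ToolCategory.LLM_SMALL))
print(registry.usable(env={}))
```

Keeping specs (cheap metadata) separate from instances lets a caller list and filter tools without constructing clients or contacting any server.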

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

class ToolRegistry:
    """Unified registry for all ToolOrchestra + ADP tools.

    Example:
        registry = ToolRegistry()
        registry.discover_tools()

        tools = registry.get_available_tools()
        small_llms = registry.get_tools_by_category(ToolCategory.LLM_SMALL)

        calc = registry.get_tool_instance("calculator")
        result = calc.execute("2 + 2")
    """

    def __init__(
        self,
        ollama_base_url: str = "http://localhost:11434",
        vllm_base_url: str = "http://localhost:8000",
        telemetry_collector: Optional[Any] = None,
        code_isolation: Optional[str] = "auto",
        retrieval_gpu_device: Optional[int] = None,
    ):
        """Initialize tool registry.

        Args:
            ollama_base_url: Base URL for Ollama server
            vllm_base_url: Base URL for vLLM server
            telemetry_collector: Energy monitor collector for all tools
            code_isolation: Isolation mode for code_interpreter tool
                (None, "bubblewrap", or "auto").
            retrieval_gpu_device: GPU device index for neural retrieval models.
        """
        self.ollama_base_url = ollama_base_url
        self.vllm_base_url = vllm_base_url
        self.telemetry_collector = telemetry_collector
        self.code_isolation = code_isolation
        self.retrieval_gpu_device = retrieval_gpu_device

        self._specs: Dict[str, ToolSpec] = {}
        self._instances: Dict[str, BaseMCPServer] = {}
        self._aliases: Dict[str, str] = {}

        # Register all known tools
        self._register_builtin_tools()

    def _register_builtin_tools(self):
        """Register all built-in tools from ToolOrchestra + ADP."""

        # === UTILITY TOOLS ===
        self.register(ToolSpec(
            name="calculator",
            category=ToolCategory.UTILITY,
            description="Evaluate mathematical expressions. Supports arithmetic, exponents, trig functions. Zero cost, instant.",
            estimated_latency_ms=1,
            estimated_cost_usd=0.0,
            capabilities=["math", "arithmetic", "computation"],
            adp_domains=["agenttuning_db", "agenttuning_kg"],
        ))

        self.register(ToolSpec(
            name="think",
            category=ToolCategory.UTILITY,
            description="Internal reasoning scratchpad. Break down complex problems step-by-step before delegating. Zero cost.",
            estimated_latency_ms=1,
            estimated_cost_usd=0.0,
            capabilities=["reasoning", "planning", "decomposition"],
            adp_domains=["codeact", "agenttuning_alfworld"],
        ))

        # === CODE TOOLS ===
        self.register(ToolSpec(
            name="code_interpreter",
            category=ToolCategory.CODE,
            description="Execute Python code in sandbox. Returns stdout/stderr. 30s timeout.",
            estimated_latency_ms=1000,
            estimated_cost_usd=0.00001,
            capabilities=["code_execution", "python", "computation"],
            adp_domains=["codeact", "code_feedback", "swe-smith"],
        ))

        self.register(ToolSpec(
            name="file_read",
            category=ToolCategory.CODE,
            description="Read file contents. Supports line ranges. Zero cost, instant.",
            estimated_latency_ms=10,
            estimated_cost_usd=0.0,
            capabilities=["file_operations", "code_analysis"],
            adp_domains=["codeact", "swe-smith"],
        ))

        self.register(ToolSpec(
            name="file_write",
            category=ToolCategory.CODE,
            description="Write content to file. Supports write and append modes. Zero cost, instant.",
            estimated_latency_ms=10,
            estimated_cost_usd=0.0,
            capabilities=["file_operations", "code_generation"],
            adp_domains=["codeact", "swe-smith"],
        ))

        # === SEARCH TOOLS ===
        self.register(ToolSpec(
            name="web_search",
            category=ToolCategory.SEARCH,
            description="Search the web via Tavily API. Cost: $0.01/search.",
            estimated_latency_ms=500,
            estimated_cost_usd=0.01,
            requires_api_key="TAVILY_API_KEY",
            capabilities=["search", "retrieval", "current_info"],
            adp_domains=["agenttuning_webshop", "mind2web", "go-browse-wa"],
        ))

        # === RETRIEVAL TOOLS ===
        self.register(ToolSpec(
            name="retrieval:grep",
            category=ToolCategory.SEARCH,
            description="Fast regex/keyword search. No indexing, ~1ms latency.",
            estimated_latency_ms=1,
            estimated_cost_usd=0.0,
            capabilities=["search", "retrieval", "keyword_search", "regex"],
        ))

        self.register(ToolSpec(
            name="retrieval:bm25",
            category=ToolCategory.SEARCH,
            description="BM25 sparse retrieval. Fast, CPU-only, ~10ms latency.",
            estimated_latency_ms=10,
            estimated_cost_usd=0.0,
            capabilities=["search", "retrieval", "keyword_search"],
        ))

        self.register(ToolSpec(
            name="retrieval:dense",
            category=ToolCategory.SEARCH,
            description="Dense neural retrieval with FAISS. Semantic search, ~50ms.",
            estimated_latency_ms=50,
            estimated_cost_usd=0.0,
            capabilities=["search", "retrieval", "semantic_search"],
        ))

        self.register(ToolSpec(
            name="retrieval:hybrid",
            category=ToolCategory.SEARCH,
            description="Hybrid BM25 + dense retrieval with RRF fusion. Best accuracy, ~100ms.",
            estimated_latency_ms=100,
            estimated_cost_usd=0.0,
            capabilities=["search", "retrieval", "semantic_search", "keyword_search"],
        ))

        # === SMALL LLMs (<3B) via Ollama ===
        small_llms = [
            ("ollama:qwen2.5:0.5b", "Qwen2.5 0.5B - Fastest, basic tasks", 300, ["basic_qa"]),
            ("ollama:qwen2.5:1.5b", "Qwen2.5 1.5B - Fast, simple reasoning", 800, ["simple_reasoning"]),
            ("ollama:qwen3:1.5b", "Qwen3 1.5B - Fast reasoning with Qwen3 architecture", 800, ["simple_reasoning"]),
            ("ollama:llama3.2:1b", "Llama3.2 1B - Fast, general tasks", 500, ["basic_qa"]),
        ]
        for name, desc, latency, caps in small_llms:
            self.register(ToolSpec(
                name=name,
                category=ToolCategory.LLM_SMALL,
                description=desc,
                estimated_latency_ms=latency,
                estimated_cost_usd=0.0,
                requires_server="ollama",
                capabilities=caps + ["text_generation"],
            ))

        # === MEDIUM LLMs (3-10B) via Ollama or vLLM ===
        medium_llms = [
            ("ollama:qwen2.5:3b", "Qwen2.5 3B - Balanced speed/quality", 1500, ["reasoning"]),
            ("ollama:qwen2.5:7b", "Qwen2.5 7B - Good reasoning", 3000, ["reasoning", "complex_qa"]),
            ("ollama:llama3.2:3b", "Llama3.2 3B - Balanced, general", 1500, ["reasoning"]),
            ("vllm:qwen3-8b", "Qwen3 8B - High quality reasoning", 2000, ["complex_reasoning"]),
            ("vllm:llama-8b", "Llama3.1 8B - Strong general model", 2000, ["complex_reasoning"]),
        ]
        for name, desc, latency, caps in medium_llms:
            self.register(ToolSpec(
                name=name,
                category=ToolCategory.LLM_MEDIUM,
                description=desc,
                estimated_latency_ms=latency,
                estimated_cost_usd=0.0,
                requires_server=name.split(":", 1)[0],  # backend prefix of the name: "ollama" or "vllm"
                capabilities=caps + ["text_generation"],
            ))

        # === LARGE LLMs (>10B) via vLLM ===
        large_llms = [
            ("vllm:qwen3-32b", "Qwen3 32B - Best open-source quality", 5000, ["complex_reasoning", "analysis"]),
            ("vllm:llama-70b", "Llama3.1 70B - Near SOTA quality", 8000, ["complex_reasoning", "analysis"]),
        ]
        for name, desc, latency, caps in large_llms:
            self.register(ToolSpec(
                name=name,
                category=ToolCategory.LLM_LARGE,
                description=desc,
                estimated_latency_ms=latency,
                estimated_cost_usd=0.0,
                requires_server="vllm",
                capabilities=caps + ["text_generation"],
            ))

        # === SPECIALIST LLMs via vLLM ===
        specialist_llms = [
            ("vllm:qwen-math-7b", "Qwen Math 7B - Math specialist", 2000, ["math", "problem_solving"]),
            ("vllm:glm-4.7", "GLM-4.7 - Best math model", 8000, ["math", "complex_math"]),
            ("vllm:qwen-coder-7b", "Qwen Coder 7B - Code specialist", 2000, ["code", "programming"]),
            ("vllm:qwen3-coder-plus", "Qwen3 Coder Plus - Best code model", 5000, ["code", "complex_code"]),
        ]
        for name, desc, latency, caps in specialist_llms:
            self.register(ToolSpec(
                name=name,
                category=ToolCategory.LLM_SPECIALIST,
                description=desc,
                estimated_latency_ms=latency,
                estimated_cost_usd=0.0,
                requires_server="vllm",
                capabilities=caps + ["text_generation"],
                adp_domains=["codeact", "code_feedback"] if "code" in caps else [],
            ))

        # === LLM ALIASES ===
        llm_aliases = [
            ("llm_small", "vllm:qwen3-1.5b", "Small LLM for fast reasoning (Qwen3 1.5B)",
             ToolCategory.LLM_SMALL, 100, ["simple_reasoning"]),
            ("llm_medium", "vllm:qwen3-8b", "Medium LLM for balanced tasks (Qwen3 8B)",
             ToolCategory.LLM_MEDIUM, 300, ["reasoning"]),
            ("llm_large", "vllm:qwen3-32b", "Large LLM for complex reasoning (Qwen3 32B)",
             ToolCategory.LLM_LARGE, 1000, ["complex_reasoning"]),
            ("llm_specialist", "vllm:qwen-coder-32b", "Specialist LLM for code (Qwen Coder 32B)",
             ToolCategory.LLM_SPECIALIST, 1000, ["code", "programming"]),
        ]
        for alias_name, target, desc, category, latency, caps in llm_aliases:
            self.register(ToolSpec(
                name=alias_name,
                category=category,
                description=desc,
                estimated_latency_ms=latency,
                estimated_cost_usd=0.0,
                requires_server="vllm",
                capabilities=caps + ["text_generation"],
            ))
            self._aliases[alias_name] = target

        # === CLOUD LLMs ===
        cloud_llms = [
            ("openai:gpt-5-mini-2025-08-07", "GPT-5 Mini - Fast, capable cloud", 800, 0.005, ["reasoning"], "OPENAI_API_KEY"),
            ("openai:gpt-4o", "GPT-4o - Best GPT-4 model", 1000, 0.0025, ["complex_reasoning"], "OPENAI_API_KEY"),
            ("openai:o1-mini", "o1-mini - Reasoning model", 2000, 0.003, ["deep_reasoning"], "OPENAI_API_KEY"),
            ("openai:o1", "o1 - Best reasoning model", 5000, 0.015, ["deep_reasoning"], "OPENAI_API_KEY"),
            ("openai:gpt-5.2-2025-12-11", "GPT-5.2 - Most capable OpenAI model", 2000, 0.03, ["complex_reasoning", "analysis"], "OPENAI_API_KEY"),
            ("openai:gpt-5-nano-2025-08-07", "GPT-5 Nano - Fastest, cheapest", 400, 0.001, ["basic_reasoning", "fast"], "OPENAI_API_KEY"),
            ("anthropic:claude-3-5-haiku-20241022", "Claude 3.5 Haiku - Fast, cheap", 400, 0.0008, ["reasoning"], "ANTHROPIC_API_KEY"),
            ("anthropic:claude-sonnet-4-20250514", "Claude Sonnet 4 - Balanced", 800, 0.003, ["complex_reasoning"], "ANTHROPIC_API_KEY"),
            ("anthropic:claude-opus-4-20250514", "Claude Opus 4 - Most capable", 2000, 0.015, ["complex_reasoning", "analysis"], "ANTHROPIC_API_KEY"),
            ("anthropic:claude-haiku-4-5-20251001", "Claude 4.5 Haiku - Fast, cheap", 300, 0.001, ["reasoning", "fast"], "ANTHROPIC_API_KEY"),
            ("anthropic:claude-sonnet-4-5-20250929", "Claude 4.5 Sonnet - Balanced quality/speed", 600, 0.004, ["complex_reasoning"], "ANTHROPIC_API_KEY"),
            ("anthropic:claude-opus-4-5-20251101", "Claude 4.5 Opus - Most capable Anthropic model", 1500, 0.02, ["complex_reasoning", "analysis", "deep_reasoning"], "ANTHROPIC_API_KEY"),
        ]
        for name, desc, latency, cost, caps, api_key in cloud_llms:
            self.register(ToolSpec(
                name=name,
                category=ToolCategory.LLM_CLOUD,
                description=desc,
                estimated_latency_ms=latency,
                estimated_cost_usd=cost,
                requires_api_key=api_key,
                capabilities=caps + ["text_generation"],
            ))

        # === OPENROUTER ===
        openrouter_models = [
            ("openrouter:google/gemini-2.5-flash", "Gemini 2.5 Flash via OpenRouter", 500, 0.00015, ["reasoning", "fast"]),
            ("openrouter:google/gemini-2.5-pro", "Gemini 2.5 Pro via OpenRouter", 1000, 0.00125, ["complex_reasoning"]),
            ("openrouter:anthropic/claude-sonnet-4", "Claude Sonnet 4 via OpenRouter", 800, 0.003, ["complex_reasoning"]),
            ("openrouter:openai/gpt-4o", "GPT-4o via OpenRouter", 1000, 0.0025, ["complex_reasoning"]),
            ("openrouter:openai/gpt-5-mini-2025-08-07", "GPT-5 Mini via OpenRouter", 800, 0.005, ["reasoning"]),
            ("openrouter:meta-llama/llama-3.3-70b-instruct", "Llama 3.3 70B via OpenRouter", 2000, 0.0004, ["reasoning"]),
            ("openrouter:qwen/qwen-2.5-72b-instruct", "Qwen 2.5 72B via OpenRouter", 2000, 0.00035, ["reasoning"]),
            ("openrouter:qwen/qwq-32b", "QwQ 32B reasoning model via OpenRouter", 3000, 0.00015, ["deep_reasoning"]),
            ("openrouter:deepseek/deepseek-r1", "DeepSeek R1 reasoning via OpenRouter", 3000, 0.00055, ["deep_reasoning"]),
            ("openrouter:deepseek/deepseek-chat-v3-0324", "DeepSeek Chat V3 via OpenRouter", 1000, 0.00014, ["reasoning"]),
            ("openrouter:mistralai/mistral-large-2411", "Mistral Large via OpenRouter", 1500, 0.002, ["reasoning"]),
            ("openrouter:qwen/qwen3-32b", "Qwen3 32B via OpenRouter", 3000, 0.0002, ["complex_reasoning", "analysis"]),
            ("openrouter:z-ai/glm-4.7", "GLM-4.7 via OpenRouter - Best math model", 4000, 0.0004, ["math", "complex_math", "problem_solving"]),
            ("openrouter:qwen/qwen3-coder-plus", "Qwen3 Coder Plus via OpenRouter", 3000, 0.0002, ["code", "complex_code", "programming"]),
        ]
        for name, desc, latency, cost, caps in openrouter_models:
            self.register(ToolSpec(
                name=name,
                category=ToolCategory.LLM_CLOUD,
                description=desc,
                estimated_latency_ms=latency,
                estimated_cost_usd=cost,
                requires_api_key="OPENROUTER_API_KEY",
                capabilities=caps + ["text_generation"],
            ))

        # === ADP DOMAIN-SPECIFIC TOOLS ===
        self.register(ToolSpec(
            name="adp:codeact",
            category=ToolCategory.ADP_CODEACT,
            description="Execute code actions from ADP codeact domain.",
            capabilities=["code_execution", "reasoning"],
            adp_domains=["codeact"],
        ))

        self.register(ToolSpec(
            name="adp:alfworld",
            category=ToolCategory.ADP_ALFWORLD,
            description="Household task actions from ALFWorld domain.",
            capabilities=["embodied_actions", "planning"],
            adp_domains=["agenttuning_alfworld"],
        ))

        self.register(ToolSpec(
            name="adp:mind2web",
            category=ToolCategory.ADP_MIND2WEB,
            description="Web navigation actions from Mind2Web domain.",
            capabilities=["web_navigation", "ui_interaction"],
            adp_domains=["mind2web", "agenttuning_mind2web"],
        ))

        self.register(ToolSpec(
            name="adp:database",
            category=ToolCategory.ADP_DATABASE,
            description="Database query actions from ADP database domain.",
            capabilities=["sql", "query"],
            adp_domains=["agenttuning_db"],
        ))

    def register(self, spec: ToolSpec):
        """Register a tool specification."""
        self._specs[spec.name] = spec

    def get_spec(self, name: str) -> Optional[ToolSpec]:
        """Get tool specification by name."""
        return self._specs.get(name)

    def get_all_specs(self) -> List[ToolSpec]:
        """Get all registered tool specifications."""
        return list(self._specs.values())

    def get_specs_by_category(self, category: ToolCategory) -> List[ToolSpec]:
        """Get tool specifications by category."""
        return [s for s in self._specs.values() if s.category == category]

    def get_specs_for_domain(self, domain: str) -> List[ToolSpec]:
        """Get tool specifications relevant for an ADP domain."""
        return [s for s in self._specs.values() if domain in s.adp_domains]

    def get_specs_by_capability(self, capability: str) -> List[ToolSpec]:
        """Get tool specifications by capability tag."""
        return [s for s in self._specs.values() if capability in s.capabilities]

    def discover_available_tools(self) -> List[str]:
        """Discover which tools are actually available."""
        available = []

        for name, spec in self._specs.items():
            if spec.requires_api_key:
                if not os.environ.get(spec.requires_api_key):
                    continue
            available.append(name)

        return available

    def get_tool_instance(self, name: str) -> Optional[BaseMCPServer]:
        """Get or create a tool instance."""
        if name in self._instances:
            return self._instances[name]

        spec = self._specs.get(name)
        if not spec:
            return None

        instance = self._create_instance(name, spec)
        if instance:
            self._instances[name] = instance

        return instance

    def _create_instance(self, name: str, spec: ToolSpec) -> Optional[BaseMCPServer]:
        """Create a tool instance from specification."""
        try:
            # Check for alias first
            if name in self._aliases:
                target_name = self._aliases[name]
                target_spec = self._specs.get(target_name)
                return self._create_instance(target_name, target_spec)

            # Utility tools
            if name == "calculator":
                from ipw.agents.mcp.tool_server import CalculatorServer
                return CalculatorServer(telemetry_collector=self.telemetry_collector)

            elif name == "think":
                from ipw.agents.mcp.tool_server import ThinkServer
                return ThinkServer(telemetry_collector=self.telemetry_collector)

            elif name == "code_interpreter":
                from ipw.agents.mcp.tool_server import CodeInterpreterServer
                return CodeInterpreterServer(
                    telemetry_collector=self.telemetry_collector,
                    isolation=self.code_isolation,
                )

            elif name == "web_search":
                from ipw.agents.mcp.tool_server import WebSearchServer
                return WebSearchServer(telemetry_collector=self.telemetry_collector)

            elif name == "file_read":
                from ipw.agents.mcp.tool_server import FileReadServer
                return FileReadServer(telemetry_collector=self.telemetry_collector)

            elif name == "file_write":
                from ipw.agents.mcp.tool_server import FileWriteServer
                return FileWriteServer(telemetry_collector=self.telemetry_collector)

            # Ollama models
            elif name.startswith("ollama:"):
                from ipw.agents.mcp.ollama_server import OllamaMCPServer
                model_name = name.split(":", 1)[1]
                return OllamaMCPServer(
                    model_name=model_name,
                    base_url=self.ollama_base_url,
                    telemetry_collector=self.telemetry_collector,
                )

            # vLLM models
            elif name.startswith("vllm:"):
                from ipw.agents.mcp.vllm_server import VLLMMCPServer
                model_name = name.split(":", 1)[1]
                return VLLMMCPServer(
                    model_name=model_name,
                    vllm_url=self.vllm_base_url,
                    telemetry_collector=self.telemetry_collector,
                )

            # OpenAI models
            elif name.startswith("openai:"):
                from ipw.agents.mcp.openai_server import OpenAIMCPServer
                model_name = name.split(":", 1)[1]
                return OpenAIMCPServer(
                    model_name=model_name,
                    telemetry_collector=self.telemetry_collector,
                )

            # Anthropic models
            elif name.startswith("anthropic:"):
                from ipw.agents.mcp.anthropic_server import AnthropicMCPServer
                model_name = name.split(":", 1)[1]
                return AnthropicMCPServer(
                    model_name=model_name,
                    telemetry_collector=self.telemetry_collector,
                )

            # OpenRouter models
            elif name.startswith("openrouter:"):
                from ipw.agents.mcp.openrouter_server import OpenRouterMCPServer
                model_name = name.split(":", 1)[1]
                return OpenRouterMCPServer(
                    model_name=model_name,
                    telemetry_collector=self.telemetry_collector,
                )

            # ADP domain tools
            elif name.startswith("adp:"):
                return ADPDomainServer(
                    domain=name.split(":", 1)[1],
                    telemetry_collector=self.telemetry_collector,
                )

            # Retrieval tools
            elif name.startswith("retrieval:"):
                retrieval_type = name.split(":", 1)[1]
                if retrieval_type == "grep":
                    from ipw.agents.mcp.retrieval import GrepRetrievalServer
                    return GrepRetrievalServer(
                        telemetry_collector=self.telemetry_collector,
                    )
                elif retrieval_type == "bm25":
                    from ipw.agents.mcp.retrieval import BM25RetrievalServer
                    return BM25RetrievalServer(
                        telemetry_collector=self.telemetry_collector,
                    )
                elif retrieval_type == "dense":
                    from ipw.agents.mcp.retrieval import DenseRetrievalServer
                    use_gpu = self.retrieval_gpu_device is not None
                    return DenseRetrievalServer(
                        telemetry_collector=self.telemetry_collector,
                        use_gpu=use_gpu,
                        gpu_device=self.retrieval_gpu_device or 0,
                    )
                elif retrieval_type == "hybrid":
                    from ipw.agents.mcp.retrieval import HybridRetrievalServer
                    use_gpu = self.retrieval_gpu_device is not None
                    return HybridRetrievalServer(
                        model_name="Qwen/Qwen3-Embedding-4B",
                        telemetry_collector=self.telemetry_collector,
                        use_gpu=use_gpu,
                        gpu_device=self.retrieval_gpu_device or 0,
                    )

        except ImportError as e:
            print(f"Warning: Could not import server for '{name}': {e}")
        except Exception as e:
            print(f"Warning: Could not create instance for '{name}': {e}")

        return None

    def get_tool_descriptions(self, tools: Optional[List[str]] = None) -> str:
        """Get formatted tool descriptions for prompting."""
        if tools is None:
            tools = self.discover_available_tools()

        lines = ["Available tools:"]
        for name in tools:
            spec = self._specs.get(name)
            if spec:
                cost_info = f"${spec.estimated_cost_usd:.4f}" if spec.estimated_cost_usd > 0 else "free"
                lines.append(f"- {name}: {spec.description} ({cost_info}, ~{spec.estimated_latency_ms}ms)")

        return "\n".join(lines)
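
Tool names encode the backend as a prefix, and `_create_instance` strips it with `name.split(":", 1)` so that model tags containing further colons stay intact. The following is a minimal standalone sketch of that parsing together with a single alias hop. `resolve_tool_name` and the `ALIASES` table are hypothetical helpers written for illustration and are not part of `ipw`.

```python
from typing import Dict, Tuple

# Hypothetical alias table mirroring two of the aliases registered above.
ALIASES: Dict[str, str] = {
    "llm_medium": "vllm:qwen3-8b",
    "llm_large": "vllm:qwen3-32b",
}


def resolve_tool_name(name: str) -> Tuple[str, str]:
    """Return (backend, model) for a tool name, following one alias hop."""
    target = ALIASES.get(name, name)
    if ":" not in target:
        # Utility tools such as "calculator" carry no backend prefix.
        return ("", target)
    # maxsplit=1 keeps a tag like "qwen2.5:1.5b" in one piece.
    backend, model = target.split(":", 1)
    return (backend, model)


print(resolve_tool_name("ollama:qwen2.5:1.5b"))  # ('ollama', 'qwen2.5:1.5b')
print(resolve_tool_name("llm_medium"))           # ('vllm', 'qwen3-8b')
print(resolve_tool_name("calculator"))           # ('', 'calculator')
```

Splitting only once matters for Ollama tags, where the part after the first colon is itself a `model:size` pair.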

`__init__(ollama_base_url='http://localhost:11434', vllm_base_url='http://localhost:8000', telemetry_collector=None, code_isolation='auto', retrieval_gpu_device=None)` ¶

Initialize tool registry.

Parameters:

Name	Type	Description	Default
`ollama_base_url`	`str`	Base URL for Ollama server	`'http://localhost:11434'`
`vllm_base_url`	`str`	Base URL for vLLM server	`'http://localhost:8000'`
`telemetry_collector`	`Optional[Any]`	Energy monitor collector for all tools	`None`
`code_isolation`	`Optional[str]`	Isolation mode for code_interpreter tool.	`'auto'`
`retrieval_gpu_device`	`Optional[int]`	GPU device index for neural retrieval models.	`None`

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def __init__(
    self,
    ollama_base_url: str = "http://localhost:11434",
    vllm_base_url: str = "http://localhost:8000",
    telemetry_collector: Optional[Any] = None,
    code_isolation: Optional[str] = "auto",
    retrieval_gpu_device: Optional[int] = None,
):
    """Initialize tool registry.

    Args:
        ollama_base_url: Base URL for Ollama server
        vllm_base_url: Base URL for vLLM server
        telemetry_collector: Energy monitor collector for all tools
        code_isolation: Isolation mode for code_interpreter tool.
        retrieval_gpu_device: GPU device index for neural retrieval models.
    """
    self.ollama_base_url = ollama_base_url
    self.vllm_base_url = vllm_base_url
    self.telemetry_collector = telemetry_collector
    self.code_isolation = code_isolation
    self.retrieval_gpu_device = retrieval_gpu_device

    self._specs: Dict[str, ToolSpec] = {}
    self._instances: Dict[str, BaseMCPServer] = {}
    self._aliases: Dict[str, str] = {}

    # Register all known tools
    self._register_builtin_tools()

`register(spec)` ¶

Register a tool specification.

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def register(self, spec: ToolSpec):
    """Register a tool specification."""
    self._specs[spec.name] = spec

`get_spec(name)` ¶

Get tool specification by name.

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def get_spec(self, name: str) -> Optional[ToolSpec]:
    """Get tool specification by name."""
    return self._specs.get(name)

`get_all_specs()` ¶

Get all registered tool specifications.

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def get_all_specs(self) -> List[ToolSpec]:
    """Get all registered tool specifications."""
    return list(self._specs.values())

`get_specs_by_category(category)` ¶

Get tool specifications by category.

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def get_specs_by_category(self, category: ToolCategory) -> List[ToolSpec]:
    """Get tool specifications by category."""
    return [s for s in self._specs.values() if s.category == category]

`get_specs_for_domain(domain)` ¶

Get tool specifications relevant for an ADP domain.

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def get_specs_for_domain(self, domain: str) -> List[ToolSpec]:
    """Get tool specifications relevant for an ADP domain."""
    return [s for s in self._specs.values() if domain in s.adp_domains]

`get_specs_by_capability(capability)` ¶

Get tool specifications by capability tag.

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def get_specs_by_capability(self, capability: str) -> List[ToolSpec]:
    """Get tool specifications by capability tag."""
    return [s for s in self._specs.values() if capability in s.capabilities]
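
One way a caller might use a capability query is to filter and then rank by the latency estimate. The sketch below assumes a cut-down stand-in (`MiniSpec`) with only the fields it needs, populated with the four retrieval entries registered above. `fastest_with` is a hypothetical helper, not a registry method.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MiniSpec:
    """Hypothetical stand-in for ToolSpec with only the fields used here."""
    name: str
    estimated_latency_ms: float = 0.0
    capabilities: List[str] = field(default_factory=list)


SPECS = [
    MiniSpec("retrieval:grep", 1, ["search", "retrieval", "keyword_search", "regex"]),
    MiniSpec("retrieval:bm25", 10, ["search", "retrieval", "keyword_search"]),
    MiniSpec("retrieval:dense", 50, ["search", "retrieval", "semantic_search"]),
    MiniSpec("retrieval:hybrid", 100,
             ["search", "retrieval", "semantic_search", "keyword_search"]),
]


def fastest_with(capability: str) -> Optional[MiniSpec]:
    """Return the lowest-latency spec carrying the tag, or None if no spec has it."""
    matches = [s for s in SPECS if capability in s.capabilities]
    return min(matches, key=lambda s: s.estimated_latency_ms, default=None)


print(fastest_with("semantic_search").name)  # retrieval:dense
print(fastest_with("keyword_search").name)   # retrieval:grep
print(fastest_with("sql"))                   # None
```

The match is an exact string comparison against each tag, so `"search"` and `"semantic_search"` are unrelated capabilities as far as the filter is concerned.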

`discover_available_tools()` ¶

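Return the names of registered tools whose required API key, if any, is set in the environment. Tools without a `requires_api_key` are always listed; the `requires_server` field is not checked, so a local Ollama or vLLM tool appears even when its server is not running.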

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def discover_available_tools(self) -> List[str]:
    """Discover which tools are actually available."""
    available = []

    for name, spec in self._specs.items():
        if spec.requires_api_key:
            if not os.environ.get(spec.requires_api_key):
                continue
        available.append(name)

    return available
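
The availability check reduces to a truthiness test on an environment lookup. The sketch below reproduces that test in isolation; `REQUIRED_KEYS` and `available_tools` are hypothetical names, and the environment is passed in as a mapping so the behaviour can be tried without touching the real process environment.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical table: tool name -> required environment variable (None = no key).
REQUIRED_KEYS: Dict[str, Optional[str]] = {
    "retrieval:bm25": None,
    "openai:gpt-4o": "OPENAI_API_KEY",
    "openrouter:qwen/qwq-32b": "OPENROUTER_API_KEY",
}


def available_tools(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List tools whose key requirement is satisfied by `env` (default: os.environ)."""
    if env is None:
        env = os.environ
    available = []
    for name, key in REQUIRED_KEYS.items():
        if key and not env.get(key):
            continue  # key required but unset or empty
        available.append(name)
    return available


print(available_tools({"OPENAI_API_KEY": "sk-example"}))
# ['retrieval:bm25', 'openai:gpt-4o']
print(available_tools({"OPENAI_API_KEY": ""}))
# ['retrieval:bm25']
```

Because the test is on truthiness rather than presence, a variable that is exported but empty counts as missing.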

`get_tool_instance(name)` ¶

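Return the cached instance for `name`, creating and caching it on first use. Returns `None` if `name` is not registered or if the instance cannot be created; a failed creation is not cached, so it is retried on the next call.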

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def get_tool_instance(self, name: str) -> Optional[BaseMCPServer]:
    """Get or create a tool instance."""
    if name in self._instances:
        return self._instances[name]

    spec = self._specs.get(name)
    if not spec:
        return None

    instance = self._create_instance(name, spec)
    if instance:
        self._instances[name] = instance

    return instance
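
The get-or-create pattern used here can be sketched without any `ipw` imports. `LazyCache` and `make_tool` below are hypothetical stand-ins: the factory plays the role of `_create_instance`, and a counter makes it visible when the factory actually runs.

```python
from typing import Callable, Dict, Optional


class LazyCache:
    """Hypothetical get-or-create cache; failed creations are not stored."""

    def __init__(self, factory: Callable[[str], Optional[object]]):
        self._factory = factory
        self._instances: Dict[str, object] = {}
        self.creations = 0  # number of times the factory was invoked

    def get(self, name: str) -> Optional[object]:
        if name in self._instances:
            return self._instances[name]
        self.creations += 1
        instance = self._factory(name)
        if instance:
            self._instances[name] = instance
        return instance


def make_tool(name: str) -> Optional[object]:
    """Stand-in factory: only 'calculator' can be built."""
    return object() if name == "calculator" else None


cache = LazyCache(make_tool)
first = cache.get("calculator")
print(cache.get("calculator") is first)  # True
print(cache.get("unknown"))              # None
```

Since `None` results are never stored, repeated requests for a tool that fails to build will invoke the factory each time.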

`get_tool_descriptions(tools=None)` ¶

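Return a newline-separated block of tool descriptions for inclusion in a prompt. Each line gives the tool name, its description, the estimated cost per call (`free` when the estimate is zero), and the estimated latency. If `tools` is `None`, the names returned by `discover_available_tools()` are used; names without a registered spec are skipped.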

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def get_tool_descriptions(self, tools: Optional[List[str]] = None) -> str:
    """Get formatted tool descriptions for prompting."""
    if tools is None:
        tools = self.discover_available_tools()

    lines = ["Available tools:"]
    for name in tools:
        spec = self._specs.get(name)
        if spec:
            cost_info = f"${spec.estimated_cost_usd:.4f}" if spec.estimated_cost_usd > 0 else "free"
            lines.append(f"- {name}: {spec.description} ({cost_info}, ~{spec.estimated_latency_ms}ms)")

    return "\n".join(lines)
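
To see what a formatted line looks like, the sketch below applies the same two f-strings to values taken from the built-in registrations. `describe` is a hypothetical free function used only for illustration.

```python
def describe(name: str, description: str, cost_usd: float, latency_ms: int) -> str:
    """Format one tool line the same way the method above does."""
    cost_info = f"${cost_usd:.4f}" if cost_usd > 0 else "free"
    return f"- {name}: {description} ({cost_info}, ~{latency_ms}ms)"


lines = [
    "Available tools:",
    describe("retrieval:bm25", "BM25 sparse retrieval. Fast, CPU-only, ~10ms latency.", 0.0, 10),
    describe("openai:gpt-4o", "GPT-4o - Best GPT-4 model", 0.0025, 1000),
    describe("openrouter:deepseek/deepseek-chat-v3-0324",
             "DeepSeek Chat V3 via OpenRouter", 0.00014, 1000),
]
print("\n".join(lines))
```

The `.4f` format keeps four decimal places, so an estimate such as 0.00014 is shown as `$0.0001`; very cheap models can therefore look identical in the prompt even when their estimates differ.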

`ToolSpec` `dataclass` ¶

Specification for a tool in the registry.

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

@dataclass
class ToolSpec:
    """Specification for a tool in the registry."""

    name: str
    """Unique tool identifier (e.g., 'calculator', 'ollama:qwen2.5:1.5b')"""

    category: ToolCategory
    """Tool category for routing decisions"""

    description: str
    """Human-readable description for policy model"""

    server_class: Optional[Type[BaseMCPServer]] = None
    """MCP server class to instantiate"""

    factory: Optional[Callable[..., BaseMCPServer]] = None
    """Factory function for custom initialization"""

    # Cost/efficiency metadata (for routing decisions)
    estimated_latency_ms: float = 0.0
    """Estimated latency in milliseconds"""

    estimated_cost_usd: float = 0.0
    """Estimated cost per call in USD"""

    estimated_energy_joules: float = 0.0
    """Estimated energy consumption per call"""

    requires_api_key: Optional[str] = None
    """Environment variable name for required API key"""

    requires_server: Optional[str] = None
    """Required server (e.g., 'ollama', 'vllm')"""

    # ADP domain mapping
    adp_domains: List[str] = field(default_factory=list)
    """ADP domains this tool is relevant for"""

    # Capability tags for semantic matching
    capabilities: List[str] = field(default_factory=list)
    """Capability tags (e.g., 'math', 'code', 'reasoning')"""

`name` `instance-attribute` ¶

Unique tool identifier (e.g., 'calculator', 'ollama:qwen2.5:1.5b')

`category` `instance-attribute` ¶

Tool category for routing decisions

`description` `instance-attribute` ¶

Human-readable description for policy model

`server_class = None` `class-attribute` `instance-attribute` ¶

MCP server class to instantiate

`factory = None` `class-attribute` `instance-attribute` ¶

Factory function for custom initialization

`estimated_latency_ms = 0.0` `class-attribute` `instance-attribute` ¶

Estimated latency in milliseconds

`estimated_cost_usd = 0.0` `class-attribute` `instance-attribute` ¶

Estimated cost per call in USD

`estimated_energy_joules = 0.0` `class-attribute` `instance-attribute` ¶

Estimated energy consumption per call

`requires_api_key = None` `class-attribute` `instance-attribute` ¶

Environment variable name for required API key

`requires_server = None` `class-attribute` `instance-attribute` ¶

Required server (e.g., 'ollama', 'vllm')

`adp_domains = field(default_factory=list)` `class-attribute` `instance-attribute` ¶

ADP domains this tool is relevant for

`capabilities = field(default_factory=list)` `class-attribute` `instance-attribute` ¶

Capability tags (e.g., 'math', 'code', 'reasoning')
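
The two list-valued fields use `field(default_factory=list)` so that every spec gets its own list. The sketch below shows the effect on a hypothetical two-field dataclass (`TaggedTool`), and also that the `dataclasses` module rejects a bare `[]` default outright.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaggedTool:
    """Hypothetical two-field stand-in for ToolSpec."""
    name: str
    capabilities: List[str] = field(default_factory=list)


a = TaggedTool("calculator")
b = TaggedTool("think")
a.capabilities.append("math")
print(a.capabilities, b.capabilities)  # ['math'] []

# A plain mutable default is refused when the class is defined.
rejected = False
try:
    @dataclass
    class Broken:
        name: str
        capabilities: List[str] = []
except ValueError:
    rejected = True
print(rejected)  # True
```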

`ToolCategory` ¶

Bases: Enum

Categories of tools matching ToolOrchestra + ADP domains.

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

class ToolCategory(Enum):
    """Categories of tools matching ToolOrchestra + ADP domains."""

    # ToolOrchestra core tools
    UTILITY = "utility"
    CODE = "code"
    SEARCH = "search"

    # LLM backends (by size/capability)
    LLM_SMALL = "llm_small"
    LLM_MEDIUM = "llm_medium"
    LLM_LARGE = "llm_large"
    LLM_SPECIALIST = "llm_specialist"
    LLM_CLOUD = "llm_cloud"

    # ADP domain-specific actions
    ADP_CODEACT = "adp_codeact"
    ADP_ALFWORLD = "adp_alfworld"
    ADP_MIND2WEB = "adp_mind2web"
    ADP_DATABASE = "adp_database"
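
Because each member has a string value, a category stored in a config file or log can be turned back into an enum member by value. The sketch below uses a hypothetical three-member subset (`Category`) and a hypothetical `parse_category` helper that returns `None` for unknown strings.

```python
from enum import Enum
from typing import Optional


class Category(Enum):
    """Hypothetical subset of ToolCategory."""
    SEARCH = "search"
    LLM_SMALL = "llm_small"
    LLM_CLOUD = "llm_cloud"


def parse_category(value: str) -> Optional[Category]:
    """Look a category up by value, returning None when the value is unknown."""
    try:
        return Category(value)
    except ValueError:
        return None


print(parse_category("llm_small"))  # Category.LLM_SMALL
print(Category["SEARCH"].value)     # search
print(parse_category("gpu"))        # None
```

Lookup by value uses call syntax (`Category("search")`), while lookup by member name uses subscript syntax (`Category["SEARCH"]`).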

`ADPDomainServer` ¶

Bases: BaseMCPServer

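Passthrough server for ADP domain-specific actions. It performs no real action: `_execute_impl` echoes the first 100 characters of the prompt in `content`, reports zero cost, and records the full prompt under `metadata["action"]`.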

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

class ADPDomainServer(BaseMCPServer):
    """Passthrough server for ADP domain-specific actions."""

    def __init__(
        self,
        domain: str,
        telemetry_collector: Optional[Any] = None,
    ):
        super().__init__(
            name=f"adp:{domain}",
            telemetry_collector=telemetry_collector,
        )
        self.domain = domain

    def _execute_impl(self, prompt: str, **params: Any) -> MCPToolResult:
        """Execute ADP domain action (passthrough for training)."""
        return MCPToolResult(
            content=f"[ADP:{self.domain}] Action executed: {prompt[:100]}...",
            usage={},
            cost_usd=0.0,
            metadata={
                "tool": f"adp:{self.domain}",
                "domain": self.domain,
                "action": prompt,
            },
        )
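
The `content` string built by the passthrough can be reproduced in isolation to see how the truncation behaves. `passthrough_content` below is a hypothetical free function that copies only the f-string from `_execute_impl`.

```python
def passthrough_content(domain: str, prompt: str) -> str:
    """Build the echo string: at most 100 prompt characters plus a literal '...'."""
    return f"[ADP:{domain}] Action executed: {prompt[:100]}..."


short = passthrough_content("codeact", "ls")
long = passthrough_content("database", "x" * 250)

print(short)      # [ADP:codeact] Action executed: ls...
print(len(long))  # 135
```

The trailing `...` is appended unconditionally, so it does not by itself indicate that the prompt was cut; the untruncated text is what ends up in `metadata["action"]`.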

`get_registry(**kwargs)` ¶

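Get or create the global tool registry. The keyword arguments are forwarded to `ToolRegistry`. If a registry already exists and the call's keyword arguments differ from those it was created with, the existing registry is discarded and a new one is built.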

Source code in intelligence-per-watt/src/ipw/agents/mcp/tool_registry.py

def get_registry(**kwargs) -> ToolRegistry:
    """Get or create the global tool registry."""
    global _global_registry, _global_registry_kwargs

    if _global_registry is not None and kwargs != _global_registry_kwargs:
        _global_registry = None

    if _global_registry is None:
        _global_registry = ToolRegistry(**kwargs)
        _global_registry_kwargs = kwargs.copy()

    return _global_registry
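
The rebuild-on-changed-arguments behaviour has a consequence that is easy to miss: a later call with no arguments also counts as "different". The sketch below mirrors the control flow with hypothetical stand-ins (`MiniRegistry`, `get_mini_registry`) so it can be run on its own.

```python
from typing import Any, Dict, Optional


class MiniRegistry:
    """Hypothetical stand-in that only remembers its constructor arguments."""

    def __init__(self, **kwargs: Any):
        self.kwargs = kwargs


_registry: Optional[MiniRegistry] = None
_registry_kwargs: Dict[str, Any] = {}


def get_mini_registry(**kwargs: Any) -> MiniRegistry:
    """Return the shared registry, rebuilding it when kwargs change."""
    global _registry, _registry_kwargs
    if _registry is not None and kwargs != _registry_kwargs:
        _registry = None  # arguments changed: drop the old registry
    if _registry is None:
        _registry = MiniRegistry(**kwargs)
        _registry_kwargs = kwargs.copy()
    return _registry


configured = get_mini_registry(vllm_base_url="http://localhost:8000")
again = get_mini_registry(vllm_base_url="http://localhost:8000")
bare = get_mini_registry()

print(again is configured)  # True
print(bare is configured)   # False
```

Under this pattern, callers that want the configured registry need to pass the same keyword arguments every time, or hold on to the returned object.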
