base
ipw.clients.base
InferenceClient
Bases: ABC
Base class for inference service integrations.
Subclasses must be registered with ClientRegistry to become discoverable.
Source code in intelligence-per-watt/src/ipw/clients/base.py
stream_chat_completion(model, prompt, **params)
abstractmethod
Run a streamed chat completion and return the aggregated response.
Implementations should consume a streaming API so they can measure
time-to-first-token latency, but must return a fully materialized
Response object once the stream finishes. prompt contains the
raw text to submit to the inference service.
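The aggregation pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `SketchResponse`, its fields, `fake_stream`, and `aggregate_stream` are hypothetical names introduced here, and the real `Response` type may carry different fields.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SketchResponse:
    """Hypothetical stand-in for the library's Response type."""
    content: str
    time_to_first_token_s: Optional[float]


def fake_stream(prompt: str) -> Iterator[str]:
    """Hypothetical token stream standing in for a real streaming API."""
    for token in ["echo", ": ", prompt]:
        yield token


def aggregate_stream(chunks: Iterable[str]) -> SketchResponse:
    """Consume a stream, record when the first non-empty chunk arrives,
    and return one fully materialized response."""
    start = time.perf_counter()
    first_token_at: Optional[float] = None
    parts = []
    for chunk in chunks:
        # Only the first non-empty chunk sets the latency measurement.
        if first_token_at is None and chunk:
            first_token_at = time.perf_counter() - start
        parts.append(chunk)
    return SketchResponse("".join(parts), first_token_at)


response = aggregate_stream(fake_stream("hello"))
print(response.content)  # prints "echo: hello"
```

A concrete `stream_chat_completion` would presumably wrap its service's streaming call in a loop of this shape, so the caller still receives a single object while the latency of the first token is captured along the way.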
list_models()
abstractmethod
health()
abstractmethod
prepare(model)
chat(system_prompt, user_prompt, *, temperature=None, max_output_tokens=None)
Synchronous chat completion helper.
Implementations should return the generated text for the given prompts. Subclasses that don't support chat may rely on this default, which raises an error to signal that the capability is unavailable.
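The "default raises, subclasses override" pattern can be sketched as below. This is an illustration under assumptions: `SketchClient`, `NoChatClient`, `EchoChatClient`, and `try_chat` are hypothetical names, and the use of `NotImplementedError` is a guess, since the page does not say which exception the real default raises.

```python
from abc import ABC
from typing import Optional


class SketchClient(ABC):
    """Hypothetical stand-in for InferenceClient; only chat() is modeled."""

    def chat(self, system_prompt: str, user_prompt: str, *,
             temperature: Optional[float] = None,
             max_output_tokens: Optional[int] = None) -> str:
        # Assumption: the real default signals the missing capability
        # with NotImplementedError; the source only says it "raises".
        raise NotImplementedError(
            f"{type(self).__name__} does not support chat"
        )


class NoChatClient(SketchClient):
    """Relies on the inherited default, so chat() raises."""


class EchoChatClient(SketchClient):
    """Overrides chat() and returns generated text as a plain string."""

    def chat(self, system_prompt, user_prompt, *,
             temperature=None, max_output_tokens=None) -> str:
        return f"[{system_prompt}] {user_prompt}"


def try_chat(client: SketchClient) -> Optional[str]:
    """Return the chat text, or None when the client lacks chat support."""
    try:
        return client.chat("be brief", "hi")
    except NotImplementedError:
        return None
```

Callers that work with several client types can probe for chat support this way instead of checking class names.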