askui.ActModel

class ActModel(abc.ABC)

Abstract base class for models that can execute autonomous actions.

Models implementing this interface can be used with the VisionAgent.act().

Example:

from askui import (
    ActModel,
    MessageParam,
    OnMessageCb,
    VisionAgent,
)
from typing_extensions import override

class MyActModel(ActModel):
    @override
    def act(
        self,
        messages: list[MessageParam],
        model_choice: str,
        on_message: OnMessageCb | None = None,
    ) -> None:
        pass  # implement action logic here

with VisionAgent(models={"my-act": MyActModel()}) as agent:
    agent.act("search for flights", model="my-act")

askui.exceptions.AiElementNotFound

class AiElementNotFound(ValueError)

Exception raised when an AI element is not found.

Arguments:

  • name str - The name of the AI element that was not found.
  • locations list[pathlib.Path] - The locations that were searched for the AI element.

askui.exceptions.AskUiApiError

class AskUiApiError(Exception)

Base exception for AskUI API errors.

This exception is raised when there is an error communicating with the AskUI API. It serves as a base class for more specific API-related exceptions.

Arguments:

  • message str - The error message.

askui.exceptions.AskUiApiRequestFailedError

class AskUiApiRequestFailedError(AskUiApiError)

Exception raised when an API response is not as expected.

This exception is raised when the API returns a response that cannot be processed or indicates an error condition. It includes the HTTP status code and error message from the API response.

Arguments:

  • status_code int - The HTTP status code from the API response.
  • message str - The error message from the API response.

askui.exceptions.AutomationError

class AutomationError(Exception)

Exception raised when the automation step cannot complete.

Arguments:

  • message str - The error message.

askui.exceptions.ElementNotFoundError

class ElementNotFoundError(AutomationError)

Exception raised when an element cannot be located.

Arguments:

  • message str - The error message.

askui.exceptions.QueryNoResponseError

class QueryNoResponseError(AutomationError)

Exception raised when a query does not return a response.

Arguments:

  • message str - The error message.
  • query str - The query that was made.

askui.exceptions.QueryUnexpectedResponseError

class QueryUnexpectedResponseError(AutomationError)

Exception raised when a query returns an unexpected response.

Arguments:

  • message str - The error message.
  • query str - The query that was made.
  • response Any - The response that was received.

askui.exceptions.ModelNotFoundError

class ModelNotFoundError(AutomationError)

Exception raised when an invalid model is used.

Arguments:

  • model str | ModelComposition - The model that was used.
  • model_type Literal[“Act”, “Grounding (locate)”, “Query (get/extract)”] - The type of model that was used.

askui.exceptions.ModelTypeMismatchError

class ModelTypeMismatchError(ModelNotFoundError)

askui.GetModel

class GetModel(abc.ABC)

Abstract base class for models that can extract information from images.

Models implementing this interface can be used with the get() method of VisionAgent to extract information from screenshots or other images. These models analyze visual content and return structured or unstructured information based on queries.

Example:

from askui import GetModel, VisionAgent, ResponseSchema, ImageSource
from typing import Type

class MyGetModel(GetModel):
    def get(
        self,
        query: str,
        image: ImageSource,
        response_schema: Type[ResponseSchema] | None,
        model_choice: str,
    ) -> ResponseSchema | str:
        # Implement custom get logic
        return "Custom response"

with VisionAgent(models={"my-get": MyGetModel()}) as agent:
    result = agent.get("what's on screen?", model="my-get")

askui.ImageSource

class ImageSource(RootModel)

A Pydantic model that represents an image source and provides methods to convert it to different formats.

The model can be initialized with:

  • A PIL Image object
  • A file path (str or pathlib.Path)
  • A data URL string

Attributes:

  • root PILImage.Image - The underlying PIL Image object.

Arguments:

  • root Img - The image source to load from.

askui.Img

Img = Union[str, Path, PILImage.Image]

Type of the input images for askui.VisionAgent.get(), askui.VisionAgent.locate(), etc.

Accepts:

  • PIL.Image.Image
  • Relative or absolute file path (str or pathlib.Path)
  • Data URL (e.g., "data:image/png;base64,...")

askui.LocateModel

class LocateModel(abc.ABC)

Abstract base class for models that can locate UI elements in images.

Models implementing this interface can be used with the click(), locate(), and mouse_move() methods of VisionAgent to find UI elements on screen. These models analyze visual content to determine the coordinates of elements based on descriptions or locators.

Example:

from askui import LocateModel, VisionAgent, Locator, ImageSource, Point
from askui.models import ModelComposition

class MyLocateModel(LocateModel):
    def locate(
        self,
        locator: str | Locator,
        image: ImageSource,
        model_choice: ModelComposition | str,
    ) -> Point:
        # Implement custom locate logic
        return (100, 100)

with VisionAgent(models={"my-locate": MyLocateModel()}) as agent:
    agent.click("button", model="my-locate")

askui.Model

Model = ActModel | GetModel | LocateModel

Union type of all abstract model classes.

This type represents any model that can be used with VisionAgent, whether it’s an ActModel, GetModel, or LocateModel. It’s useful for type hints when you need to work with models in a generic way.

askui.ModelComposition

class ModelComposition(RootModel[list[ModelDefinition]])

A composition of models (list of ModelDefinition) to be used for a task, e.g., locating an element on the screen to be able to click on it or extracting text from an image.

askui.ModelDefinition

class ModelDefinition(BaseModel)

A definition of a model.

Arguments:

  • task str - The task the model is trained for, e.g., end-to-end OCR ("e2e_ocr") or object detection ("od")
  • architecture str - The architecture of the model, e.g., "easy_ocr" or "yolo"
  • version str - The version of the model
  • interface str - The interface the model is trained for, e.g., "online_learning"
  • use_case str, optional - The use case the model is trained for. In the case of workspace specific AskUI models, this is often the workspace id but with ”-” replaced by ”_”. Defaults to "00000000_0000_0000_0000_000000000000" (custom null value).
  • tags list[str], optional - Tags for identifying the model that cannot be represented by other properties, e.g., ["trained", "word_level"]

askui.ModelRegistry

ModelRegistry = dict[str, Model | Callable[[], Model]]

Type definition for model registry.

A dictionary mapping model names to either model instances or factory functions (for lazy initialization on first use) that create model instances. Used to register custom models with VisionAgent.

Example:

from askui import ModelRegistry, ActModel

class MyModel(ActModel):
    def act(self, goal: str, model_choice: str) -> None:
        pass

# Registry with model instance
registry: ModelRegistry = {
    "my-model": MyModel()
}

# Registry with model factory
def create_model() -> ActModel:
    return MyModel()

registry_with_factory: ModelRegistry = {
    "my-model": create_model
}

askui.models.ModelName

class ModelName(str, Enum)

Enumeration of all available model names in AskUI.

This enum provides type-safe access to model identifiers used throughout the library. Each model name corresponds to a specific AI model or model composition that can be used for different tasks like acting, getting information, or locating elements.

askui.models.OpenRouterGetModel

class OpenRouterGetModel(GetModel)

Get model for OpenRouter.

askui.models.OpenRouterSettings

class OpenRouterSettings(BaseModel)

Settings for OpenRouter.

askui.ModifierKey

ModifierKey = Literal["command", "alt", "control", "shift", "right_shift"]

Modifier keys for keyboard actions.

askui.PcKey

PcKey = Literal[
    "backspace",
    "delete",
    "enter",
    "tab",
    "escape",
    "up",
    "down",
    "right",
    "left",
    "home",
    "end",
    "pageup",
    "pagedown",
    "f1",
    "f2",
    "f3",
    "f4",
    "f5",
    "f6",
    "f7",
    "f8",
    "f9",
    "f10",
    "f11",
    "f12",
    "space",
    "0",
    "1",
    "2",
    "3",
    "4",
    "5",
    "6",
    "7",
    "8",
    "9",
    "a",
    "b",
    "c",
    "d",
    "e",
    "f",
    "g",
    "h",
    "i",
    "j",
    "k",
    "l",
    "m",
    "n",
    "o",
    "p",
    "q",
    "r",
    "s",
    "t",
    "u",
    "v",
    "w",
    "x",
    "y",
    "z",
    "A",
    "B",
    "C",
    "D",
    "E",
    "F",
    "G",
    "H",
    "I",
    "J",
    "K",
    "L",
    "M",
    "N",
    "O",
    "P",
    "Q",
    "R",
    "S",
    "T",
    "U",
    "V",
    "W",
    "X",
    "Y",
    "Z",
    "!",
    '"',
    "#",
    "$",
    "%",
    "&",
    "'",
    "(",
    ")",
    "*",
    "+",
    ",",
    "-",
    ".",
    "/",
    ":",
    ";",
    "<",
    "=",
    ">",
    "?",
    "@",
    "[",
    "\\",
    "]",
    "^",
    "_",
    "`",
    "{",
    "|",
    "}",
    "~",
]

PC keys for keyboard actions.

askui.models.Point

Point = tuple[int, int]

A tuple of two integers representing the coordinates of a point on the screen.

askui.ResponseSchema

ResponseSchema = TypeVar(
    'ResponseSchema', ResponseSchemaBase, str, bool, int, float
)

Type of the responses of data extracted, e.g., using askui.VisionAgent.get().

The following types are allowed:

  • ResponseSchemaBase: Custom Pydantic models that extend ResponseSchemaBase
  • str: String responses
  • bool: Boolean responses
  • int: Integer responses
  • float: Floating point responses

Usually, serialized as a JSON schema, e.g., str as {"type": "string"}, to be passed to model(s). Also used for validating the responses of the model(s) used for data extraction.

askui.ResponseSchemaBase

class ResponseSchemaBase(BaseModel)

Base class for response schemas to be used for defining the response of data extraction, e.g., using askui.VisionAgent.get().

This class extends Pydantic’s BaseModel and adds constraints and configuration on top so that it can be used with models to define the schema (type) of the data to be extracted.

Example:

class UrlResponse(ResponseSchemaBase):
    url: str

# nested models should also extend ResponseSchemaBase
class NestedResponse(ResponseSchemaBase):
    nested: UrlResponse

# metadata, e.g., `examples` or `description` of `Field`, is generally also passed to and considered by the models
class UrlResponse(ResponseSchemaBase):
    url: str = Field(description="The URL of the response. Should used `"https"` scheme.", examples=["https://www.example.com"])