- display (int, optional) - The display number to use for screen interactions. Defaults to 1.
- reporters (list[Reporter] | None, optional) - List of reporter instances for logging and reporting. If None, an empty list is used.
- tools (AgentToolbox | None, optional) - Custom toolbox instance. If None, a default one will be created with AskUiControllerClient.
- model (ModelChoice | ModelComposition | str | None, optional) - The default choice or name of the model(s) to be used for vision tasks. Can be overridden by the model parameter of the click(), get(), act() etc. methods.
- retry (Retry, optional) - The retry instance to use for retrying failed actions. Defaults to ConfigurableRetry with exponential backoff. Currently only supported for the locate() method.
- models (ModelRegistry | None, optional) - A registry of models to make available to the VisionAgent so that they can be selected using the model parameter of VisionAgent or the model parameter of its click(), get(), act() etc. methods. Entries in the registry override entries in the default model registry.
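The retry parameter mentions exponential backoff. As a rough illustration of what such a schedule looks like, the sketch below computes the delay before each retry. The function name, its parameters, and the values used are assumptions for this example only; they are not the defaults of ConfigurableRetry.

```python
from typing import List


def backoff_delays(base_delay: float, retry_count: int, factor: float = 2.0) -> List[float]:
    """Return the delay before each retry: base_delay, base_delay * factor, ...

    Hypothetical helper for illustration; not part of the library.
    """
    return [base_delay * factor ** attempt for attempt in range(retry_count)]


# Four retries starting at one second, doubling each time.
print(backoff_delays(1.0, 4))  # [1.0, 2.0, 4.0, 8.0]
```

The point of growing delays is to give a slow UI more time on each attempt without waiting long when the first attempt already succeeds.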
click
Arguments:
- locator (str | Locator | Point | None, optional) - UI element description, structured locator, or absolute coordinates (x, y). If None, clicks at current position.
- button ('left' | 'middle' | 'right', optional) - Specifies which mouse button to click. Defaults to 'left'.
- repeat (int, optional) - The number of times to click. Must be greater than 0. Defaults to 1.
- offset (Point | None, optional) - Pixel offset (x, y) from the target location. Positive x=right, negative x=left, positive y=down, negative y=up.
- model (ModelComposition | str | None, optional) - The composition or name of the model(s) to be used for locating the element to click on using the locator.
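The sign convention of offset can be checked with a little arithmetic. The sketch below assumes the offset is simply added to the located point; the helper apply_offset and the Point alias are defined here for illustration and are not taken from the library.

```python
from typing import Optional, Tuple

Point = Tuple[int, int]  # (x, y) in pixels; alias assumed for this sketch


def apply_offset(target: Point, offset: Optional[Point]) -> Point:
    """Return the point reached after shifting target by offset (hypothetical helper)."""
    if offset is None:
        return target
    return (target[0] + offset[0], target[1] + offset[1])


# Element located at (200, 150); aim 10 px to the right and 5 px above it.
print(apply_offset((200, 150), (10, -5)))  # (210, 145)
```

Because y grows downward on screen, a negative y offset moves the click position up.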
mouse_move
Arguments:
- locator (str | Locator | Point) - UI element description, structured locator, or absolute coordinates (x, y).
- offset (Point | None, optional) - Pixel offset (x, y) from the target location. Positive x=right, negative x=left, positive y=down, negative y=up.
- model (ModelComposition | str | None, optional) - The composition or name of the model(s) to be used for locating the element to move the mouse to using the locator.
mouse_scroll
Arguments:
- x (int) - The horizontal scroll amount. Positive values typically scroll right, negative values scroll left.
- y (int) - The vertical scroll amount. Positive values typically scroll down, negative values scroll up.

A scroll value of 10 might result in different distances depending on the application and system settings.
Example:
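A minimal sketch of scrolling in several smaller steps, which can help when the distance covered per scroll unit is unknown. RecordingAgent is a hypothetical stand-in that only records calls so that the snippet runs without a connected device; the helper scroll_down_in_steps is likewise an assumption for this example. With the real library, the same mouse_scroll(x, y) calls would go to an agent instance.

```python
class RecordingAgent:
    """Hypothetical stand-in for an agent; records calls instead of scrolling."""

    def __init__(self) -> None:
        self.calls = []

    def mouse_scroll(self, x: int, y: int) -> None:
        self.calls.append((x, y))


def scroll_down_in_steps(agent, total: int, step: int = 10) -> None:
    """Scroll down by `total` units in chunks of at most `step` units."""
    remaining = total
    while remaining > 0:
        amount = min(step, remaining)
        agent.mouse_scroll(0, amount)  # positive y typically scrolls down
        remaining -= amount


agent = RecordingAgent()
scroll_down_in_steps(agent, total=25)
print(agent.calls)  # [(0, 10), (0, 10), (0, 5)]
```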
type
If locator is provided, it will first click on the element to give it focus before typing.
If clear is True (default), it will triple click on the element to select the current text (in multi-line inputs like textareas the current line or paragraph) before typing.
IMPORTANT: clear only works if a locator is provided.
Arguments:
- text (str) - The text to be typed. Must be at least 1 character long.
- locator (str | Locator | Point | None, optional) - UI element description, structured locator, or absolute coordinates (x, y). If None, types at current focus.
- offset (Point | None, optional) - Pixel offset (x, y) from the target location. Positive x=right, negative x=left, positive y=down, negative y=up.
- model (ModelComposition | str | None, optional) - The composition or name of the model(s) to be used for locating the element, i.e., input field, to type into using the locator.
- clear (bool, optional) - Whether to triple click on the element to give it focus and select the current text before typing. Defaults to True.
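The interplay of locator and clear described above can be sketched as a small planning function. This is an illustration of the documented behavior under stated assumptions, not the library's implementation: plan_type_actions and the tuple format of its result are made up for this example.

```python
from typing import List, Optional


def plan_type_actions(text: str, locator: Optional[str] = None, clear: bool = True) -> List[tuple]:
    """Return the UI actions implied by the description of type() (hypothetical helper)."""
    if len(text) < 1:
        raise ValueError("text must be at least 1 character long")
    actions: List[tuple] = []
    if locator is not None:
        # clear=True: a triple click focuses the element and selects its text.
        # clear=False: a single click only gives the element focus.
        actions.append(("click", locator, 3 if clear else 1))
    # Without a locator there is nothing to click, so clear has no effect.
    actions.append(("type", text))
    return actions


print(plan_type_actions("hello", locator="Email field"))
# [('click', 'Email field', 3), ('type', 'hello')]
print(plan_type_actions("hello"))
# [('type', 'hello')]
```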
key_up
Arguments:
- key (PcKey | ModifierKey) - The key to be released.
key_down
Arguments:
- key (PcKey | ModifierKey) - The key to be pressed.
mouse_up
Arguments:
- button ('left' | 'middle' | 'right', optional) - The mouse button to be released. Defaults to 'left'.
mouse_down
Arguments:
- button ('left' | 'middle' | 'right', optional) - The mouse button to be pressed. Defaults to 'left'.
keyboard
Arguments:
- key (PcKey | ModifierKey) - The main key to press. This can be a letter, number, special character, or function key.
- modifier_keys (list[ModifierKey] | None, optional) - List of modifier keys to press along with the main key. Common modifier keys include 'ctrl', 'alt', 'shift'.
- repeat (int, optional) - The number of times to press (and release) the key. Must be greater than 0. Defaults to 1.
cli
Arguments:
- command (str) - The command to execute on the command line.
act
Arguments:
- goal (str | list[MessageParam]) - A description of what the agent should achieve.
- model (str | None, optional) - The composition or name of the model(s) to be used for achieving the goal.
- on_message (OnMessageCb | None, optional) - Callback for new messages. If it returns None, stops and does not add the message.
- tools (list[Tool] | ToolCollection | None, optional) - The tools for the agent. Defaults to default tools depending on the selected model.
- settings (AgentSettings | None, optional) - The settings for the agent. Defaults to default settings depending on the selected model.

Raises:
- MaxTokensExceededError - If the model reaches the maximum token limit defined in the agent settings.
- ModelRefusalError - If the model refuses to process the request.
get
Retrieves information based on the provided query.
If no source is provided, a screenshot of the current screen is taken.
Arguments:
- query (str) - The query describing what information to retrieve.
- source (InputSource | None, optional) - The source to extract information from. Can be a path to an image, PDF, or office document file, a PIL Image object or a data URL. Defaults to a screenshot of the current screen.
- response_schema (Type[ResponseSchema] | None, optional) - A Pydantic model class that defines the response schema. If not provided, returns a string.
- model (str | None, optional) - The composition or name of the model(s) to be used for retrieving information from the screen or image using the query. Note: response_schema is not supported by all models. PDF processing is only supported for Gemini models hosted on AskUI.

Returns:
- The extracted information; a str if no response_schema is provided.
Raises:
- NotImplementedError - If PDF processing is not supported for the selected model.
- ValueError - If the source is not a valid PDF or image.
locate
Arguments:
- locator (str | Locator) - The identifier or description of the element to locate.
- screenshot (InputSource | None, optional) - The screenshot to use for locating the element. Can be a path to an image file, a PIL Image object or a data URL. If None, takes a screenshot of the currently selected display.
- model (ModelComposition | str | None, optional) - The composition or name of the model(s) to be used for locating the element using the locator.

Returns:
- Point - The coordinates of the element as a tuple (x, y).
locate_all
Arguments:
- locator (str | Locator) - The identifier or description of the element to locate.
- screenshot (InputSource | None, optional) - The screenshot to use for locating the element. Can be a path to an image file, a PIL Image object or a data URL. If None, takes a screenshot of the currently selected display.
- model (ModelComposition | str | None, optional) - The composition or name of the model(s) to be used for locating the element using the locator.

Returns:
- PointList - The coordinates of the elements as a list of tuples (x, y).
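Since the result is described as a list of (x, y) tuples, it can be post-processed with ordinary list operations. The sketch below sorts points into reading order (top to bottom, then left to right). The Point and PointList aliases and the sample coordinates are assumptions for this example, not imports from the library or real output.

```python
from typing import List, Tuple

Point = Tuple[int, int]   # (x, y); alias assumed for this sketch
PointList = List[Point]


def reading_order(points: PointList) -> PointList:
    """Sort points top-to-bottom first, then left-to-right."""
    return sorted(points, key=lambda p: (p[1], p[0]))


# Made-up coordinates standing in for a locate_all() result.
found = [(300, 40), (20, 200), (20, 40)]
print(reading_order(found))  # [(20, 40), (300, 40), (20, 200)]
```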
locate_all_elements
Arguments:
- screenshot (InputSource | None, optional) - The screenshot to use for locating the elements. Can be a path to an image file, a PIL Image object or a data URL. If None, takes a screenshot of the currently selected display.
- model (ModelComposition | None, optional) - The model composition to be used for locating the elements.

Returns:
- list[DetectedElement] - A list of detected elements.
annotate
Arguments:
- screenshot (ImageSource | None, optional) - The screenshot to annotate. If None, takes a screenshot of the currently selected display.
- annotation_dir (str) - The directory to save the annotated image. Defaults to "annotations".
- model (ModelComposition | None, optional) - The composition of the model(s) to be used for annotating the image. If None, uses the default model.

Examples: using VisionAgent; using AndroidVisionAgent; using VisionAgent with a custom screenshot and annotation directory.
wait
Arguments:
- until (float | str | Locator) - If a float, pauses execution for the specified number of seconds (must be greater than 0.0). If a string or Locator, waits until the specified UI element appears or disappears on screen.
- retry_count (int | None) - Number of retries when waiting for a UI element. Defaults to 3 if None.
- delay (int | None) - Sleep duration in seconds between retries when waiting for a UI element. Defaults to 1 second if None.
- until_condition (Literal["appear", "disappear"]) - The condition to wait until the element satisfies. Defaults to "appear".
- model (ModelComposition | str | None, optional) - The composition or name of the model(s) to be used for locating the element using the until locator.

Raises:
- WaitUntilError - If the UI element is not found after all retries.
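The element-waiting case can be pictured as a polling loop. The sketch below is written under assumptions: is_visible is a hypothetical callable standing in for an element lookup, WaitUntilError is redefined locally so that the snippet is self-contained, and whether the library sleeps after the final attempt is not stated in the text above (this sketch does not). The float case, a plain pause, is left out.

```python
import time
from typing import Callable, Optional


class WaitUntilError(Exception):
    """Local stand-in for the error of the same name."""


def wait_until(
    is_visible: Callable[[], bool],
    until_condition: str = "appear",
    retry_count: Optional[int] = None,
    delay: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll is_visible() until the condition holds or the retries run out."""
    retries = 3 if retry_count is None else retry_count  # documented default: 3
    pause = 1 if delay is None else delay                # documented default: 1 second
    want_visible = until_condition == "appear"
    for attempt in range(retries):
        if is_visible() == want_visible:
            return
        if attempt < retries - 1:
            sleep(pause)
    raise WaitUntilError(f"element did not {until_condition} after {retries} retries")


# The element shows up on the second check; delay=0 keeps the demo instant.
checks = iter([False, True])
wait_until(lambda: next(checks), delay=0)
print("element appeared")
```

Passing sleep as a parameter is a design choice of this sketch that makes the loop easy to test without real waiting.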