Arguments:
- reporters (list[Reporter] | None, optional) - List of reporter instances for logging and reporting. If None, an empty list is used.
- model (ModelChoice | ModelComposition | str | None, optional) - The default choice or name of the model(s) to be used for vision tasks. Can be overridden by the model parameter of the tap(), get(), act() etc. methods.
- retry (Retry, optional) - The retry instance to use for retrying failed actions. Defaults to ConfigurableRetry with exponential backoff. Currently only supported for the locate() method.
- models (ModelRegistry | None, optional) - A registry of models to make available to the AndroidVisionAgent so that they can be selected using the model parameter of AndroidVisionAgent or the model parameter of its tap(), get(), act() etc. methods. Entries in the registry override entries in the default model registry.
- model_provider (str | None, optional) - The model provider to use for vision tasks.
tap
Arguments:
- target (str | Locator | Point) - The target to tap on. Can be a locator, a point, or a string.
- model (ModelComposition | str | None, optional) - The composition or name of the model(s) to be used for tapping on the target.
type
Arguments:
- text (str) - The text to be typed. Must be at least 1 character long. Only printable ASCII characters are supported; any other character raises an error.
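Because type() rejects anything outside printable ASCII, it can help to check input before sending it to the device. The following is a minimal sketch of such a pre-check. The helper name is_typeable is hypothetical and not part of the library, and it assumes that "printable ASCII" means code points 0x20 through 0x7E; whether the library also accepts tabs or newlines is not stated here.

```python
def is_typeable(text: str) -> bool:
    """Return True if text is non-empty and contains only printable ASCII.

    Assumption: "printable ASCII" is taken to mean code points 0x20 (space)
    through 0x7E (tilde). This helper is illustrative only and is not part
    of the library.
    """
    return len(text) >= 1 and all(0x20 <= ord(ch) <= 0x7E for ch in text)
```

A caller could guard the call with `if is_typeable(text): agent.type(text)`, where agent is an AndroidVisionAgent instance.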
key_tap
Arguments:
- key (ANDROID_KEY) - The key to tap.
key_combination
Arguments:
- keys (list[ANDROID_KEY]) - The keys to tap.
- duration_in_ms (int, optional) - The duration in milliseconds to hold the key combination. Defaults to 100 ms.
shell
Arguments:
- command (str) - The shell command to execute.
drag_and_drop
Arguments:
- x1 (int) - The x-coordinate of the starting point.
- y1 (int) - The y-coordinate of the starting point.
- x2 (int) - The x-coordinate of the ending point.
- y2 (int) - The y-coordinate of the ending point.
- duration_in_ms (int, optional) - The duration in milliseconds of the drag and drop. Defaults to 1000 ms.
swipe
Arguments:
- x1 (int) - The x-coordinate of the starting point.
- y1 (int) - The y-coordinate of the starting point.
- x2 (int) - The x-coordinate of the ending point.
- y2 (int) - The y-coordinate of the ending point.
- duration_in_ms (int, optional) - The duration in milliseconds of the swipe. Defaults to 1000 ms.
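Since swipe() takes absolute pixel coordinates, the start and end points usually have to be derived from the screen size. The sketch below computes coordinates for an upward swipe along the vertical centre line. The helper swipe_up_coordinates and its margin_percent parameter are hypothetical names introduced for illustration, and the screen dimensions are assumed to be known to the caller.

```python
from typing import Tuple


def swipe_up_coordinates(
    width: int, height: int, margin_percent: int = 20
) -> Tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2) for an upward swipe along the screen centre.

    The swipe starts margin_percent above the bottom edge and ends
    margin_percent below the top edge. Integer arithmetic is used so the
    result can be passed directly as int arguments.
    """
    if not 0 <= margin_percent < 50:
        raise ValueError("margin_percent must be in the range [0, 50)")
    x = width // 2
    y_start = height * (100 - margin_percent) // 100
    y_end = height * margin_percent // 100
    return x, y_start, x, y_end
```

The result can be unpacked into the documented parameters, for example `agent.swipe(*swipe_up_coordinates(1080, 1920), duration_in_ms=500)`.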
set_device_by_serial_number
Arguments:
- device_sn (str) - The serial number of the device to set as active.
act
Arguments:
- goal (str | list[MessageParam]) - A description of what the agent should achieve.
- model (str | None, optional) - The composition or name of the model(s) to be used for achieving the goal.
- on_message (OnMessageCb | None, optional) - Callback for new messages. If it returns None, execution stops and the message is not added.
- tools (list[Tool] | ToolCollection | None, optional) - The tools for the agent. Defaults to default tools depending on the selected model.
- settings (AgentSettings | None, optional) - The settings for the agent. Defaults to default settings depending on the selected model.
Raises:
- MaxTokensExceededError - If the model reaches the maximum token limit defined in the agent settings.
- ModelRefusalError - If the model refuses to process the request.
get
query.
If no source is provided, a screenshot of the current screen is taken.
Arguments:
- query (str) - The query describing what information to retrieve.
- source (InputSource | None, optional) - The source to extract information from. Can be a path to an image, PDF, or office document file, a PIL Image object or a data URL. Defaults to a screenshot of the current screen.
- response_schema (Type[ResponseSchema] | None, optional) - A Pydantic model class that defines the response schema. If not provided, returns a string.
- model (str | None, optional) - The composition or name of the model(s) to be used for retrieving information from the screen or image using the query.
Note: response_schema is not supported by all models. PDF processing is only supported for Gemini models hosted on AskUI.
Returns:
- str if no response_schema is provided.
Raises:
- NotImplementedError - If PDF processing is not supported for the selected model.
- ValueError - If the source is not a valid PDF or image.
locate
Arguments:
- locator (str | Locator) - The identifier or description of the element to locate.
- screenshot (InputSource | None, optional) - The screenshot to use for locating the element. Can be a path to an image file, a PIL Image object or a data URL. If None, takes a screenshot of the currently selected display.
- model (ModelComposition | str | None, optional) - The composition or name of the model(s) to be used for locating the element using the locator.
Returns:
- Point - The coordinates of the element as a tuple (x, y).
locate_all
Arguments:
- locator (str | Locator) - The identifier or description of the element to locate.
- screenshot (InputSource | None, optional) - The screenshot to use for locating the element. Can be a path to an image file, a PIL Image object or a data URL. If None, takes a screenshot of the currently selected display.
- model (ModelComposition | str | None, optional) - The composition or name of the model(s) to be used for locating the element using the locator.
Returns:
- PointList - The coordinates of the elements as a list of tuples (x, y).
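Because locate_all() returns plain (x, y) tuples and tap() accepts a point as its target, the two can be combined to tap every match. The sketch below shows one way to do that. The helper tap_all is a hypothetical name, and agent is assumed to be any object exposing locate_all() and tap() as described in this reference, such as an AndroidVisionAgent instance.

```python
from typing import Any


def tap_all(agent: Any, locator: str) -> int:
    """Tap every element matching locator and return how many were tapped.

    Assumption: agent exposes locate_all(locator) returning a list of
    (x, y) tuples and tap(target) accepting such a tuple, as documented.
    """
    points = agent.locate_all(locator)
    for point in points:
        # All points come from one screenshot; they are not re-located
        # between taps.
        agent.tap(point)
    return len(points)
```

Note that the points are located once up front, so if a tap changes the screen layout the remaining coordinates may no longer match; in that case calling locate() again before each tap is the safer choice.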
locate_all_elements
Arguments:
- screenshot (InputSource | None, optional) - The screenshot to use for locating the elements. Can be a path to an image file, a PIL Image object or a data URL. If None, takes a screenshot of the currently selected display.
- model (ModelComposition | None, optional) - The model composition to be used for locating the elements.
Returns:
- list[DetectedElement] - A list of detected elements.
annotate
Arguments:
- screenshot (ImageSource | None, optional) - The screenshot to annotate. If None, takes a screenshot of the currently selected display.
- annotation_dir (str) - The directory to save the annotated image. Defaults to "annotations".
- model (ModelComposition | None, optional) - The composition of the model(s) to be used for annotating the image. If None, uses the default model.
wait
Arguments:
- until (float | str | Locator) - If a float, pauses execution for the specified number of seconds (must be greater than 0.0). If a string or Locator, waits until the specified UI element appears or disappears on screen.
- retry_count (int | None) - Number of retries when waiting for a UI element. Defaults to 3 if None.
- delay (int | None) - Sleep duration in seconds between retries when waiting for a UI element. Defaults to 1 second if None.
- until_condition (Literal["appear", "disappear"]) - The condition to wait until the element satisfies. Defaults to "appear".
- model (ModelComposition | str | None, optional) - The composition or name of the model(s) to be used for locating the element using the until locator.
Raises:
- WaitUntilError - If the UI element is not found after all retries.
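The interaction between retry_count and delay can be illustrated with a small stand-alone polling loop. This is a sketch of the described behaviour, not the library's implementation: the function wait_until and its check callback are hypothetical, the built-in TimeoutError stands in for WaitUntilError, and retry_count is assumed to count total attempts.

```python
import time
from typing import Callable, Optional


def wait_until(
    check: Callable[[], bool],
    retry_count: Optional[int] = None,
    delay: Optional[float] = None,
) -> None:
    """Poll check() until it returns True or the attempts are used up.

    Mirrors the documented defaults: 3 attempts and a 1 second pause when
    the respective argument is None. Raises TimeoutError (a stand-in for
    WaitUntilError) if check() never succeeds.
    """
    attempts = 3 if retry_count is None else retry_count
    pause = 1 if delay is None else delay
    for attempt in range(attempts):
        if check():
            return
        # Sleep only between attempts, not after the final one.
        if attempt < attempts - 1:
            time.sleep(pause)
    raise TimeoutError(f"condition not met after {attempts} attempts")
```

With the defaults, a condition that never holds fails after roughly two seconds of sleeping plus the time spent in the three checks.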