askui.VisionAgent
- log_level: int | str, optional - The logging level to use. Defaults to logging.INFO.
- display: int, optional - The display number to use for screen interactions. Defaults to 1.
- reporters: list[Reporter] | None, optional - List of reporter instances for logging and reporting. If None, an empty list is used.
- tools: AgentToolbox | None, optional - Custom toolbox instance. If None, a default one will be created with AskUiControllerClient.
- model: ModelChoice | ModelComposition | str | None, optional - The default choice or name of the model(s) to be used for vision tasks. Can be overridden by the model parameter in the click(), get(), act() etc. methods.
- retry: Retry, optional - The retry instance to use for retrying failed actions. Defaults to ConfigurableRetry with exponential backoff. Currently only supported for the locate() method.
- models: ModelRegistry | None, optional - A registry of models to make available to the VisionAgent so that they can be selected using the model parameter of VisionAgent or the model parameter of its click(), get(), act() etc. methods. Entries in the registry override entries in the default model registry.
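A minimal sketch of how the constructor parameters above might be assembled. The helper names agent_options and run_example are hypothetical, and using VisionAgent as a context manager is an assumption not stated in this reference. The askui import is deferred so the helper can be exercised without the package installed.

```python
import logging


def agent_options(debug: bool = False, display: int = 1) -> dict:
    """Build keyword arguments using only parameter names documented above."""
    return {
        "log_level": logging.DEBUG if debug else logging.INFO,
        "display": display,
    }


def run_example(goal: str) -> None:
    # Deferred import: requires the askui package and a configured environment.
    from askui import VisionAgent

    # Assumption: VisionAgent can be used as a context manager.
    with VisionAgent(**agent_options(debug=True)) as agent:
        agent.act(goal)
```

Keeping the options in a plain dict makes it easy to switch log level or display per environment without touching the automation code itself.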
act
- goal: str - A description of what the agent should achieve.
- model: str | None, optional - The composition or name of the model(s) to be used for achieving the goal.
- on_message: OnMessageCb | None, optional - Callback for new messages. If it returns None, stops and does not add the message.
cli
- command: str - The command to execute on the command line.
click
- locator: str | Locator | None, optional - The identifier or description of the element to click. If None, clicks at the current position.
- button: 'left' | 'middle' | 'right', optional - Specifies which mouse button to click. Defaults to 'left'.
- repeat: int, optional - The number of times to click. Must be greater than 0. Defaults to 1.
- model: ModelComposition | str | None, optional - The composition or name of the model(s) to be used for locating the element to click on using the locator.
get
- query: str - The query describing what information to retrieve.
- image: Img | None, optional - The image to extract information from. Defaults to a screenshot of the current screen. Can be a path to an image file, a PIL Image object or a data URL.
- response_schema: Type[ResponseSchema] | None, optional - A Pydantic model class that defines the response schema. If not provided, returns a string.
- model: str | None, optional - The composition or name of the model(s) to be used for retrieving information from the screen or image using the query. Note: response_schema is not supported by all models.
Returns a str if no response_schema is provided.
Limitations:
- Nested Pydantic schemas are not currently supported
- Schema support is currently only available with the "askui" model (the default model if ASKUI_WORKSPACE_ID and ASKUI_TOKEN are set)
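A minimal sketch of calling get() without a response_schema, in which case the documented return type is str. FakeAgent is a hypothetical stand-in so the snippet runs without askui installed; with the real library, a VisionAgent instance would presumably be passed instead. The query text and the canned answer are illustrative only.

```python
class FakeAgent:
    """Hypothetical stand-in so the sketch runs without askui installed."""

    def get(self, query, image=None, response_schema=None, model=None):
        # A real agent would query a model; this stub returns a canned string.
        return " https://example.com "


def current_url(agent) -> str:
    # No response_schema is passed, so the result is treated as a plain string.
    answer = agent.get("What is the URL shown in the browser's address bar?")
    return answer.strip()


url = current_url(FakeAgent())
```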
key_down
- key: PcKey | ModifierKey - The key to be pressed.
key_up
- key: PcKey | ModifierKey - The key to be released.
keyboard
- key: PcKey | ModifierKey - The main key to press. This can be a letter, number, special character, or function key.
- modifier_keys: list[ModifierKey] | None, optional - List of modifier keys to press along with the main key. Common modifier keys include 'ctrl', 'alt', 'shift'.
- repeat: int, optional - The number of times to press (and release) the key. Must be greater than 0. Defaults to 1.
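A minimal sketch of pairing key_down() with key_up() so that a held key is always released, followed by a keyboard() call with a modifier. RecordingAgent and held_key are hypothetical names introduced for illustration. The stub only records calls, and the locator strings are placeholders.

```python
from contextlib import contextmanager


class RecordingAgent:
    """Hypothetical stand-in that records calls instead of sending input."""

    def __init__(self):
        self.calls = []

    def key_down(self, key):
        self.calls.append(("key_down", key))

    def key_up(self, key):
        self.calls.append(("key_up", key))

    def click(self, locator=None, button="left", repeat=1, model=None):
        self.calls.append(("click", locator))

    def keyboard(self, key, modifier_keys=None, repeat=1):
        # Mirrors the documented constraint that repeat must be greater than 0.
        if repeat < 1:
            raise ValueError("repeat must be greater than 0")
        self.calls.append(("keyboard", key, tuple(modifier_keys or ()), repeat))


@contextmanager
def held_key(agent, key):
    """Hold `key` for the duration of the block and always release it."""
    agent.key_down(key)
    try:
        yield
    finally:
        agent.key_up(key)


agent = RecordingAgent()
with held_key(agent, "shift"):
    agent.click("first item")
    agent.click("last item")
agent.keyboard("a", modifier_keys=["shift"], repeat=2)
```

The try/finally ensures key_up() runs even if a click inside the block raises, which avoids leaving a modifier key stuck down.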
locate
- locator: str | Locator - The identifier or description of the element to locate.
- screenshot: Img | None, optional - The screenshot to use for locating the element. Can be a path to an image file, a PIL Image object or a data URL. If None, takes a screenshot of the currently selected display.
- model: ModelComposition | str | None, optional - The composition or name of the model(s) to be used for locating the element using the locator.
Returns: Point - The coordinates of the element as a tuple (x, y).
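A minimal sketch of consuming the (x, y) tuple that locate() is documented to return. FakeAgent, is_in_left_half, the coordinates, and the screen width are all hypothetical. The stub stands in for a real agent so the snippet runs on its own.

```python
class FakeAgent:
    """Hypothetical stand-in returning a fixed point instead of querying a model."""

    def locate(self, locator, screenshot=None, model=None):
        return (120, 340)


def is_in_left_half(agent, locator, screen_width):
    # Unpack the documented (x, y) tuple; only x matters for this check.
    x, _y = agent.locate(locator)
    return x < screen_width / 2


result = is_in_left_half(FakeAgent(), "Submit button", screen_width=1920)
```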
mouse_down
- button: 'left' | 'middle' | 'right', optional - The mouse button to be pressed. Defaults to 'left'.
mouse_move
- locator: str | Locator - The identifier or description of the element to move to.
- model: ModelComposition | str | None, optional - The composition or name of the model(s) to be used for locating the element to move the mouse to using the locator.
mouse_scroll
- x: int - The horizontal scroll amount. Positive values typically scroll right, negative values scroll left.
- y: int - The vertical scroll amount. Positive values typically scroll down, negative values scroll up.
A scroll amount of 10 might result in different distances depending on the application and system settings.
Example:
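A minimal sketch of scrolling in small increments with a pause between them, since the distance covered by a given scroll amount can vary. RecordingAgent and scroll_down are hypothetical names, and the step and pause defaults are illustrative, not recommended values.

```python
class RecordingAgent:
    """Hypothetical stand-in that records calls instead of moving the mouse."""

    def __init__(self):
        self.calls = []

    def mouse_scroll(self, x, y):
        self.calls.append(("mouse_scroll", x, y))

    def wait(self, sec):
        # Mirrors the documented constraint that sec must be greater than 0.0.
        if sec <= 0.0:
            raise ValueError("sec must be greater than 0.0")
        self.calls.append(("wait", sec))


def scroll_down(agent, total, step=10, pause=0.2):
    """Scroll down by `total` in increments of at most `step`, pausing after each."""
    if total <= 0 or step <= 0:
        raise ValueError("total and step must be positive")
    remaining = total
    while remaining > 0:
        amount = min(step, remaining)
        # Positive y typically scrolls down, per the parameter description.
        agent.mouse_scroll(x=0, y=amount)
        agent.wait(pause)
        remaining -= amount


agent = RecordingAgent()
scroll_down(agent, total=25)
```

Small steps make it easier to check the screen between scrolls, for example by calling locate() after each increment.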
mouse_up
- button: 'left' | 'middle' | 'right', optional - The mouse button to be released. Defaults to 'left'.
type
- text: str - The text to be typed. Must be at least 1 character long.
wait
- sec: float - The number of seconds to wait. Must be greater than 0.0.