Agent
askui.VisionAgent
A vision-based agent that can interact with user interfaces through computer vision and AI.
This agent can perform various UI interactions like clicking, typing, scrolling, and more. It uses computer vision models to locate UI elements and execute actions on them.
Arguments:
log_level
int | str, optional - The logging level to use. Defaults to logging.INFO.
display
int, optional - The display number to use for screen interactions. Defaults to 1.
model_router
ModelRouter | None, optional - Custom model router instance. If None, a default one will be created.
reporters
list[Reporter] | None, optional - List of reporter instances for logging and reporting. If None, an empty list is used.
tools
AgentToolbox | None, optional - Custom toolbox instance. If None, a default one will be created with AskUiControllerClient.
model
ModelComposition | str | None, optional - The default composition or name of the model(s) to be used for vision tasks. Can be overridden by the model parameter in the click(), get(), act() etc. methods.
Example:
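A minimal sketch of collecting the constructor arguments listed above. The helper name agent_options is hypothetical, and the commented usage assumes that the askui package is installed and that VisionAgent can be used as a context manager, which this page does not state.

```python
import logging


def agent_options(debug: bool = False, display: int = 1) -> dict:
    """Build keyword arguments for the documented log_level and display parameters."""
    return {
        # logging.INFO is the documented default; DEBUG is an illustrative alternative.
        "log_level": logging.DEBUG if debug else logging.INFO,
        "display": display,
    }


# Hypothetical usage (assumes askui is installed and a controller is available):
# from askui import VisionAgent
# with VisionAgent(**agent_options(debug=True)) as agent:
#     agent.click("Submit button")
```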
act
Instructs the agent to achieve a specified goal through autonomous actions.
The agent will analyze the screen, determine necessary steps, and perform actions to accomplish the goal. This may include clicking, typing, scrolling, and other interface interactions.
Arguments:
goal
str - A description of what the agent should achieve.
model
str | None, optional - The composition or name of the model(s) to be used for achieving the goal.
Example:
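A short sketch of calling act with a plain-language goal. The agent parameter is assumed to be a VisionAgent created elsewhere, and the goal text is only illustrative.

```python
def open_display_settings(agent) -> None:
    """Ask the agent to reach a goal; it decides on the individual steps itself."""
    # `agent` is assumed to be an askui.VisionAgent instance.
    agent.act("Open the settings menu and switch to the Display tab")
```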
cli
Executes a command on the command line interface.
This method allows running shell commands directly from the agent. The command is split on spaces and executed as a subprocess.
Arguments:
command
str - The command to execute on the command line.
Example:
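The description above says the command is split on spaces. The sketch below uses only the standard library to show what that implies for quoted arguments, assuming the split is a plain str.split(" "); the actual implementation is not shown on this page and may differ.

```python
import shlex

command = "echo 'hello world'"

# Plain splitting on spaces breaks the quoted argument into two pieces.
naive_parts = command.split(" ")

# Shell-style splitting keeps the quoted argument together, for comparison.
shell_parts = shlex.split(command)

print(naive_parts)   # ['echo', "'hello", "world'"]
print(shell_parts)   # ['echo', 'hello world']
```

If the split really is this simple, commands without quoted or space-containing arguments, such as agent.cli("echo hello"), are unaffected.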
click
Simulates a mouse click on the user interface element identified by the provided locator.
Arguments:
locator
str | Locator | None, optional - The identifier or description of the element to click. If None, clicks at the current position.
button
'left' | 'middle' | 'right', optional - Specifies which mouse button to click. Defaults to 'left'.
repeat
int, optional - The number of times to click. Must be greater than 0. Defaults to 1.
model
ModelComposition | str | None, optional - The composition or name of the model(s) to be used for locating the element to click on using the locator.
Example:
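A sketch of the documented click variants. The agent parameter is assumed to be a VisionAgent instance, and the element descriptions are placeholders.

```python
def click_variants(agent) -> None:
    """Exercise the locator, button and repeat arguments of click()."""
    agent.click("Submit button")                        # left click (default)
    agent.click("Desktop background", button="right")   # right click
    agent.click("Document icon", repeat=2)              # double click
    agent.click()                                       # no locator: click at the current position
```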
get
Retrieves information from an image (defaults to a screenshot of the current screen) based on the provided query.
Arguments:
query
str - The query describing what information to retrieve.
image
Img | None, optional - The image to extract information from. Defaults to a screenshot of the current screen. Can be a path to an image file, a PIL Image object or a data URL.
response_schema
Type[ResponseSchema] | None, optional - A Pydantic model class that defines the response schema. If not provided, returns a string.
model
str | None, optional - The composition or name of the model(s) to be used for retrieving information from the screen or image using the query. Note: response_schema is not supported by all models.
Returns:
ResponseSchema | str: The extracted information, str if no response_schema is provided.
Limitations:
- Nested Pydantic schemas are not currently supported
- Schema support is only available with “askui” model (default model if ASKUI_WORKSPACE_ID and ASKUI_TOKEN are set) at the moment
Example:
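A sketch of querying either the current screen or a supplied image without a response_schema, so the result is a string. The agent parameter is assumed to be a VisionAgent instance; the query text and the helper name are illustrative.

```python
def read_heading(agent, image=None) -> str:
    """Return the main heading text from the screen, or from `image` if given."""
    query = "What is the text of the main heading?"
    if image is None:
        # No image passed: get() is documented to use a screenshot of the current screen.
        return agent.get(query)
    # `image` may be a file path, a PIL Image object or a data URL.
    return agent.get(query, image=image)
```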
locate
Locates the UI element identified by the provided locator.
Arguments:
locator
str | Locator - The identifier or description of the element to locate.
screenshot
Img | None, optional - The screenshot to use for locating the element. Can be a path to an image file, a PIL Image object or a data URL. If None, takes a screenshot of the currently selected display.
model
ModelComposition | str | None, optional - The composition or name of the model(s) to be used for locating the element using the locator.
Returns:
Point - The coordinates of the element as a tuple (x, y).
Example:
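A sketch of unpacking the returned Point, which is described above as an (x, y) tuple. The agent parameter is assumed to be a VisionAgent instance and the helper name is hypothetical.

```python
def describe_position(agent, description: str) -> str:
    """Locate an element and format its coordinates for logging."""
    x, y = agent.locate(description)  # Point is documented as a tuple (x, y)
    return f"{description} found at x={x}, y={y}"
```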
key_down
Simulates the pressing of a key.
Arguments:
key
PcKey | ModifierKey - The key to be pressed.
Example:
key_up
Simulates the release of a key.
Arguments:
key
PcKey | ModifierKey - The key to be released.
Example:
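key_down and key_up are typically used as a pair. The sketch below holds a modifier while clicking and releases it in a finally block so the key is not left pressed if the click fails. The agent parameter is assumed to be a VisionAgent instance, and 'shift' is taken from the modifier keys mentioned on this page.

```python
def shift_click_range(agent, first: str, last: str) -> None:
    """Select a range by clicking `first`, then shift-clicking `last`."""
    agent.click(first)
    agent.key_down("shift")
    try:
        agent.click(last)
    finally:
        # Always release the key, even if locating `last` raises.
        agent.key_up("shift")
```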
keyboard
Simulates pressing (and releasing) a key or key combination on the keyboard.
Arguments:
key
PcKey | ModifierKey - The main key to press. This can be a letter, number, special character, or function key.
modifier_keys
list[ModifierKey] | None, optional - List of modifier keys to press along with the main key. Common modifier keys include 'ctrl', 'alt', 'shift'.
repeat
int, optional - The number of times to press (and release) the key. Must be greater than 0. Defaults to 1.
Example:
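A sketch of single keys, key combinations and repeats. The agent parameter is assumed to be a VisionAgent instance; the specific shortcuts are illustrative and depend on the application under test.

```python
def copy_paste_and_undo(agent) -> None:
    """Press a few common shortcuts using the documented arguments."""
    agent.keyboard("c", modifier_keys=["ctrl"])             # copy
    agent.keyboard("v", modifier_keys=["ctrl"])             # paste
    agent.keyboard("z", modifier_keys=["ctrl"], repeat=2)   # undo twice
```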
mouse_move
Moves the mouse cursor to the UI element identified by the provided locator.
Arguments:
locator
str | Locator - The identifier or description of the element to move to.
model
ModelComposition | str | None, optional - The composition or name of the model(s) to be used for locating the element to move the mouse to using the locator.
Example:
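A sketch that combines mouse_move with a locator-free click, relying on the documented behaviour that click() without a locator clicks at the current position. The agent parameter is assumed to be a VisionAgent instance.

```python
def hover_then_click(agent, description: str) -> None:
    """Move the cursor onto an element, then click where the cursor now is."""
    agent.mouse_move(description)
    agent.click()  # no locator: clicks at the current cursor position
```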
mouse_scroll
Simulates scrolling the mouse wheel by the specified horizontal and vertical amounts.
Arguments:
x
int - The horizontal scroll amount. Positive values typically scroll right, negative values scroll left.
y
int - The vertical scroll amount. Positive values typically scroll down, negative values scroll up.
Notes:
The actual scroll direction depends on the operating system’s configuration. Some systems may have “natural scrolling” enabled, which reverses the traditional direction.
The meaning of scroll units varies across operating systems and applications.
A scroll value of 10 might result in different distances depending on the application and system settings.
Example:
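A sketch of vertical scrolling with the x and y arguments. The agent parameter is assumed to be a VisionAgent instance, and, as the notes above point out, the direction and distance actually observed depend on the system.

```python
def scroll_down_then_back(agent, amount: int = 10) -> None:
    """Scroll vertically by `amount` units, then scroll back by the same amount."""
    agent.mouse_scroll(0, amount)    # positive y: typically scrolls down
    agent.mouse_scroll(0, -amount)   # negative y: typically scrolls up
```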
type
Types the specified text as if it were entered on a keyboard.
Arguments:
text
str - The text to be typed. Must be at least 1 character long.
Example:
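A sketch of focusing a field before typing into it. The agent parameter is assumed to be a VisionAgent instance; the early check mirrors the documented requirement that the text be at least one character long and is an addition of this sketch, not library behaviour.

```python
def fill_field(agent, field: str, text: str) -> None:
    """Click a text field and type into it."""
    if not text:
        # type() is documented to require at least one character.
        raise ValueError("text must contain at least one character")
    agent.click(field)
    agent.type(text)
```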
wait
Pauses the execution of the program for the specified number of seconds.
Arguments:
sec
float - The number of seconds to wait. Must be greater than 0.0.
Example:
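A sketch of pausing after an action so the interface can settle. The agent parameter is assumed to be a VisionAgent instance, and the half-second default is an arbitrary illustrative value.

```python
def click_and_settle(agent, locator: str, seconds: float = 0.5) -> None:
    """Click an element, then wait before the next step."""
    if seconds <= 0.0:
        # wait() is documented to require a value greater than 0.0.
        raise ValueError("seconds must be greater than 0.0")
    agent.click(locator)
    agent.wait(seconds)
```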