AskUI is a powerful UI automation framework that combines computer vision, natural language processing, and AI to interact with user interfaces. Here’s how it works:
1. Vision Agent
The Vision Agent is the main interface for interacting with UI elements. It combines several technologies:
- Computer Vision: Analyzes screen content to identify UI elements
- Natural Language Processing: Understands human-like commands
- AI Models: Makes intelligent decisions about UI interactions
from askui import VisionAgent
# Initialize the agent with optional configuration
with VisionAgent(
log_level="INFO", # Set logging level
display=1, # Use first display
model="askui" # Use default model
) as agent:
# Your automation code here
agent.click("login button")
2. Element Selection
AskUI uses multiple approaches to find and interact with UI elements:
-
Natural Language: Describe elements in plain English
agent.click("submit button")
agent.type("john.doe")
-
Locators: Build precise element selectors
from askui import locators as loc
# Find by text
agent.click(loc.Text("Submit"))
# Find by element type
agent.click(loc.Element("textfield")) # Find a text field
agent.click(loc.Element("text")) # Find a text element
# Find by image
agent.click(loc.Image("logo.png"))
# Use relative locators
password_label = loc.Text("Password")
password_field = loc.Element().right_of(password_label)
# Find text below a heading
submit_text = loc.Element().below_of(loc.Text("Complete Registration"))
# Find an element near another element
menu_item = loc.Text("Settings").nearest_to(loc.Image("user-icon.png"))
-
AI Elements: Capture and reuse specific visual elements
# Optional, if you are not in the askui-shell (ADE)
askui-shell
# Capture elements from your screen
AskUI-NewAIElement
from askui import locators as loc
with VisionAgent() as agent:
agent.click(loc.AiElement("my-element-name"))
3. Interaction Methods
AskUI provides various ways to interact with UI elements:
-
Basic Actions
# Click actions
agent.click("button") # Left click
agent.click("button", button="right") # Right click
agent.click("button", repeat=2) # Double click
# Keyboard actions
agent.type("input field", "text") # Type text
agent.keyboard('enter') # Press Enter
agent.key_down('shift') # Hold Shift
agent.key_up('shift') # Release Shift
# Mouse actions
agent.mouse_move(100, 200) # Move to coordinates
agent.mouse_scroll(0, 10) # Scroll down
-
Complex Actions
agent.act("""
You are testing a login form.
Enter the username "john.doe" and password "secret123"
Then click the login button
""")
-
Information Extraction
# Get text content
text = agent.get("What is the error message?")
# Get element location
point = agent.locate("submit button")
print(f"Element found at: {point}")
AskUI includes built-in tools for common tasks:
-
Web Browser Control
agent.tools.webbrowser.open_new("https://example.com")
-
OS Operations
agent.tools.os.copy_to_clipboard("text")
agent.tools.os.paste_from_clipboard()
-
Multi-Monitor Support
# Set which display to use
agent.tools.os.set_display(2) # Use second display
Next Steps