AskUI is a UI automation framework that combines computer vision, natural language processing, and AI models to interact with user interfaces. Here’s how it works:

1. Vision Agent

The Vision Agent is the main interface for interacting with UI elements. It combines several technologies:

Computer Vision: Analyzes screen content to identify UI elements
Natural Language Processing: Understands human-like commands
AI Models: Makes intelligent decisions about UI interactions

```
from askui import VisionAgent

# Initialize the agent with optional configuration
with VisionAgent(
    log_level="INFO",  # Set logging level
    display=1,         # Use first display
    model="askui"      # Use default model
) as agent:
    # Your automation code here
    agent.click("login button")
```

2. Element Selection

AskUI uses multiple approaches to find and interact with UI elements:

Natural Language: Describe elements in plain English
```
agent.click("submit button")
agent.type("john.doe")
```

Locators: Build precise element selectors

```
from askui import locators as loc

# Find by text
agent.click(loc.Text("Submit"))

# Find by element type
agent.click(loc.Element("textfield"))  # Find a text field
agent.click(loc.Element("text"))       # Find a text element

# Find by image
agent.click(loc.Image("logo.png"))

# Use relative locators
password_label = loc.Text("Password")
password_field = loc.Element().right_of(password_label)

# Find an element below a heading
submit_text = loc.Element().below_of(loc.Text("Complete Registration"))

# Find an element near another element
menu_item = loc.Text("Settings").nearest_to(loc.Image("user-icon.png"))
```
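To make the relative locators above more concrete, here is a minimal sketch of how a `right_of` relation could be resolved once elements have bounding boxes. It is not AskUI's implementation: the `Box` type and the standalone `right_of` function are hypothetical names used only for illustration, and the sketch assumes detection yields axis-aligned boxes in screen coordinates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box of a detected UI element."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    @property
    def center_y(self) -> float:
        return (self.top + self.bottom) / 2


def right_of(anchor: Box, candidates: list[Box]) -> Optional[Box]:
    """Return the closest candidate that starts to the right of the anchor
    and covers the anchor's vertical center, or None if there is none."""
    in_row = [
        c for c in candidates
        if c.left >= anchor.right and c.top <= anchor.center_y <= c.bottom
    ]
    return min(in_row, key=lambda c: c.left - anchor.right, default=None)


# Illustrative layout (made-up coordinates): a label with fields around it.
label = Box("Password", left=10, top=100, right=90, bottom=120)
elements = [
    Box("username field", left=100, top=60, right=300, bottom=80),
    Box("password field", left=100, top=98, right=300, bottom=122),
    Box("show password icon", left=310, top=100, right=330, bottom=120),
]
match = right_of(label, elements)
print(match.name if match else "no match")
```

The sketch keeps only candidates on the anchor's row and then picks the one with the smallest horizontal gap, which is why the field is chosen over the icon further to the right.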

AI Elements: Capture and reuse specific visual elements

```
# Optional, if you are not in the askui-shell (ADE)
askui-shell

# Capture elements from your screen
AskUI-NewAIElement
```

```
from askui import VisionAgent
from askui import locators as loc

with VisionAgent() as agent:
    agent.click(loc.AiElement("my-element-name"))
```

3. Interaction Methods

AskUI provides various ways to interact with UI elements:

Basic Actions

```
# Click actions
agent.click("button")                    # Left click
agent.click("button", button="right")    # Right click
agent.click("button", repeat=2)          # Double click

# Keyboard actions
agent.click("input field")               # Focus the field first
agent.type("text")                       # Type text into the focused field
agent.keyboard('enter')                  # Press Enter
agent.key_down('shift')                  # Hold Shift
agent.key_up('shift')                    # Release Shift

# Mouse actions
agent.mouse_move(100, 200)              # Move to coordinates
agent.mouse_scroll(0, 10)               # Scroll down
```
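Because `key_down` and `key_up` come in pairs, a forgotten release can leave a modifier key stuck for the rest of a run. One way to guard against that is a small context manager, sketched below. `held_key` and `RecordingAgent` are hypothetical helpers, not part of AskUI; the stand-in agent only records calls so the example runs without a screen.

```python
from contextlib import contextmanager


@contextmanager
def held_key(agent, key: str):
    """Hold `key` for the duration of the block and always release it."""
    agent.key_down(key)
    try:
        yield
    finally:
        # Runs even if the block raises, so the key never stays pressed.
        agent.key_up(key)


class RecordingAgent:
    """Stand-in for a VisionAgent that records calls instead of driving the UI."""

    def __init__(self):
        self.calls = []

    def key_down(self, key):
        self.calls.append(("key_down", key))

    def key_up(self, key):
        self.calls.append(("key_up", key))

    def click(self, target):
        self.calls.append(("click", target))


agent = RecordingAgent()
with held_key(agent, "shift"):
    agent.click("first item")
    agent.click("last item")
print(agent.calls)
```

With a real agent, the same `with held_key(agent, "shift"):` block would wrap the clicks that need the modifier held.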

Complex Actions

```
agent.act("""
You are testing a login form.
Enter the username "john.doe" and password "secret123"
Then click the login button
""")
```
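When the same kind of instruction is reused across tests, it can help to assemble the text passed to `act` from structured pieces rather than editing a long string by hand. The following is only a sketch: `build_goal` is a hypothetical helper, and numbering the steps is an assumption about what reads clearly, not a format AskUI requires.

```python
def build_goal(context: str, steps: list[str]) -> str:
    """Join a context sentence and ordered steps into one instruction string."""
    lines = [context.strip()]
    lines += [f"{i}. {step.strip()}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


goal = build_goal(
    "You are testing a login form.",
    [
        'Enter the username "john.doe" and password "secret123"',
        "Click the login button",
    ],
)
print(goal)
# With a real agent, the string would then be passed as: agent.act(goal)
```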

Information Extraction

```
# Get text content
text = agent.get("What is the error message?")

# Get element location
point = agent.locate("submit button")
print(f"Element found at: {point}")
```
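UI elements often appear with a delay, so a single `locate` call may run before the element is on screen. A common pattern is to poll until the element is found or a timeout expires. The sketch below is a generic helper, not an AskUI feature: `wait_for` is a hypothetical name, and it assumes the lookup raises an exception when nothing matches. A fake lookup is used so the example runs without a display.

```python
import time
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def wait_for(
    locate: Callable[[], Point],
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Point]:
    """Call `locate` until it returns a point or `timeout` seconds pass.

    Any exception raised by `locate` is treated as "not found yet".
    Returns None if the deadline passes without a successful lookup.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return locate()
        except Exception:
            if time.monotonic() >= deadline:
                return None
            sleep(interval)


# Fake lookup that fails twice before "finding" the element.
attempts = {"count": 0}


def fake_locate() -> Point:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("submit button not visible yet")
    return (640, 360)


point = wait_for(fake_locate, timeout=5.0, interval=0.0, sleep=lambda _: None)
print(f"Element found at: {point} after {attempts['count']} attempts")
```

With a real agent, the lookup could be passed as `wait_for(lambda: agent.locate("submit button"))`.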

4. Tools and Utilities

AskUI includes built-in tools for common tasks:

Web Browser Control

```
agent.tools.webbrowser.open_new("https://example.com")
```

OS Operations

```
agent.tools.os.copy_to_clipboard("text")
agent.tools.os.paste_from_clipboard()
```

Multi-Monitor Support

```
# Set which display to use
agent.tools.os.set_display(2)  # Use second display
```

Next Steps

Learn about the Workflow
Explore Best Practices
Read about AI Models
