Interacting with UI Elements

Once you’ve selected an element, you can interact with it using AskUI’s comprehensive set of actions and tools. This guide covers the different ways to interact with elements, from basic clicks to complex keyboard operations.

Basic Actions

The most common actions for interacting with elements:

from askui import VisionAgent

with VisionAgent() as agent:
    # Click actions
    agent.click("Login button")  # Single click
    agent.click("Submit", button="right")  # Right click
    agent.click("Open", button="middle")  # Middle click
    
    # Text input
    agent.type("myuser@example.com")  # Type text
    
    # Mouse movement
    agent.mouse_move("Menu item")  # Move to element
    agent.mouse_move(100, 200)  # Move to coordinates

Keyboard Operations

Control keyboard input for complex interactions:

with VisionAgent() as agent:
    # Basic keyboard input
    agent.keyboard("enter")  # Press Enter
    agent.keyboard("tab")  # Press Tab
    
    # Keyboard shortcuts
    agent.keyboard("a", modifier_keys=["control"])  # Ctrl+A
    agent.keyboard("c", modifier_keys=["control"])  # Ctrl+C
    agent.keyboard("v", modifier_keys=["control"])  # Ctrl+V

Supported Keyboard Keys

Modifier Keys

  • alt, control, shift, command
  • left_control, right_control
  • left_alt, right_alt
  • left_shift, right_shift
  • Arrow keys: up, down, left, right
  • home, end, pageup, pagedown
  • tab, enter, escape

Function Keys

  • f1 through f24 (except f10)

Media Keys

  • audio_mute, audio_vol_up, audio_vol_down
  • audio_play, audio_stop, audio_next, audio_prev

Built-in Tools

AskUI provides several built-in tools for different types of interactions:

OS Tools

Control operating system-level operations:

with VisionAgent() as agent:
    # Keyboard operations
    agent.tools.os.keyboard_tap("a", modifier_keys=["control"])
    agent.tools.os.keyboard_release()
    
    # Mouse operations
    agent.tools.os.mouse(100, 100)  # Move to coordinates
    agent.tools.os.click("left", 2)  # Double click

Web Browser Tools

Control web browser operations:

with VisionAgent() as agent:
    # Open new browser window
    agent.tools.webbrowser.open_new("https://askui.com")
    
    # Open in new tab
    agent.tools.webbrowser.open_new_tab("https://docs.askui.com")

Clipboard Tools

Manage clipboard operations:

with VisionAgent() as agent:
    # Copy and paste
    agent.tools.clipboard.copy("Text to copy")
    result = agent.tools.clipboard.paste()

Information Extraction

Use the get() method to extract information from the screen:

with VisionAgent() as agent:
    # Get text content
    url = agent.get("What is the current URL?")
    
    # Get structured data using response schema
    from askui import ResponseSchemaBase
    
    class UserInfo(ResponseSchemaBase):
        username: str
        is_online: bool
    
    user_info = agent.get(
        "What is the username and online status?",
        response_schema=UserInfo
    )

Best Practices

  1. Action Selection
    • Use the simplest action that accomplishes your goal
    • Consider element state before interacting
    • Handle potential errors gracefully
  2. Keyboard Operations
    • Use modifier keys for shortcuts
    • Release keys after use
    • Use appropriate key combinations
  3. Tool Usage
    • Choose the appropriate tool for the task
    • Combine tools when needed
    • Follow tool-specific best practices
  4. Information Extraction
    • Use response schemas for structured data
    • Be specific in your queries
    • Handle potential errors

By following these guidelines and using the appropriate actions and tools, you can create robust and reliable automation workflows with AskUI.