Workflow

AskUI processes and executes each UI automation task in four stages:

1. Initialization

  • Load AI models: The selected AI models are loaded into memory
  • Set up logging and reporting: Configure logging levels and reporting mechanisms
  • Configure environment: Set display, authentication, and other environment settings
from askui import VisionAgent

# Initialization happens when creating the agent
with VisionAgent(
    log_level="INFO",
    display=1,
    model="askui"
) as agent:
    # Agent is now initialized and ready
    pass

2. Element Detection

  • Capture screen content: Take a screenshot of the current display
  • Analyze UI elements: Process the screenshot through AI models to identify elements
  • Match elements to commands: Map natural language descriptions to detected elements
# Element detection happens internally whenever you interact with an element
agent.click("submit button")  # AskUI captures screen and finds the button
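
The matching step described above can be illustrated with a small, library-independent sketch. It is not AskUI's actual implementation: `DetectedElement`, `match_element`, and the similarity threshold are hypothetical names and values chosen for illustration, and plain string similarity from the standard library stands in for the AI models.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class DetectedElement:
    """Hypothetical record for one element found in a screenshot."""
    label: str   # text or description produced by the detection step
    x: int       # left edge of the bounding box, in pixels
    y: int       # top edge of the bounding box, in pixels
    width: int
    height: int


def match_element(description, elements, threshold=0.6):
    """Return the element whose label is most similar to the description,
    or None if no label scores at or above the threshold."""
    best, best_score = None, 0.0
    for element in elements:
        score = SequenceMatcher(
            None, description.lower(), element.label.lower()
        ).ratio()
        if score > best_score:
            best, best_score = element, score
    return best if best_score >= threshold else None


# Stand-in for the output of the screen analysis step
elements = [
    DetectedElement("Cancel button", 40, 300, 120, 40),
    DetectedElement("Submit button", 200, 300, 120, 40),
]
match = match_element("submit button", elements)
print(match.label if match else "no match")
```

Returning `None` below the threshold, rather than the closest guess, mirrors the "element can't be found" case that the error handling stage deals with.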

3. Action Execution

  • Plan the interaction: Determine the best way to interact with the element
  • Perform the action: Execute the planned interaction (click, type, etc.)
  • Verify the result: Check if the action was successful
# Action execution with verification
agent.click("save button")
# AskUI automatically verifies the click was successful
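
As a rough illustration of the planning and verification steps, the sketch below picks a click point at the center of an element's bounding box and treats any change in the captured screen as a sign that the action registered. Both `plan_click_point` and `action_had_effect` are hypothetical helpers, and comparing raw screenshot bytes is a deliberately crude assumption; how AskUI verifies actions internally is not described here.

```python
def plan_click_point(box, screen_size):
    """Return the center of a bounding box, clamped to the visible screen.

    box is (x, y, width, height); screen_size is (width, height), in pixels.
    """
    x, y, width, height = box
    screen_width, screen_height = screen_size
    center_x = x + width // 2
    center_y = y + height // 2
    return (
        min(max(center_x, 0), screen_width - 1),
        min(max(center_y, 0), screen_height - 1),
    )


def action_had_effect(before: bytes, after: bytes) -> bool:
    """Treat any difference between two captures as evidence of an effect."""
    return before != after


point = plan_click_point((200, 300, 120, 40), (1920, 1080))
print(point)
```

Clamping matters for elements that are partly off screen: a box reaching past the right edge still yields a point inside the display.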

4. Error Handling

  • Detect failures: Identify when actions fail or elements can’t be found
  • Retry if needed: Automatically retry failed actions based on configuration
  • Report issues: Log errors and generate reports for debugging
# Built-in error handling with retry
# (assumes `agent` is the VisionAgent created during initialization)
from askui import retry

@retry(max_attempts=3, wait_time=2)
def login():
    agent.type("username field", "john.doe")
    agent.type("password field", "secret123")
    agent.click("login button")
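
To make the retry behaviour concrete, here is a minimal pure-Python sketch of a retry decorator. It is a stand-in written for illustration, not the library's own helper: `retry_on_failure` and its `exceptions` parameter are assumed names, and `flaky_action` simulates an element that only becomes available on the third attempt.

```python
import functools
import time


def retry_on_failure(max_attempts=3, wait_time=2.0, exceptions=(Exception,)):
    """Retry the wrapped function until it succeeds or attempts run out."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait_time)  # give the UI time to settle
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_failure(max_attempts=3, wait_time=0.01)
def flaky_action():
    """Simulated action that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("element not found")
    return "clicked"


result = flaky_action()
print(result, calls["count"])
```

Re-raising the last exception once attempts are exhausted keeps the failure visible to the caller, which can then log it or report it.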

Execution Flow

Here’s how AskUI processes a command, step by step:

  1. User Command → agent.click("submit button")
  2. Screen Capture → Takes screenshot of current display
  3. AI Analysis → Identifies all UI elements on screen
  4. Element Matching → Finds the “submit button”
  5. Action Planning → Determines click coordinates
  6. Execution → Performs the click action
  7. Verification → Confirms action was successful
  8. Result → Returns control to user code
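
The order of these steps can be sketched as a small pipeline in which every stage is passed in as a function. This shows the control flow only, under stated assumptions: `run_command` and the `fake_*` stubs are hypothetical, and the stubs replace the real screen capture, AI analysis, and input control.

```python
def run_command(description, capture, analyze, match, plan, execute):
    """Run one command through the stages in order and record a trace."""
    trace = []
    before = capture()
    trace.append("capture")
    elements = analyze(before)
    trace.append("analyze")
    target = match(description, elements)
    trace.append("match")
    if target is None:
        raise LookupError(f"no element matches {description!r}")
    point = plan(target)
    trace.append("plan")
    execute(point)
    trace.append("execute")
    verified = capture() != before  # crude check: did the screen change?
    trace.append("verify")
    return {"point": point, "verified": verified, "trace": trace}


# Stubs standing in for the real stages
screen = {"frame": "form"}


def fake_capture():
    return screen["frame"]


def fake_analyze(frame):
    return [{"label": "submit button", "x": 200, "y": 300,
             "width": 120, "height": 40}]


def fake_match(description, elements):
    return next((e for e in elements if e["label"] == description), None)


def fake_plan(element):
    return (element["x"] + element["width"] // 2,
            element["y"] + element["height"] // 2)


def fake_execute(point):
    screen["frame"] = "confirmation"  # simulate the UI reacting to the click


result = run_command("submit button", fake_capture, fake_analyze,
                     fake_match, fake_plan, fake_execute)
print(result["trace"])
```

Passing the stages in as functions keeps the flow testable without a display, since each stage can be swapped for a stub.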

Best Practices for Workflow

  1. Use context managers to ensure proper initialization and cleanup
  2. Add appropriate wait times between actions for dynamic content
  3. Implement retry logic for critical actions
  4. Monitor performance using logging and reporting features
  5. Handle errors gracefully with try-except blocks
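
Practices 2, 4, and 5 can be combined in a few lines of standard-library Python. The helpers below are illustrative assumptions rather than AskUI features: `wait_until` polls a condition instead of sleeping for a fixed time, and `run_step` wraps an action in a try-except block while logging its duration.

```python
import logging
import time

logger = logging.getLogger("ui_workflow")


def wait_until(condition, timeout=5.0, interval=0.25):
    """Poll condition() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def run_step(name, action):
    """Run one step, log how long it took, and report success as a bool."""
    start = time.perf_counter()
    try:
        action()
    except Exception:
        logger.exception("Step %r failed", name)
        return False
    logger.info("Step %r finished in %.2fs", name, time.perf_counter() - start)
    return True


logging.basicConfig(level=logging.INFO)

# Simulated dynamic content that becomes available on the third poll
state = {"polls": 0}


def content_loaded():
    state["polls"] += 1
    return state["polls"] >= 3


loaded = wait_until(content_loaded, timeout=1.0, interval=0.01)
ok = run_step("save", lambda: None)
failed = run_step("broken", lambda: 1 / 0)
```

In a real script the lambdas would be replaced by agent calls made inside the context manager from practice 1, so cleanup still runs when a step fails.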

Next Steps