Selecting the right UI elements is crucial for creating reliable automation workflows. This guide provides strategies to improve accuracy and performance when interacting with UI elements.

Example: Amazon Automation

from askui import VisionAgent

# Initialize your agent context manager
with VisionAgent() as agent:
    # Use the webbrowser tool to start browsing
    agent.tools.webbrowser.open_new("http://www.amazon.com")

    agent.wait(3)
    agent.click("Search Amazon.de inputfield")
    agent.type("nike shoes")
    agent.keyboard('enter')

    agent.act("""
    You are testing amazon.de for functionality and have searched for nike shoes.
    Simulate a online shopper:

    Look for the first option which is below Results and click on the first picture from the left

    """)

    agent.click("Add to Basket")

    agent.act("""
    You are testing amazon.de for functionality and have searched for nike shoes.
    Simulate a online shopper:

    Go to the basket
    """)

    basket_ver = agent.get("Is the shoe in the basket and is a value calculated?")
    print(basket_ver)

Natural Language Element Selection

AskUI’s Vision Agent uses natural language prompts to identify and interact with UI elements on your screen. This approach offers several advantages:

  • Intuitive Descriptions: Use everyday language to describe elements like “login button” or “search field”
  • Contextual Understanding: The agent understands elements based on their visual appearance and surrounding context
  • Flexibility: No need to learn complex selector syntax or DOM structures

How Natural Language Selection Works

When you provide a natural language prompt like “login button”, the Vision Agent:

  1. Captures the current screen state
  2. Analyzes the visual elements present
  3. Identifies the element that best matches your description
  4. Performs the requested action on that element

AI Elements

AI Elements allow you to capture and use specific visual elements from your screen for reliable automation.

Enable the AskUI Development Environment as described in the installation guide and activate experimental commands by running AskUI-ImportExperimentalCommands in your terminal.

AskUI-NewAIElement

This command captures elements from your screen for use in AskUI workflows.

It then can be used by setting the model config to askui-ai-element, like:

agent.click("ai-element-name", model_name="askui-ai-element")

Parameters:

  • Name (Optional): The name for the screenshot file. If defined, indicates only one element is being captured. For multiple elements, you’ll provide names individually.
  • WorkspaceId (Optional): Uses the Workspace ID from settings by default. Required if not in settings.
  • AlwaysPreview (Optional): Automatically opens preview without prompting. Cannot be used with NoPreview.
  • NoPreview (Optional): Skips the preview. Cannot be used with AlwaysPreview.
  • OneShot (Optional): Ends snipping after first successful AI Element creation.
  • Annotate (Optional): Takes a fullscreen capture for region annotation.

Example Output:

By implementing these techniques, you’ll build more reliable, flexible, and accurate automation workflows that can handle complex UI interactions with confidence.