In this tutorial, you’ll create your first AskUI agent that automates a real-world task: searching for products on Amazon.
Prerequisites: Make sure you’ve completed the installation before starting this tutorial.

What You’ll Build

You’ll create an agent that:
  1. Opens Amazon in a web browser
  2. Searches for a product
  3. Verifies the search results
  4. Generates a report of the automation

Building Your Agent

1. Create Your Agent Script

Create a new Python file amazon_shopping.py and add the following code:
from askui import VisionAgent
from askui import locators as loc
from askui.reporting import SimpleHtmlReporter

# Initialize your agent with HTML reporting
with VisionAgent(
    reporters=[SimpleHtmlReporter()]
) as agent:
    agent.act("""
    You are an automated agent that performs desktop tasks.
              
    Steps:
    1. Open a web browser and navigate to "http://www.amazon.com".
    2. In the search bar, type "nike shoes" and press enter.    
    3. Check if the search results page displays Nike shoes.
    """)
Run the script:
python amazon_shopping.py
The script will:
  1. Open Amazon in your default browser
  2. Search for “nike shoes”
  3. Verify that the search results show Nike shoes
  4. Generate an HTML report of the automation
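
To search for a different product, you can build the prompt from a variable instead of hard-coding it. The following is a minimal sketch; the helper name `build_search_prompt` is hypothetical and not part of AskUI, and the wording of the steps simply mirrors the prompt above.

```python
def build_search_prompt(product: str, url: str = "http://www.amazon.com") -> str:
    """Return step-by-step instructions for agent.act() for one product.

    Hypothetical helper: it only builds a string and does not call AskUI.
    """
    steps = [
        f'Open a web browser and navigate to "{url}".',
        f'In the search bar, type "{product}" and press enter.',
        f"Check if the search results page displays {product}.",
    ]
    # Number the steps the same way the tutorial prompt does.
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        "You are an automated agent that performs desktop tasks.\n\n"
        f"Steps:\n{numbered}"
    )


if __name__ == "__main__":
    print(build_search_prompt("nike shoes"))
```

Inside the `with VisionAgent(...) as agent:` block you would then pass the result to `agent.act(...)`.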

2. Understanding Your Code

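Let’s break down what each part does. The script above hands the whole task to agent.act(); the Browser Control, Element Interaction, and Information Extraction snippets show how the same steps can be written as explicit calls.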

Agent Initialization

with VisionAgent(reporters=[SimpleHtmlReporter()]) as agent:
  • Creates a vision agent that can see and interact with your screen
  • Runs inside a with block, so the agent is cleaned up when the block exits
  • Sets up HTML reporting to review the automation later

Browser Control

agent.tools.webbrowser.open_new("http://www.amazon.com")
agent.wait(3)
  • Opens a new browser window with Amazon
  • Waits for the page to load
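
A fixed wait can be too short on a slow connection and wasteful on a fast one. One alternative is to poll for a condition with a timeout. The sketch below uses only the standard library; `wait_until` is a hypothetical helper, not an AskUI function, and the condition you pass in is up to you.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 15.0,
    interval: float = 1.0,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True as soon as the condition holds, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In this tutorial, the condition could be a small function that asks the agent whether the search bar is visible and converts the answer to a boolean; how you do that conversion is an assumption you need to make for your own setup.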

Element Interaction

agent.click(loc.Element("textfield"))
agent.type("nike shoes")
agent.keyboard('enter')
  • Finds and clicks the search box
  • Types the search query
  • Presses Enter to search
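
The three calls above form a reusable unit, so you may want to wrap them in a function. The sketch below assumes only that the agent exposes `click`, `type`, and `keyboard` as shown in this tutorial. `search_for` and `RecordingAgent` are hypothetical names; `RecordingAgent` is a stand-in that records calls so you can dry-run the helper without a screen.

```python
class RecordingAgent:
    """Hypothetical stand-in that records calls instead of touching the screen."""

    def __init__(self) -> None:
        self.calls = []

    def click(self, target: object) -> None:
        self.calls.append(("click", target))

    def type(self, text: str) -> None:
        self.calls.append(("type", text))

    def keyboard(self, key: str) -> None:
        self.calls.append(("keyboard", key))


def search_for(agent, query: str, search_box: object) -> None:
    """Focus the search box, type the query, and submit it with Enter."""
    agent.click(search_box)
    agent.type(query)
    agent.keyboard("enter")


if __name__ == "__main__":
    dry_run = RecordingAgent()
    search_for(dry_run, "nike shoes", search_box="textfield")
    print(dry_run.calls)
```

With a real agent you would pass `agent` and a locator such as `loc.Element("textfield")` in place of the string used for the dry run.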

Information Extraction

page_status = agent.get("Are Nike shoes visible on the screen?")
  • Uses AI to understand what’s on the screen
  • Returns a natural language response
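
Because the response is free text, a script that needs a pass/fail result has to interpret it. The following is a rough heuristic sketch, assuming the answer begins with a word like "Yes" or "No"; `is_affirmative` is a hypothetical helper and will misjudge answers that are phrased differently.

```python
import re


def is_affirmative(answer: str) -> bool:
    """Return True if a free-text answer starts with a clear 'yes'.

    Heuristic only: it inspects the first word and ignores the rest.
    """
    first_word = re.match(r"\s*([A-Za-z]+)", answer)
    if first_word is None:
        return False
    return first_word.group(1).lower() in {"yes", "yeah", "yep", "true"}


if __name__ == "__main__":
    print(is_affirmative("Yes, several Nike shoes are visible."))
```

A structured response, like the `response_schema` example in the "Extract Product Information" section, is likely to be more robust than parsing text.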

3. View the Report

After running your agent, open the generated HTML report:
# The report will be in the same directory as your script
# Look for: report_YYYY-MM-DD_HH-MM-SS.html
The report shows:
  • Screenshots of each step
  • Actions performed
  • Execution time
  • Any errors encountered
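
If you run the agent several times, the directory fills up with reports. The sketch below picks the newest one, assuming the file names follow the `report_YYYY-MM-DD_HH-MM-SS.html` pattern mentioned above, so that sorting by name is also sorting by time. `latest_report` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_report(directory: str = ".") -> Optional[Path]:
    """Return the newest report_*.html file in `directory`, or None.

    Relies on the timestamp in the file name sorting chronologically.
    """
    reports = sorted(Path(directory).glob("report_*.html"))
    return reports[-1] if reports else None


if __name__ == "__main__":
    report = latest_report()
    print(report if report is not None else "No report found")
```

To open the result in your browser, you could pass `report.resolve().as_uri()` to the standard library's `webbrowser.open`.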

Enhancing Your Agent

Try these modifications to learn more:

1. Add Product to Cart

# After searching, click on the first product
agent.click("first product image")
agent.wait(2)

# Add to cart
agent.click("Add to Cart button")

2. Use Different Selectors

# Using text locator
agent.click(loc.Text("Search"))

# Using relative positioning
search_icon = loc.Element().right_of(loc.Element("textfield"))
agent.click(search_icon)
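
When you are not sure which selector will match, you can try several in order and stop at the first one that works. The sketch below assumes that a failed `agent.click` raises an exception; the exact exception type is not stated in this tutorial, so the code catches `Exception` broadly. `click_first_match` is a hypothetical helper.

```python
from typing import Iterable


def click_first_match(agent, candidates: Iterable[object]) -> object:
    """Click the first candidate the agent can find and return it.

    Raises LookupError if every candidate fails.
    """
    errors = []
    for candidate in candidates:
        try:
            agent.click(candidate)
            return candidate
        except Exception as exc:  # assumption: a failed lookup raises
            errors.append(f"{candidate!r}: {exc}")
    raise LookupError("No candidate matched: " + "; ".join(errors))
```

For example, you might pass `[loc.Text("Search"), search_icon]` so that the relative locator is only used when the text locator fails.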

3. Extract Product Information

from askui import ResponseSchemaBase

class ProductInfo(ResponseSchemaBase):
    name: str
    price: float
    rating: float

product = agent.get(
    "What is the name, price, and rating of the first product?",
    response_schema=ProductInfo
)
print(f"Found: {product.name} - ${product.price} ({product.rating} stars)")
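
Values extracted by AI can be wrong or incomplete, so it is worth sanity-checking them before you rely on them. The sketch below works on plain values and does not depend on AskUI. `check_product` is a hypothetical helper, and the 0 to 5 rating range is an assumption based on a five-star scale.

```python
def check_product(name: str, price: float, rating: float) -> list:
    """Return a list of problems found in one extracted product record.

    An empty list means all three values look plausible.
    """
    problems = []
    if not name.strip():
        problems.append("name is empty")
    if price <= 0:
        problems.append(f"price must be positive, got {price}")
    if not 0 <= rating <= 5:  # assumption: five-star rating scale
        problems.append(f"rating must be between 0 and 5, got {rating}")
    return problems
```

With the `product` object from above, you would call `check_product(product.name, product.price, product.rating)` and stop or retry if the returned list is not empty.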

Common Issues and Solutions

Browser doesn’t open
  • Check if you have a default browser set
  • Try using a specific browser path
  • Ensure AskUI Agent OS is running

Element isn’t found
  • Add wait times for dynamic content
  • Use more specific locators
  • Check if the element is visible on screen

Actions run too fast or are hard to debug
  • Add agent.wait() between actions
  • Enable visual debugging with screenshots
  • Use SimpleHtmlReporter
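
For actions that fail intermittently, a small retry wrapper can help. The following is a generic sketch using only the standard library; `retry` is a hypothetical helper, not an AskUI feature, and it assumes a failed action raises an exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Run `action`, retrying after `delay` seconds whenever it raises.

    The last failure is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

For example, `retry(lambda: agent.click("Add to Cart button"), attempts=3, delay=2.0)` would try the click up to three times.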

What You’ve Learned

Congratulations! You’ve successfully:
  • ✅ Created your first AskUI agent
  • ✅ Automated browser interactions
  • ✅ Used AI to verify screen content
  • ✅ Generated automation reports

Next Steps

  • Element Selection: Learn advanced techniques for finding and selecting UI elements
  • Data Extraction: Extract structured data from any UI
  • Configure AI Models: Use different AI models for specific tasks
  • Best Practices: Learn patterns for reliable automation