In this tutorial, you’ll create your first AskUI agent that automates a real-world task: searching for products on Amazon.

Prerequisites: Make sure you’ve completed the installation before starting this tutorial.

What You’ll Build

You’ll create an agent that:

  1. Opens Amazon in a web browser
  2. Searches for a product
  3. Verifies the search results
  4. Generates a report of the automation

Building Your Agent

1

Create Your Agent Script

Create a new Python file amazon_shopping.py and add the following code:

from askui import VisionAgent
import logging
from askui import locators as loc
from askui.reporting import SimpleHtmlReporter

# Initialize your agent with logging and reporting
with VisionAgent(
    log_level=logging.DEBUG,
    reporters=[SimpleHtmlReporter()]
) as agent:
    # Open Amazon website
    agent.tools.webbrowser.open_new("http://www.amazon.com")
    agent.wait(3)  # Wait for page to load

    # Search for a product
    agent.click(loc.Element("textfield"))
    agent.type("nike shoes")
    agent.keyboard('enter')
    agent.wait(2)  # Wait for search results

    # Verify page contents
    page_status = agent.get("Are Nike shoes visible on the screen?")
    print(f"Cart Status: {page_status}")

Run the script:

python amazon_shopping.py

The script will:

  1. Open Amazon in your default browser
  2. Search for “nike shoes”
  3. Verify the cart contents
  4. Generate an HTML report of the automation
2

Understanding Your Code

Let’s break down what each part does:

Agent Initialization

with VisionAgent(log_level=logging.DEBUG, reporters=[SimpleHtmlReporter()]) as agent:
  • Creates a vision agent that can see and interact with your screen
  • Enables debug logging to see what’s happening
  • Sets up HTML reporting to review the automation later

Browser Control

agent.tools.webbrowser.open_new("http://www.amazon.com")
agent.wait(3)
  • Opens a new browser window with Amazon
  • Waits for the page to load

Element Interaction

agent.click(loc.Element("textfield"))
agent.type("nike shoes")
agent.keyboard('enter')
  • Finds and clicks the search box
  • Types the search query
  • Presses Enter to search

Information Extraction

page_status = agent.get("Are Nike shoes visible on the screen?")
  • Uses AI to understand what’s on the screen
  • Returns a natural language response
3

View the Report

After running your agent, open the generated HTML report:

# The report will be in the same directory as your script
# Look for: report_YYYY-MM-DD_HH-MM-SS.html

The report shows:

  • Screenshots of each step
  • Actions performed
  • Execution time
  • Any errors encountered

Enhancing Your Agent

Try these modifications to learn more:

1. Add Product to Cart

# After searching, click on the first product
agent.click("first product image")
agent.wait(2)

# Add to cart
agent.click("Add to Cart button")

2. Use Different Selectors

# Using text locator
agent.click(loc.Text("Search"))

# Using relative positioning
search_icon = loc.Element().right_of(loc.Element("textfield"))
agent.click(search_icon)

3. Extract Product Information

from askui import ResponseSchemaBase

class ProductInfo(ResponseSchemaBase):
    name: str
    price: float
    rating: float

product = agent.get(
    "What is the name, price, and rating of the first product?",
    response_schema=ProductInfo
)
print(f"Found: {product.name} - ${product.price} ({product.rating} stars)")

Common Issues and Solutions

What You’ve Learned

Congratulations! You’ve successfully:

  • ✅ Created your first AskUI agent
  • ✅ Automated browser interactions
  • ✅ Used AI to verify screen content
  • ✅ Generated automation reports

Next Steps