In this tutorial, you’ll create your first AskUI agent that automates a real-world task: searching for products on Amazon.
Prerequisites: Make sure you’ve completed the installation before starting this tutorial.

What You’ll Build

You’ll create an agent that:
  1. Opens Amazon in a web browser
  2. Searches for a product
  3. Verifies the search results
  4. Generates a report of the automation

Building Your Agent

1. Create Your Agent Script

Create a new Python file amazon_shopping.py and add the following code. The tutorial provides Agent and Hybrid Agent variants; the Agent variant is shown here:
from askui import VisionAgent
from askui import locators as loc
from askui.reporting import SimpleHtmlReporter

# Initialize your agent with HTML reporting
with VisionAgent(
    reporters=[SimpleHtmlReporter()]
) as agent:
    agent.act("""
    You are an automated agent that performs desktop tasks.
              
    Steps:
    1. Open a web browser and navigate to "http://www.amazon.com".
    2. In the search bar, type "nike shoes" and press enter.    
    3. Check if the search results page displays Nike shoes.
    """)
Run the script:
python amazon_shopping.py
The script will:
  1. Open Amazon in your default browser
  2. Search for “nike shoes”
  3. Verify that the results page shows Nike shoes
  4. Generate an HTML report of the automation
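
To search for a different product without editing the prompt by hand, the task text can be built from a parameter. The sketch below is one way to do that. build_search_task is a hypothetical helper, not part of AskUI, and its wording simply mirrors the prompt above.

```python
import textwrap


def build_search_task(product: str, url: str = "http://www.amazon.com") -> str:
    """Return the natural-language task for agent.act() (hypothetical helper)."""
    # dedent strips the common indentation so the prompt starts at column 0.
    return textwrap.dedent(f"""\
        You are an automated agent that performs desktop tasks.

        Steps:
        1. Open a web browser and navigate to "{url}".
        2. In the search bar, type "{product}" and press enter.
        3. Check if the search results page displays {product}.
        """)


print(build_search_task("nike shoes"))
```

Inside the with VisionAgent(...) as agent: block, agent.act(build_search_task("nike shoes")) would then replace the hard-coded string.
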

2. Understanding Your Code

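Let’s break down what each part does. The script above hands the whole task to agent.act(); the Browser Control, Element Interaction, and Information Extraction snippets show the same steps written as explicit calls, as in the Hybrid Agent variant.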

Agent Initialization

with VisionAgent(reporters=[SimpleHtmlReporter()]) as agent:
  • Creates a vision agent that can see and interact with your screen
  • Uses a with block so the agent is shut down cleanly when the script finishes (this call passes no logging option)
  • Sets up HTML reporting to review the automation later

Browser Control

agent.tools.webbrowser.open_new("http://www.amazon.com")
agent.wait(3)
  • Opens a new browser window with Amazon
  • Waits for the page to load

Element Interaction

agent.click(loc.Element("textfield"))
agent.type("nike shoes")
agent.keyboard('enter')
  • Finds and clicks the search box
  • Types the search query
  • Presses Enter to search

Information Extraction

page_status = agent.get("Are Nike shoes visible on the screen?")
  • Uses AI to understand what’s on the screen
  • Returns a natural language response
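
Put together, the explicit calls from the three snippets above could be wrapped in one function. This is a sketch, not the tutorial's official Hybrid Agent script: search_and_check is a hypothetical name, and the agent and the search-box locator are passed in by the caller so the function itself does not import AskUI. It uses only the calls shown above (tools.webbrowser.open_new, wait, click, type, keyboard, get).

```python
def search_and_check(agent, search_box, query, url="http://www.amazon.com"):
    """Open `url`, search for `query`, and return the agent's answer.

    `agent` is expected to be an AskUI VisionAgent and `search_box` a locator
    such as loc.Element("textfield"); both are supplied by the caller.
    """
    # Browser control: open the site and give the page time to load.
    agent.tools.webbrowser.open_new(url)
    agent.wait(3)

    # Element interaction: focus the search box, type the query, submit.
    agent.click(search_box)
    agent.type(query)
    agent.keyboard("enter")

    # Information extraction: ask a question about the current screen.
    return agent.get(f'Does the screen show results for "{query}"?')
```

With the script from step 1, this would be called as search_and_check(agent, loc.Element("textfield"), "nike shoes") inside the with VisionAgent(...) as agent: block.
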

3. View the Report

After running your agent, open the generated HTML report:
# The report will be in the same directory as your script
# Look for: report_YYYY-MM-DD_HH-MM-SS.html
The report shows:
  • Screenshots of each step
  • Actions performed
  • Execution time
  • Any errors encountered
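
If several runs have left reports behind, picking the newest one by hand gets tedious. The sketch below assumes the report_YYYY-MM-DD_HH-MM-SS.html naming mentioned above, so that sorting the file names also sorts them by time. latest_report is a hypothetical helper, and the directory is whichever folder your reports end up in.

```python
from pathlib import Path
from typing import Optional


def latest_report(directory: str = ".") -> Optional[Path]:
    """Return the newest report_*.html in `directory`, or None if there is none."""
    # Timestamps in the file name are zero-padded, so name order is time order.
    reports = sorted(Path(directory).glob("report_*.html"))
    return reports[-1] if reports else None


newest = latest_report()
print(newest if newest else "No report found in the current directory.")
```

From there, the standard library call webbrowser.open(newest.resolve().as_uri()) would open the file in the default browser.
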

Enhancing Your Agent

Try these modifications to learn more:

1. Add Product to Cart

# After searching, click on the first product
agent.click("first product image")
agent.wait(2)

# Add to cart
agent.click("Add to Cart button")

2. Use Different Selectors

# Using text locator
agent.click(loc.Text("Search"))

# Using relative positioning
search_icon = loc.Element().right_of(loc.Element("textfield"))
agent.click(search_icon)

3. Extract Product Information

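from askui import ResponseSchemaBase

# Describe the fields the agent should return
class ProductInfo(ResponseSchemaBase):
    name: str
    price: float
    rating: float

# Ask for a structured answer instead of free text
product = agent.get(
    "What is the name, price, and rating of the first product?",
    response_schema=ProductInfo
)
# Format the price with two decimals (e.g. 59.90 rather than 59.9)
print(f"Found: {product.name} - ${product.price:.2f} ({product.rating} stars)")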

Common Issues and Solutions

  • Check if you have a default browser set
  • Try using a specific browser path
  • Ensure AskUI Agent OS is running
  • Add wait times for dynamic content
  • Use more specific locators
  • Check if the element is visible on screen
  • Add agent.wait() between actions
  • Enable visual debugging with screenshots
  • Use SimpleHtmlReporter
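
Several of these tips (waiting for dynamic content, pausing between actions) can be combined into a small retry wrapper. This is a sketch under assumptions: click_with_retry is a hypothetical helper, it catches the generic Exception because the specific error AskUI raises for a missing element is not shown in this tutorial, and it relies only on agent.click and agent.wait as used above.

```python
def click_with_retry(agent, locator, attempts=3, delay=2):
    """Try agent.click(locator) up to `attempts` times, waiting between tries.

    Returns the number of tries it took; re-raises the last error on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            agent.click(locator)
            return attempt
        except Exception:  # assumption: a failed lookup raises some exception
            if attempt == attempts:
                raise  # out of retries, surface the original error
            agent.wait(delay)  # give dynamic content time to appear
```

For example, click_with_retry(agent, "Add to Cart button") could stand in for a bare agent.click(...) on pages that load slowly.
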

What You’ve Learned

Congratulations! You’ve successfully:
  • ✅ Created your first AskUI agent
  • ✅ Automated browser interactions
  • ✅ Used AI to verify screen content
  • ✅ Generated automation reports

Next Steps