Prerequisites: Complete Your First Agent tutorial before starting.
What You’ll Learn
This tutorial teaches you how to:
- Open and control applications
- Handle dynamic text and buttons
- Work with forms and input fields
- Manage popups and overlays
- Use visual relationships for element selection
- Implement proper wait strategies
Tutorial Application
We’ll use the SauceDemo web application for our examples - a test e-commerce site perfect for learning automation patterns.
1. Opening Applications
Learn different ways to launch applications and websites.
Always add wait times after opening applications to ensure they’re fully loaded before interacting with them.
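The open-then-wait advice can be sketched with the standard library alone. This is a minimal illustration, not the agent's own launch API: `open_and_settle` is a hypothetical helper, `webbrowser.open_new` stands in for whatever launcher your setup uses, and the two-second default is an arbitrary assumption to tune per application.

```python
import time
import webbrowser
from typing import Callable


def open_and_settle(
    target: str,
    opener: Callable[[str], object] = webbrowser.open_new,
    settle_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Open `target`, then pause so the window can finish loading.

    `opener` and `sleep` are injectable so the helper can be exercised
    without launching a real browser (hypothetical design, not a library API).
    """
    opener(target)
    sleep(settle_seconds)
```

Calling `open_and_settle("https://www.saucedemo.com")` would open the demo shop in the default browser and pause before the first interaction. A fixed pause is the simplest option; polling for a known element is usually more reliable once the page structure is known.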
2. Clicking Text and Dynamic Elements
Handle text-based interactions with proper error handling and dynamic content.
Basic Text Clicking
Handling Text Detection Issues
Text with Line Breaks
When text appears on multiple lines, use partial matching:
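To see why partial matching helps here, the sketch below simulates text detection that returns one string per rendered line. It is plain Python, not the agent's matcher: `find_lines` is a hypothetical helper and the detected lines are illustrative sample data.

```python
from typing import List


def find_lines(fragment: str, detected_lines: List[str], partial: bool = True) -> List[str]:
    """Return detected lines that match `fragment`, ignoring case."""
    needle = fragment.casefold()
    if partial:
        return [line for line in detected_lines if needle in line.casefold()]
    return [line for line in detected_lines if line.casefold() == needle]


# Illustrative data: a product title that the screen renders on two lines.
detected = ["Sauce Labs Bolt", "T-Shirt", "$15.99"]

# No single line holds the full title, so an exact match finds nothing.
exact_hits = find_lines("Sauce Labs Bolt T-Shirt", detected, partial=False)

# A fragment that fits on one line still matches.
partial_hits = find_lines("sauce labs bolt", detected)
```

The practical takeaway is to pick a fragment that is distinctive and short enough to sit on a single rendered line.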
Merged or Overlapping Text
When overlay text merges with background text:
Missing Whitespace
Handle text with inconsistent spacing:
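One way to tolerate inconsistent spacing is a regular expression that allows any amount of whitespace, including none, between words. The sketch below uses only Python's `re` module; `spacing_tolerant_pattern` is a hypothetical helper, and how a pattern is handed to the agent's text locator is not shown because it depends on the library version you use.

```python
import re


def spacing_tolerant_pattern(phrase: str) -> re.Pattern:
    """Build a case-insensitive pattern that ignores spacing between words."""
    words = phrase.split()
    # re.escape keeps characters such as "." or "$" literal.
    return re.compile(r"\s*".join(re.escape(word) for word in words), re.IGNORECASE)


pattern = spacing_tolerant_pattern("Add to cart")

# Each of these variants of the detected text is accepted.
variants = ["Add to cart", "Addto cart", "Add   to  cart", "ADDTOCART"]
all_match = all(pattern.search(text) for text in variants)
```

Allowing zero whitespace makes the pattern permissive, so it can also match inside longer words. Anchor it or add word boundaries if false positives appear.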
3. Working with Icons and Buttons
Interact with visual elements beyond text.
AI elements work best with:
- High color contrast against background
- Clear rectangular shapes
- Distinct visual properties
4. Form Filling and Text Input
Efficiently fill forms with structured data.
5. Visual Relationships
Use spatial relationships to find elements precisely.
Visual relationships are powerful for targeting elements in dynamic layouts:
- Directional: above_of, below_of, left_of, right_of
- Proximity: nearest_to
- Container: containing, inside_of
- Logical: and_, or_
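To build intuition for the directional and proximity relations, here is a small geometric sketch over bounding boxes. It is a conceptual model only, not the library's implementation: the `Box` type, the free functions named after the relations, and the coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    label: str
    x: float  # left edge
    y: float  # top edge (screen coordinates grow downward)
    w: float
    h: float

    @property
    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def below_of(anchor: Box, candidates: List[Box]) -> List[Box]:
    """Candidates that start at or below the anchor's bottom edge, nearest first."""
    bottom = anchor.y + anchor.h
    hits = [c for c in candidates if c.y >= bottom]
    return sorted(hits, key=lambda c: c.y - bottom)


def nearest_to(anchor: Box, candidates: List[Box]) -> Optional[Box]:
    """Candidate whose center is closest to the anchor's center, or None."""
    if not candidates:
        return None
    ax, ay = anchor.center
    return min(candidates, key=lambda c: hypot(c.center[0] - ax, c.center[1] - ay))


# Two identical-looking buttons: the product title disambiguates them.
header = Box("Products", 100, 40, 100, 20)
title = Box("Sauce Labs Backpack", 100, 100, 200, 20)
first_button = Box("Add to cart (first)", 100, 160, 120, 30)
second_button = Box("Add to cart (second)", 100, 400, 120, 30)

below = below_of(title, [header, first_button, second_button])
closest = nearest_to(title, [first_button, second_button])
```

This is why relations help in dynamic layouts: the button is identified by its position relative to a stable anchor, not by absolute coordinates.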
6. Wait Strategies
Implement proper waiting for reliable automation.
7. Keyboard Shortcuts
Use keyboard shortcuts for efficient navigation.
8. Handling Popups and Dynamic Content
Manage unexpected UI elements gracefully.
Helper Functions
Since VisionAgent focuses on core functionality, here are useful helper functions for common patterns:
Complete Example: E-commerce Purchase Flow
Here’s a complete automation combining all patterns:
Best Practices Summary
Always Wait
Add appropriate waits after actions that trigger page changes
Use Fallbacks
Implement alternative locators when elements might vary
Handle Errors
Use try-except blocks for actions that might fail
Be Specific
Use visual relationships to target elements precisely
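The waiting, fallback, and error-handling practices above can be combined in two small helpers. This is a sketch under stated assumptions: `wait_until` and `click_first` are hypothetical names, and `condition` and `click` are callables you would wire to your agent (for example, a check that an element is visible and a click on a locator).

```python
import time
from typing import Callable, Iterable, Optional


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def click_first(click: Callable[[str], object], locators: Iterable[str]) -> Optional[str]:
    """Try each locator in order; return the one that worked, or None."""
    for locator in locators:
        try:
            click(locator)
            return locator
        except Exception:
            # Assumption: a failed click raises. Narrow this to the specific
            # "element not found" error your library raises.
            continue
    return None
```

A typical call would be `click_first(agent_click, ["Login", "Sign in"])` followed by `wait_until(page_is_ready, timeout=15)`, where `agent_click` and `page_is_ready` are functions you supply. Polling returns as soon as the condition holds, so it avoids both flaky short sleeps and wasteful long ones.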
Troubleshooting Common Issues
Element not found
- Increase wait times
- Use more specific locators
- Check if element is scrolled out of view
- Try AI elements for complex visuals
Text detection fails
- Use contains() for partial matching
- Try regex patterns for flexible matching
- Consider using AI elements as fallback
Popup interference
- Add escape key press at start
- Implement popup detection logic
- Use conditional clicks
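The three popup tips can be combined into one guard that runs before the main flow. The sketch below is illustrative only: `dismiss_popup` is a hypothetical helper, the three callables stand in for your agent's key press, visibility check, and click, and the "Close" label is an assumption about the popup's button text.

```python
from typing import Callable


def dismiss_popup(
    is_visible: Callable[[str], bool],
    click: Callable[[str], object],
    press_escape: Callable[[], object],
    close_label: str = "Close",
) -> bool:
    """Try to clear a popup; return True if a click was needed after Escape."""
    # Escape first: it is cheap and does nothing harmful when no popup is open.
    press_escape()
    # Detection step: only click when the close control is actually on screen.
    if not is_visible(close_label):
        return False
    click(close_label)
    return True
```

Because the click is conditional, the same call is safe at the start of every run whether or not a popup appears.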
Slow execution
- Reduce unnecessary waits
- Use batch operations where possible
- Enable parallel execution for independent tasks
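For the last point, one generic way to run independent tasks concurrently is Python's `concurrent.futures`. This is a sketch, not a feature of the agent: `run_independent` is a hypothetical helper, and it assumes each task drives its own isolated session (for example, a separate machine or display), since two tasks sharing one screen would interfere with each other.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_independent(
    tasks: Dict[str, Callable[[], object]],
    max_workers: int = 4,
) -> Dict[str, object]:
    """Run named, independent tasks concurrently and collect their results.

    An exception raised inside a task is re-raised here when its result
    is read, so failures are not silently dropped.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(task) for name, task in tasks.items()}
        return {name: future.result() for name, future in futures.items()}
```

Threads suit this workload because automation steps spend most of their time waiting on the UI, not computing.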