Following these best practices when interacting with UI elements in AskUI helps keep automation workflows robust and reliable.

Action Selection

Choose the Right Action

  • Use the simplest action that accomplishes your goal
    • Prefer click() over complex mouse operations when possible
    • Use type() for text input instead of individual keyboard events
    • Choose native methods over tool-based approaches when available

Consider Element State

  • Verify element visibility before interacting
  • Check if elements are enabled before clicking
  • Account for loading states and dynamic content
  • Handle overlapping elements by being specific with locators
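
The state checks above can be sketched as a small guard that asks the agent whether an element is ready before clicking it. This is a minimal sketch, not part of AskUI: `click_if_ready` is a hypothetical helper, and it assumes `agent.get` accepts `response_schema=bool` and answers a yes/no question about the screen.

```python
def click_if_ready(agent, locator, readiness_question):
    """Click `locator` only when the agent reports the element is ready.

    `agent` is any object exposing get(question, response_schema=...) and
    click(locator), as the VisionAgent examples in this guide do.
    Returns True if the click was performed, False otherwise.
    """
    # Assumption: the agent can answer a yes/no question with a bool.
    ready = agent.get(readiness_question, response_schema=bool)
    if not ready:
        return False
    agent.click(locator)
    return True
```

A call might look like `click_if_ready(agent, "Submit Order", "Is the Submit Order button visible and enabled?")`, with the caller deciding whether to wait and retry when it returns False.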

Error Handling

from askui import VisionAgent, retry

# Use retry decorator for critical actions
@retry(max_attempts=3, wait_time=2)
def click_critical_button(agent):
    agent.click("Submit Order")

# Handle specific error cases
with VisionAgent() as agent:
    try:
        agent.click("Save")
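    except Exception as e:
        # Log the error, then ask the agent what the screen shows
        print(f"Clicking 'Save' failed: {e}")
        message = agent.get("What error message is displayed?")
        print(f"On-screen error message: {message}")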
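
To make the retry behaviour concrete, here is a standalone sketch of what a decorator with `max_attempts` and `wait_time` parameters typically does. It is an illustration only and not AskUI's implementation: `retry_sketch` and its `exceptions` parameter are hypothetical names introduced here.

```python
import functools
import time


def retry_sketch(max_attempts=3, wait_time=2.0, exceptions=(Exception,)):
    """Retry the wrapped function up to `max_attempts` times.

    Waits `wait_time` seconds between attempts and re-raises the last
    exception once all attempts are used up.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the failure
                    time.sleep(wait_time)

        return wrapper

    return decorator
```

Narrowing `exceptions` to the error types that signal a transient failure keeps genuine bugs from being retried silently.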

Keyboard Operations

Modifier Key Management

  • Always release modifier keys after use to prevent stuck keys
  • Use context managers when possible for automatic cleanup
  • Combine modifiers properly for complex shortcuts

# Good: the modifier is passed with the keystroke, so it is pressed and released in one call
with VisionAgent() as agent:
    agent.keyboard("a", modifier_keys=["control"])

    # Also good: explicit key management, with try/finally so the key is always released
    agent.key_down("control")
    try:
        agent.keyboard("a")
    finally:
        agent.key_up("control")

Key Combination Best Practices

  • Use standard shortcuts that work across platforms
  • Allow time between keystrokes for complex sequences
  • Test keyboard operations across different OS environments
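
One way to allow time between keystrokes is a small pacing helper. The sketch below assumes only the `agent.keyboard(key, modifier_keys=...)` call used elsewhere in this guide; `press_sequence` and its `delay` parameter are hypothetical names, and the right delay depends on the application under test.

```python
import time


def press_sequence(agent, keys, delay=0.2, modifier_keys=None):
    """Press each key in `keys` in order, pausing `delay` seconds between presses."""
    for index, key in enumerate(keys):
        if index > 0:
            time.sleep(delay)  # give the UI time to react to the previous key
        if modifier_keys:
            agent.keyboard(key, modifier_keys=modifier_keys)
        else:
            agent.keyboard(key)
```

For example, `press_sequence(agent, ["tab", "tab"], delay=0.3)` would move focus two fields forward with a short pause in between.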

Tool Usage Guidelines

Tool Selection

  1. OS Tools: Use for low-level system operations
  2. Browser Tools: Use for web-specific tasks
  3. Clipboard Tools: Use for cross-application data transfer
  4. Native Methods: Prefer these for standard interactions

Tool Combination

# Example: Combining tools effectively
with VisionAgent() as agent:
    # Copy from one application
    agent.click("data field")
    agent.keyboard("a", modifier_keys=["control"])
    agent.keyboard("c", modifier_keys=["control"])
    
    # Switch application and paste
    agent.keyboard("tab", modifier_keys=["alt"])
    agent.click("input field")
    agent.keyboard("v", modifier_keys=["control"])

Information Extraction

Query Specificity

  • Be specific in your questions to get accurate results
  • Use structured schemas for complex data extraction
  • Validate extracted data before using it

from askui import ResponseSchemaBase, VisionAgent

class ProductInfo(ResponseSchemaBase):
    name: str
    price: float
    in_stock: bool

# Specific query with schema
with VisionAgent() as agent:
    product = agent.get(
        "What is the product name, price, and stock status?",
        response_schema=ProductInfo,
    )
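
Validating extracted data can be as simple as a few sanity checks before the values are used. The sketch below is a hypothetical helper, not an AskUI feature; it takes the three fields of the product example as plain values, and the rules it applies (non-empty name, finite non-negative price) are assumptions to adapt to your data.

```python
import math


def validate_product(name, price, in_stock):
    """Return a list of problems; an empty list means the data looks usable."""
    problems = []
    if not isinstance(name, str) or not name.strip():
        problems.append("name is empty")
    # bool is a subclass of int in Python, so reject it explicitly.
    price_is_number = isinstance(price, (int, float)) and not isinstance(price, bool)
    if not price_is_number or not math.isfinite(price) or price < 0:
        problems.append("price is not a finite, non-negative number")
    if not isinstance(in_stock, bool):
        problems.append("in_stock is not a boolean")
    return problems
```

With the schema example, this would be called as `validate_product(product.name, product.price, product.in_stock)`, and the caller could re-query or abort when the list is not empty.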

Performance Considerations

  • Cache extracted information when used multiple times
  • Batch queries when extracting multiple data points
  • Use appropriate timeouts for extraction operations

Timing and Synchronization

Wait Strategies

  1. Implicit waits: Built into AskUI operations
  2. Explicit waits: Use when needed for dynamic content
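  3. Smart waits: Wait for specific conditions

# Wait for element to appear, retrying up to 10 times
for _ in range(10):
    try:
        agent.click("Dynamic Button")
        break
    except Exception:
        # Element not found yet; pause one second before the next attempt
        agent.wait(1)
else:
    raise TimeoutError("'Dynamic Button' did not appear after 10 attempts")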
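
A "smart wait" can also be written as a generic polling helper that waits for an arbitrary condition with an overall deadline. This is a minimal sketch; `wait_until` is a hypothetical name, and the condition is any zero-argument callable you supply.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns True if the condition was met, False if the deadline expired.
    The condition is always checked at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The condition could wrap a screen query, for instance `lambda: agent.get("Is the Save button visible?", response_schema=bool)`, assuming `agent.get` supports a `bool` schema.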

Performance Optimization

  • Minimize unnecessary waits
  • Use element detection to verify readiness
  • Batch operations when possible
  • Monitor execution time for optimization opportunities
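
Execution time can be monitored with a small context manager built on the standard library. This is a sketch only; `timed` and the `results` dictionary are hypothetical names introduced for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the elapsed seconds of the enclosed block under `results[label]`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed steps are timed too.
        results[label] = time.perf_counter() - start
```

Wrapping a step as `with timed("checkout", results): ...` and reviewing `results` after a run shows which steps dominate the total time.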

Data Extraction Performance

When extracting data from UI elements, consider these optimization strategies:

Batch Extraction

Instead of issuing one call per field, extract all fields in a single call with a schema:

# Inefficient
name = agent.get("What is the username?", response_schema=str)
email = agent.get("What is the email?", response_schema=str)
role = agent.get("What is the role?", response_schema=str)

# Efficient - single extraction
class UserInfo(ResponseSchemaBase):
    name: str
    email: str
    role: str

user = agent.get("Extract user information", response_schema=UserInfo)

Caching Results

from typing import List

class DataCache:
    def __init__(self):
        self._cache = {}
    
    def get_or_extract(self, agent, key, question, schema):
        if key not in self._cache:
            self._cache[key] = agent.get(question, response_schema=schema)
        return self._cache[key]

cache = DataCache()
# Reuse expensive extractions
table_data = cache.get_or_extract(
    agent, 
    'main_table',
    "Extract the main data table",
    List[dict]
)

Cross-Platform Considerations

Platform-Specific Handling

import platform

with VisionAgent() as agent:
    if platform.system() == "Darwin":  # macOS
        agent.keyboard("a", modifier_keys=["command"])
    else:  # Windows/Linux
        agent.keyboard("a", modifier_keys=["control"])
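
To avoid repeating the platform check at every shortcut, the choice of modifier can be centralized in one function. This is a sketch: `primary_modifier` is a hypothetical helper, and the key names `"command"` and `"control"` are taken from the example above.

```python
import platform


def primary_modifier(system=None):
    """Return the modifier used for common shortcuts on the given platform.

    `system` defaults to platform.system(); "Darwin" means macOS.
    """
    if system is None:
        system = platform.system()
    return "command" if system == "Darwin" else "control"
```

A select-all shortcut then becomes `agent.keyboard("a", modifier_keys=[primary_modifier()])` on every platform.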

Resolution Independence

  • Use relative positioning instead of absolute coordinates
  • Test across different screen resolutions
  • Account for scaling factors on high-DPI displays
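
Relative positioning comes down to simple arithmetic: a position is stored as a fraction of the screen and converted to pixels at run time. The sketch below is illustrative and not an AskUI API; `to_absolute` is a hypothetical helper, and it assumes `width` and `height` are given in logical units with `scale` as the display scaling factor.

```python
def to_absolute(rel_x, rel_y, width, height, scale=1.0):
    """Convert fractional coordinates (0.0 to 1.0) to physical pixel coordinates.

    `width` and `height` are the logical screen size; `scale` is the
    display scaling factor (for example 2.0 on many high-DPI displays).
    """
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError("relative coordinates must be between 0.0 and 1.0")
    return round(rel_x * width * scale), round(rel_y * height * scale)
```

The same fractional position `(0.5, 0.5)` then resolves to the screen centre regardless of resolution or scaling.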

Debugging and Troubleshooting

Logging Best Practices

with VisionAgent(log_level="DEBUG") as agent:
    # Detailed logging for troubleshooting
    agent.click("problematic button")

Visual Debugging

  • Take screenshots before and after critical actions
  • Use annotation features to verify element detection
  • Enable visual feedback during development

Summary

Following these best practices ensures:

  1. Reliability: Actions work consistently across environments
  2. Maintainability: Code is easy to understand and modify
  3. Performance: Operations execute efficiently
  4. Robustness: Automation handles edge cases gracefully

For practical implementation details, see the how-to guide on interacting with elements.