Extract text content from your UI using string response schemas. This is useful for reading labels, messages, form values, and any textual content.

Basic Usage

from askui import VisionAgent

with VisionAgent() as agent:
    agent.tools.webbrowser.open_new("http://www.example.com")
    agent.wait(3)
    
    text = agent.get("What is the main heading?", response_schema=str)
    print(f"main heading: {text}")

Best Practices

  1. Be Specific About Location: Mention where the text is located

    # Good - specific location
    header = agent.get("What is the text in the page header?", response_schema=str)
    
    # Less specific
    text = agent.get("What text is shown?", response_schema=str)
    
  2. Handle Empty or Missing Text: Consider that text might not exist

    from typing import Optional
    
    # Text might not be present
    optional_text = agent.get("What is the subtitle, if any?", response_schema=Optional[str])
    
    if optional_text:
        print(f"Subtitle: {optional_text}")
    
  3. Clean and Validate Extracted Text: Post-process extracted text as needed

    # Extract and clean price
    price_text = agent.get("What is the price?", response_schema=str)
    price_value = float(price_text.replace("$", "").replace(",", ""))