Extract text content from your UI by passing `str` as the response schema. This is useful for reading labels, messages, form values, and other on-screen text.
Basic Usage
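from askui import VisionAgent

with VisionAgent() as agent:
    # Open the page and give it a few seconds to load
    agent.tools.webbrowser.open_new("http://www.example.com")
    agent.wait(3)
    # Ask for a single piece of text and receive it as a plain string
    text = agent.get("What is the main heading?", response_schema=str)
    print(f"main heading: {text}")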
Best Practices
- Be Specific About Location: Mention where the text is located
# Good - specific location
header = agent.get("What is the text in the page header?", response_schema=str)
# Less specific
text = agent.get("What text is shown?", response_schema=str)
- Handle Empty or Missing Text: Consider that text might not exist
from typing import Optional

# Text might not be present
optional_text = agent.get("What is the subtitle, if any?", response_schema=Optional[str])
if optional_text:
    print(f"Subtitle: {optional_text}")
- Clean and Validate Extracted Text: Post-process extracted text as needed
# Extract and clean price
price_text = agent.get("What is the price?", response_schema=str)
price_value = float(price_text.replace("$", "").replace(",", ""))
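
The one-line conversion above raises a `ValueError` as soon as the extracted string contains anything beyond the number, a dollar sign, and commas (for example "Price: $1,299.99" or "N/A"). The following is a minimal sketch of a more defensive approach, not part of the askui API: `clean_text` and `parse_price` are hypothetical helper names, and the parser assumes US-style formatting (comma as thousands separator, dot as decimal point) and non-negative amounts.

```python
import re
from decimal import Decimal
from typing import Optional


def clean_text(raw: Optional[str]) -> Optional[str]:
    """Collapse runs of whitespace; return None for missing or blank text."""
    if raw is None:
        return None
    cleaned = " ".join(raw.split())
    return cleaned or None


def parse_price(raw: Optional[str]) -> Optional[Decimal]:
    """Parse a US-style price such as "$1,299.99"; return None if no number is found."""
    cleaned = clean_text(raw)
    if cleaned is None:
        return None
    # Take the first run of digits, allowing thousands separators and a decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", cleaned)
    if match is None:
        return None
    # Decimal avoids the rounding surprises of float for currency values.
    return Decimal(match.group(0).replace(",", ""))
```

With these helpers, the extraction step could be written as `price_value = parse_price(agent.get("What is the price?", response_schema=str))`, followed by a check for `None` before the value is used.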