This guide helps you resolve problems with element detection, OCR accuracy, and text recognition, as well as common Windows-specific and performance issues.

Element Not Found

When AskUI can’t find an element you’re trying to interact with:

Quick Diagnostics

  1. Capture what the agent sees:
agent.save_screenshot("debug_screenshot.png")
  2. Check element visibility:
# Verify element is on screen
is_visible = agent.get("Is the submit button visible?", response_schema=bool)
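The two checks above can be wrapped in a small helper so you run them the same way every time. This is a minimal sketch that assumes only the `agent.save_screenshot` and `agent.get(..., response_schema=bool)` calls shown above; the helper name `diagnose_missing_element` and its parameters are hypothetical.

```python
from typing import Any


def diagnose_missing_element(agent: Any, description: str,
                             screenshot_path: str = "debug_screenshot.png") -> bool:
    """Run both quick checks for an element the agent cannot find (hypothetical helper)."""
    # Step 1: capture the screen exactly as the agent sees it
    agent.save_screenshot(screenshot_path)
    # Step 2: ask a yes/no question about the element's visibility
    is_visible = agent.get(f"Is the {description} visible?", response_schema=bool)
    if not is_visible:
        print(f"'{description}' was not reported as visible; inspect {screenshot_path}")
    return is_visible
```

Usage: `diagnose_missing_element(agent, "submit button")`.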

Solutions

Use More Specific Descriptions

# Too vague
agent.click("button")

# More specific
agent.click("blue Submit button at bottom of form")

Use Relative Positioning

from askui import locators as loc

# Find element relative to others
agent.click(
    loc.Element().right_of(loc.Text("Username"))
)

Check Multi-Monitor Setup

# Set active display for multi-monitor systems
agent.tools.os.set_display(1)  # Primary display
agent.click("Submit")

agent.tools.os.set_display(2)  # Secondary display
agent.click("Submit")
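If you don't know which display the element is on, you can try each display in turn. This is a minimal sketch built only on the `agent.tools.os.set_display` and `agent.click` calls shown above; `click_on_any_display` is hypothetical, and because the exact exception AskUI raises for a missing element is not covered in this guide, it catches `Exception` broadly (narrow it if you know the specific type).

```python
from typing import Any, Iterable


def click_on_any_display(agent: Any, target: Any,
                         displays: Iterable[int] = (1, 2)) -> int:
    """Try the click on each display in order; return the display where it worked."""
    displays = tuple(displays)  # allow generators and reuse in the error message
    last_error = None
    for display in displays:
        agent.tools.os.set_display(display)
        try:
            agent.click(target)
            return display
        except Exception as error:  # broad on purpose: exact AskUI exception type is an assumption
            last_error = error
    raise RuntimeError(f"{target!r} not found on displays {list(displays)}") from last_error
```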

Wrong Element Selected

When AskUI selects the wrong element from multiple similar ones:

Solutions

Add Context with Relative Locators

# Multiple "Edit" buttons? Be specific
agent.click(
    loc.Text("Edit").right_of(loc.Text("John Doe"))
)

Use Index for Specific Instances

# Select the third button (the index is zero-based)
agent.click(loc.Element("button").at_index(2))

# Or select first/last
agent.click(loc.Text("Delete").first())
agent.click(loc.Text("Delete").last())

Combine Multiple Locators

# Very specific selection
agent.click(
    loc.Element("button")
        .with_text("Submit")
        .below_of(loc.Text("Terms"))
        .inside_of(loc.Element("form"))
)

OCR and Text Recognition Issues

Misspellings and Character Confusion

Problem: The OCR model sometimes misreads characters, especially in certain fonts or noisy images. This can result in words being misclassified or misspelled, which then causes the automation to fail when it searches for exact matches.

✅ Expected Behavior

Text is correctly spelled:

✅ Hallo ✅

👍 Works with click().text("Hallo")

❌ Actual Issue

Text is misspelled:

HaII0

👎 Can’t find click().text("Hallo") because of recognition errors (l read as I, o read as 0).
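If your own code compares text returned by the agent against expected strings, exact comparison fails on exactly these look-alike errors. This is a minimal sketch of a tolerant comparison that maps commonly confused characters to one canonical form first. The confusion table is an illustrative assumption, not AskUI's internal matching; extend it for the fonts in your application.

```python
# Look-alike characters mapped to one canonical form (illustrative, extend as needed)
OCR_CONFUSIONS = str.maketrans({"I": "l", "|": "l", "1": "l", "0": "o", "5": "s"})


def normalize_ocr(text: str) -> str:
    """Replace look-alike characters, then lowercase the result."""
    return text.translate(OCR_CONFUSIONS).lower()


def ocr_tolerant_equals(detected: str, expected: str) -> bool:
    """Compare two strings while ignoring the confusions in OCR_CONFUSIONS."""
    return normalize_ocr(detected) == normalize_ocr(expected)
```

Both sides are normalized the same way, so a genuine "I" in the expected text still matches a correctly read "I".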

Solutions:

Text Merging Issues

Problem: Sometimes the text detector/annotation tool merges an icon and its text into one detection, even though they look separate on screen.

Example: Say you want to click just the “Name” label or just the “Role” label, each shown next to an icon, but OCR detects the icons and labels as one string:

✅ Expected Behavior

🖼️ Icon and text are detected separately:

🧑 ✅ Name ✅ 🤖 ✅ Role ✅

👍 Works with click().text("Name") or click().text("Role")

❌ Actual Issue

🖼️ Icon and text are detected together:

🧑 Name🤖 Role

👎 Can’t find click().text("Name").

Solutions:

Merged Texts

Problem: Sometimes the text detector/annotation tool merges two separate texts into one, even though they look clearly split on screen.

Example: Say you want to click just the name “Alice Johnson” or just the position “Software Engineer”, but OCR detects them as one long string:

✅ Expected Behavior

🖼️ Text fields detected separately:

“Alice Johnson” and “Software Engineer” (two separate detections)

👍 Works with text("Alice Johnson") or text("Software Engineer")

❌ Actual Issue

🖼️ Texts merged into one block:

Alice Johnson Software Engineer

👎 Can’t find either one on its own.
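In both merged cases above (an icon merged with its text, and two texts merged together), the target survives as a substring of a larger detected block. If you post-process detected text yourself, this is a minimal sketch that prefers an exact match and falls back to a case-insensitive "contains" match. The function name is hypothetical; it only tells you which block holds the target, so you may still need relative locators to narrow the actual click.

```python
from typing import Optional, Sequence


def find_block_containing(blocks: Sequence[str], target: str) -> Optional[str]:
    """Return the first detected block equal to, or else containing, `target`."""
    # Exact matches win so a correctly separated block is preferred
    for block in blocks:
        if block == target:
            return block
    # Fall back to a case-insensitive substring match for merged blocks
    needle = target.casefold()
    for block in blocks:
        if needle in block.casefold():
            return block
    return None
```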

Solutions:

Text Separation

Problem: Sometimes the text detector/annotation tool splits one text into two, even though it clearly reads as one on screen.

Example: Say you want to click “Alice Johnson” as one field, but OCR detects it as two separate words:

✅ Expected Behavior

🖼️ Words are detected as one sentence:

Alice Johnson

👍 Works with text("Alice Johnson")

❌ Actual Issue

🖼️ Words are detected as separated texts:

“Alice” and “Johnson” (two separate detections)

👎 Can’t find text("Alice Johnson") as one.
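When one phrase comes back as several word detections, adjacent words can be joined again if you have their bounding boxes. This is a minimal sketch of that post-processing step. The `Word` type, pixel coordinates, and the 15-pixel `max_gap` default are assumptions for illustration, not part of AskUI's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One OCR detection with its bounding box in screen pixels (hypothetical type)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def same_line(a: Word, b: Word) -> bool:
    """Boxes share a line if their vertical centers are within half a box height."""
    center_a = (a.y0 + a.y1) / 2
    center_b = (b.y0 + b.y1) / 2
    return abs(center_a - center_b) <= min(a.y1 - a.y0, b.y1 - b.y0) / 2


def merge_adjacent_words(words: List[Word], max_gap: float = 15.0) -> List[Word]:
    """Join words on the same line whose horizontal gap is at most max_gap pixels."""
    phrases: List[Word] = []
    for word in sorted(words, key=lambda w: w.x0):  # process left to right
        for i, phrase in enumerate(phrases):
            gap = word.x0 - phrase.x1  # negative if the boxes overlap slightly
            if same_line(phrase, word) and gap <= max_gap:
                phrases[i] = Word(
                    f"{phrase.text} {word.text}",
                    phrase.x0,
                    min(phrase.y0, word.y0),
                    max(phrase.x1, word.x1),
                    max(phrase.y1, word.y1),
                )
                break
        else:
            # No phrase on this line ends close enough: start a new one
            phrases.append(word)
    return phrases
```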

Solutions:

Vertical Text Merging

Problem: Sometimes the text detector/annotation tool merges two lines into one text, even though they clearly appear as two lines on screen.

✅ Expected Behavior

🖼️ Texts are detected as two lines:

Alice Johnson

👍 Works with text("Alice Johnson")

❌ Actual Issue

🖼️ Texts are detected as one text:

<no words recognized>

👎 Can’t find text("Alice Johnson") on its own.

Solutions:

Single Character Not Detected

Problem: Sometimes the text detector/annotation tool misses single characters, even though they are clearly visible on screen.

Example: Say you want to click just the character “2”, but OCR does not detect it:

✅ Expected Behavior

🖼️ Single chars are detected:

123

👍 Works with text("2")

❌ Actual Issue

🖼️ Character “2” is not detected:

123

👎 Can’t find text("2").

Solution:

Text Not Detected

Problem: Sometimes the text detector/annotation tool fails to detect a text for no apparent reason, even though you can see it clearly on screen.

Example: Say you want to click the name “Alice Johnson”, but OCR does not detect the text at all:

✅ Expected Behavior

🖼️ Text was detected:

Alice Johnson

👍 Works with text("Alice Johnson")

❌ Actual Issue

🖼️ Text wasn’t detected:

Alice Johnson

👎 Can’t find text("Alice Johnson").

Common Causes:

  • Low contrast text
  • Decorative fonts
  • Text on complex backgrounds
  • Very small or very large text
  • Rendering issues or timing
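Low contrast is the one cause above that is easy to measure. This is a minimal sketch using the WCAG 2.x contrast-ratio formula (sRGB relative luminance) to compare a text color with its background, for example colors sampled from a saved screenshot. WCAG's 4.5:1 minimum for normal text is a human-readability guideline, not a documented AskUI OCR threshold, so treat a low ratio only as a hint.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def relative_luminance(rgb: RGB) -> float:
    """WCAG relative luminance of an 8-bit sRGB color, from 0.0 (black) to 1.0 (white)."""
    def channel(c: int) -> float:
        s = c / 255
        # Undo the sRGB gamma curve
        return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)
```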

Solutions:

Windows-Specific Issues

ButtonEvent Access Denied

Error: ButtonEvent down failed: Access is denied

This occurs when:

  • Windows Lock Screen is active
  • RDP session is minimized

Solutions:

  1. For Lock Screen: Ensure the system is unlocked before running automation.

  2. For RDP: Keep the session rendering while minimized by setting this registry value on the client machine (the machine you connect from):

Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Terminal Server Client
DWORD RemoteDesktop_SuppressWhenMinimized = 2
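If you configure several client machines, the same value can be set from Python with the standard-library `winreg` module. This is a minimal sketch: the function name is hypothetical, it must run as administrator on the client machine, and it should run under 64-bit Python so the write is not redirected to the 32-bit registry view. On non-Windows systems it does nothing and returns False.

```python
import sys

KEY_PATH = r"SOFTWARE\Microsoft\Terminal Server Client"
VALUE_NAME = "RemoteDesktop_SuppressWhenMinimized"


def keep_rdp_session_rendering() -> bool:
    """Set RemoteDesktop_SuppressWhenMinimized = 2 under HKLM (hypothetical helper)."""
    if sys.platform != "win32":
        return False  # winreg only exists on Windows
    import winreg  # imported lazily so the module loads on other platforms

    # Create the key if missing, then write the DWORD value
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, 2)
    return True
```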

Performance Optimization

Slow Element Detection

Solutions:

  1. Cache locators:
# Reuse locator objects
submit_btn = loc.Element("button").with_text("Submit")
agent.click(submit_btn)
# ... later
agent.click(submit_btn)  # Reuses cached locator
  2. Use specific locators:
# Slower: Natural language
agent.click("the submit button")

# Faster: Specific locator
agent.click(loc.Element("button").with_text("Submit"))
  3. Reduce search scope:
# Search within specific area
form_area = loc.Element("form")
agent.click(loc.Text("Submit").inside_of(form_area))
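How much each of these changes helps depends on your screen, so it is worth measuring. This is a minimal sketch of a timing wrapper; `time_call` is hypothetical, and a single run is noisy, so compare several runs before drawing conclusions.

```python
import time
from typing import Any, Callable


def time_call(label: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> float:
    """Run fn(*args, **kwargs) once; print and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return elapsed
```

Usage: `time_call("natural language", agent.click, "the submit button")` versus `time_call("locator", agent.click, loc.Element("button").with_text("Submit"))`.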

Debugging Tips

Enable Verbose Logging

import logging
logging.basicConfig(level=logging.DEBUG)
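DEBUG output can be long, so it often helps to keep a copy in a file as well. This is a minimal sketch using only the standard `logging` module; the function name and default filename are hypothetical. `force=True` (Python 3.8+) replaces any handlers configured earlier.

```python
import logging


def enable_debug_logging(logfile: str = "askui_debug.log") -> None:
    """Send DEBUG-level logs to both the console and a file."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=[logging.StreamHandler(), logging.FileHandler(logfile)],
        force=True,  # drop handlers set up earlier in the process
    )
```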

Visual Debugging

# Save screenshots at key points
agent.save_screenshot("before_click.png")
agent.click("Submit")
agent.save_screenshot("after_click.png")
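To avoid repeating the before/after calls around every step, a context manager can take both screenshots and still save the "after" image when the step raises, which is usually the most useful picture. This is a minimal sketch using only `agent.save_screenshot` from above; `screenshots_around` and the timestamped filename scheme are hypothetical.

```python
import os
from contextlib import contextmanager
from datetime import datetime
from typing import Any, Iterator


@contextmanager
def screenshots_around(agent: Any, step: str, folder: str = ".") -> Iterator[None]:
    """Save a screenshot before the wrapped block and another after it, even on error."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    agent.save_screenshot(os.path.join(folder, f"{stamp}_{step}_before.png"))
    try:
        yield
    finally:
        # Runs whether the block succeeded or raised
        agent.save_screenshot(os.path.join(folder, f"{stamp}_{step}_after.png"))
```

Usage: wrap a step as `with screenshots_around(agent, "submit"): agent.click("Submit")`.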

Interactive Debugging

# Pause to inspect state
input("Press Enter to continue after checking UI...")

Common Patterns

Retry Logic

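def click_with_retry(agent, locator, max_attempts=3, delay=1):
    """Click `locator`, retrying up to `max_attempts` times before giving up."""
    for attempt in range(max_attempts):
        try:
            agent.click(locator)
            return True
        except Exception:
            # Out of attempts: re-raise the last error so the caller sees why
            if attempt == max_attempts - 1:
                raise
            # Give the UI a moment to settle before trying again
            agent.wait(delay)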

Wait for Element

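import time

def wait_for_element(agent, text, timeout=10, interval=0.5):
    """Poll until the agent reports `text` as visible or `timeout` seconds pass."""
    # monotonic() is unaffected by system clock changes, unlike time.time()
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if agent.get(f"Is '{text}' visible?", response_schema=bool):
            return True
        agent.wait(interval)
    return False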

Next Steps