1. General Questions

What is AskUI?

AskUI is an AI-powered UI automation tool that interacts with applications visually - like a human would. It enables automation across any platform, UI, or application without relying on code-level selectors or APIs.


What can I automate with AskUI?

You can automate:

  • Web, desktop, and hybrid applications
  • UI testing and QA workflows
  • Native apps with hardware integration
  • Document-based processes
  • Repetitive operational tasks

Do I need to write code?

Not necessarily, but it helps today: AskUI provides a developer-friendly TypeScript SDK and a Python Vision Agent, and we’re working on more intuitive, low-code/no-code options for non-developer users.


How does AskUI detect UI elements?

AskUI uses computer vision and OCR to recognize elements visually based on:

  • Text labels
  • Icons or logos
  • Relative layout
  • Element shapes and positions

This makes it highly flexible - even across dynamic or custom UIs. See the core concepts for more details.
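As an illustration, matching an element by its text label and by relative layout can be sketched in plain Python. This is a toy model with hypothetical `Element` records, not the actual AskUI implementation:

```python
from dataclasses import dataclass

@dataclass
class Element:
    kind: str   # e.g. "button", "textfield"
    text: str   # OCR-recognized label
    x: int
    y: int

def find_by_text(elements, kind, label):
    """Match an element by its type and recognized text label."""
    return next(e for e in elements if e.kind == kind and label.lower() in e.text.lower())

def find_below(elements, anchor, kind):
    """Relative-layout match: nearest element of `kind` below `anchor`."""
    candidates = [e for e in elements if e.kind == kind and e.y > anchor.y]
    return min(candidates, key=lambda e: e.y - anchor.y)

screen = [
    Element("textfield", "Password", x=100, y=120),
    Element("button", "Sign up", x=100, y=240),
    Element("button", "Login", x=100, y=160),
]

pwd = find_by_text(screen, "textfield", "password")
login = find_below(screen, pwd, "button")  # nearest button below the password field
```

Because matching is visual, the same logic works whether the UI is native, web-based, or custom-drawn.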


Can I use AskUI for automated testing?

Yes! AskUI is great for:

  • UI regression testing
  • End-to-end test cases
  • Visual verification
  • Cross-platform QA (including Android)

What platforms are supported?

  • OS: Windows, macOS, Linux, Android, iOS (coming soon)
  • Apps: Any web or desktop app
  • Mobile: Android (via emulator/screen mirroring)

It's framework-independent: you can automate Electron, JavaFX, Java Swing, .NET apps, and more.

Where are the examples and templates?

You can find:

2. Model Questions

What models does AskUI use?

AskUI uses a layered system of AI models, each with a distinct role in understanding, executing, and interacting with user interfaces. Here’s how we classify and use them:

  1. Grounding Models (Locators)

     • Identify and interact with UI elements on the screen.

  2. Query Models (Asks)

     • Answer user queries or generate intelligent responses.

  3. Large Action Models (act command, Multi-Step)

     • Responsibilities:
       • Planning: break a user goal into steps
       • Delegate to Grounding Models
       • Delegate to Query Models
       • Reflect on errors
     • Examples: UI-Tars, Computer-Use

  Model Type         | Model Name   | Purpose                                | Teachable | Online Trainable
  Grounding          | UIDT-1       | Locate elements & understand screen    | No        | Partial
  Grounding          | PTA-1        | Convert prompts into one-click actions | No        | Yes
  Query              | GPT-4        | Understand & respond to user queries   | Yes       | No
  Query              | Computer Use | Understand & respond to user queries   | Yes       | No
  Large Action (act) | Computer Use | Plan and execute full workflows        | Yes       | No
  Large Action (act) | UI-Tars      | Plan and execute full workflows        | Yes       | No

What exactly is the AskUI UIDT-1 Model?

A powerful model composed of multiple specialized sub-models:

  • Element Detector

    Trained to detect 9 key UI element types (like buttons, text fields, checkboxes, etc.).

  • End-to-End OCR (Optical Character Recognition)

    Used to read and understand text in the UI:

    • Text Recognition: A teachable model that learns from user corrections.
    • Text Detection: Locates where text appears on the screen.
  • DSL (Domain-Specific Language)

    Allows precise descriptions of UI actions.
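The interplay of these sub-models can be sketched roughly as a pipeline: detect elements, locate the text inside each element, then read it. All functions below are illustrative stubs with hypothetical return values, not the real UIDT-1 internals:

```python
def detect_elements(screenshot):
    """Element Detector stub: bounding boxes tagged with a UI element type."""
    return [{"type": "button", "box": (100, 160, 180, 190)},
            {"type": "textfield", "box": (100, 100, 300, 130)}]

def detect_text(screenshot, box):
    """Text Detection stub: locate text regions inside an element's box."""
    return [box]

def recognize_text(region):
    """Text Recognition stub: the teachable OCR step (learns from corrections)."""
    return {(100, 160, 180, 190): "Login", (100, 100, 300, 130): "Email"}[region]

def annotate(screenshot):
    """Pipeline: detect elements, locate their text regions, then read them."""
    annotated = []
    for el in detect_elements(screenshot):
        labels = [recognize_text(r) for r in detect_text(screenshot, el["box"])]
        annotated.append({**el, "text": " ".join(labels)})
    return annotated
```

The separation matters: the detector and text-detection steps are trained offline, while text recognition can additionally be refined by user corrections.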

How does the Prompt-to-Action (PTA-1) work? – Single-Step Execution

  • Converts natural language prompts into direct UI actions.
  • Built as a teachable model to continuously improve from user feedback.
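Conceptually, a prompt-to-action step maps one natural-language sentence to exactly one UI action. In this toy sketch a regex stands in for the actual model inference, and all names are hypothetical:

```python
import re

def prompt_to_action(prompt):
    """Hypothetical sketch of single-step prompt-to-action mapping.
    A real PTA-1 call is a model inference; this regex only stands in for it."""
    m = re.match(r"(?i)(click|type|select)\s+(?:on\s+)?(.+)", prompt)
    if not m:
        raise ValueError(f"cannot interpret: {prompt}")
    verb, target = m.group(1).lower(), m.group(2).strip()
    return {"action": verb, "target": target}

prompt_to_action("Click on the login button")
```

The teachable part is that when such a mapping is wrong, the user's correction feeds back into the model rather than requiring a code change.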

What are Large Action Models (LAMs)? – Multi-Step Execution

These are higher-level models responsible for planning and executing more complex workflows.

Responsibilities:

  • Planning: Understand a user goal and break it into steps.
  • Delegation: Assign tasks to Grounding or Query models.
  • Reflection: Analyze and correct errors during execution.

Includes:

  • UI-Tars: Task agents specialized in certain UI flows.
  • Computer-Use: Models that simulate a real user interacting with a full application or system.
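The plan–delegate–reflect loop described above can be sketched as follows. All the stubs are hypothetical; in a real LAM each one is a model inference call, not a Python function:

```python
def plan(goal):
    """Planning stub: break a user goal into single-step actions."""
    return [("click", "search field"), ("type", "hello"), ("ask", "are results visible?")]

def grounding_model(action, target):
    """Grounding stub: would locate `target` on screen and perform `action`."""
    return f"{action} '{target}': done"

def query_model(question):
    """Query stub: would answer a question about the current screen."""
    return "yes"

def act(goal, retries=1):
    """LAM loop: plan, delegate each step, reflect on errors by retrying."""
    log = []
    for step, target in plan(goal):
        for attempt in range(retries + 1):
            try:
                if step == "ask":
                    log.append(query_model(target))            # delegate: Query model
                else:
                    log.append(grounding_model(step, target))  # delegate: Grounding model
                break
            except Exception:
                if attempt == retries:  # reflection exhausted: surface the error
                    raise
    return log
```

The key design point is the layering: the LAM only plans and delegates, while localization and answering stay with the specialized models.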

What is the difference between Teaching and Training?

Teaching

Teaching is about helping the model improve or adapt without changing its underlying neural network weights.

Instead, you guide it using examples, rules, or context. It’s often about using prompts, memory, or interaction history to adjust model behavior in a targeted and efficient way.

Teaching in AskUI includes:

  • Prompt Engineering: Giving clear, contextual instructions like “Click on the login button” helps models interpret intent more accurately.
  • LLMs and LAMs, such as GPT-4 and Anthropic Claude, are teachable models.

Training

Training involves changing the internal parameters of a model - its weights - through exposure to large datasets and feedback signals.

This is computationally expensive and typically happens during development, in online training, or in batch processes. OCR re-training is one example of training, and AskUI supports online training.

Online-Training in AskUI includes:

  • OCR Re-Training (UIDT-1):

    Our OCR engine is a composite of teachable and trainable models. While you can “teach” it by correcting recognized text, deeper improvements (like recognizing new font types or edge cases) require training on more labeled images.

  • Prompt-to-Action (e.g., PTA-1):

    These models incorporate feedback from user corrections. For example, if a button click fails, and the user clarifies the intended target, the model updates its interpretation next time.
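The defining property of teaching, that behavior changes through stored corrections rather than weight updates, can be illustrated with a toy memory. This is a hypothetical class, not the AskUI API:

```python
class TeachableLocator:
    """Teaching sketch: user corrections adjust behavior via memory, not weights."""

    def __init__(self):
        self.corrections = {}  # prompt -> corrected target label

    def resolve(self, prompt):
        # A stored user correction overrides the raw interpretation.
        return self.corrections.get(prompt, prompt)

    def teach(self, prompt, corrected_target):
        self.corrections[prompt] = corrected_target

loc = TeachableLocator()
loc.teach("submit button", "Send")  # user clarifies the intended target
loc.resolve("submit button")        # now resolves to "Send"
```

Training, by contrast, would change how `resolve` itself behaves on inputs it has never seen corrected.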


Why Use Both?

AskUI combines training and teaching because:

  • Teaching is agile: It empowers users to refine behavior instantly.
  • Training is foundational: It builds the core capabilities of the model.

  Aspect                | Teaching                          | Training
  Speed                 | Immediate or near real-time       | Takes time (minutes to hours)
  Who does it?          | Users or automation               | Developers/Engineers
  Affects model weights | ❌ No                              | ✅ Yes
  Flexibility           | High - works with new scenarios   | Medium - needs structured data
  Example in AskUI      | PTA-1 prompt learning, OCR tweaks | UIDT-1 model expansion

3. Set-Up Questions

How do I install AskUI?

Install AskUI Agent OS

Agent OS is a device controller that allows agents to take screenshots, move the mouse, click, and type on the keyboard across any operating system.

TypeScript: Install the CLI with:

npm install -g @askui/cli

Python: Install with:

pip install askui

Then follow our Getting Started Guide to set up your workspace.

4. Common Errors & Troubleshooting


5. Pricing Questions

What is an Ask?

An ask() refers to one action or inference call that AskUI performs. This involves taking a screenshot to localize an element on the screen, which is a key part of interacting with UI elements visually.

For example, if you have a test case where a user browses through a menu, like this:

  1. click('nav bar')
  2. click('home')
  3. ask('is the home site visible?')

In this case, there are 3 asks or inference calls:

  • The first two click() commands do count as asks, as they require screenshots to localize the elements (even if it’s for interaction purposes).
  • The third action, ask('is the home site visible?'), also counts as one ask/inference call, since it involves taking a screenshot to verify if the element is visible.

However, simple mouse actions such as scroll(), leftclick(), or other actions that do not require localization or element recognition (i.e., no screenshot taken) do not count as asks.
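The counting rule above can be sketched with a small helper. The command sets here are illustrative only, not an official billing list:

```python
# Billable "asks": commands that need a screenshot to localize or verify an
# element. Pure input events are free. These sets are illustrative assumptions.
BILLABLE = {"click", "ask"}
FREE = {"scroll", "leftclick", "type"}

def count_asks(commands):
    """Count inference calls for a list of (command, argument) pairs."""
    return sum(1 for cmd, _ in commands if cmd in BILLABLE)

workflow = [
    ("click", "nav bar"),
    ("click", "home"),
    ("ask", "is the home site visible?"),
    ("scroll", "down"),  # no localization needed: not billed
]
count_asks(workflow)  # → 3
```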

How is Pricing for Ask Commands Calculated?


Each ask() call (which involves one screenshot) will be counted toward your usage, and the pricing is based on the total number of asks or inference calls made during the automation process.

How is the Act Command Priced?

The act() command is a more advanced action that involves multi-step workflows and typically requires interaction with external models such as Anthropic’s Claude Sonnet (or other models like OpenAI, etc.) for task execution. As such, act() is priced separately.
The pricing for act() commands is determined by the license agreement for the external model being used. For example, if you’re using Claude Sonnet by Anthropic, charges will be based on the number of calls made to the model during task execution. The pricing model can vary depending on the specific external model (e.g., OpenAI, Anthropic, or other AI providers).

6. Enterprise Questions

Is AskUI secure?

Yes. AskUI is built with security as a core principle. All automation processes run within your own infrastructure. Screenshots and input data remain local unless explicitly configured to be shared via cloud-based features.

What are all the domains used by AskUI so I can whitelist them?

  • hub.askui.com
  • prod.askui.com
  • inference.askui.com
  • files.askui.com

Is AskUI hosted within the European Union?

Yes, all our hosting infrastructure is located within the European Union to ensure compliance with EU regulations, and all data is stored within the European Union as well.

Are you GDPR compliant?

Yes. AskUI complies with the General Data Protection Regulation (GDPR). If you utilize our native inference capabilities, all data processing aligns with GDPR standards. Please note that third-party models, such as those from Anthropic or other external LLM providers, are outside the scope of our Service Level Agreement (SLA).

Are you ISO-27001 compliant?

Yes, AskUI is ISO 27001 compliant. Our certificate is available at trust.askui.com. If you utilize our native inference capabilities, all data processing aligns with the standard. Please note that third-party models, such as those from Anthropic or other external LLM providers, are outside the scope of our Service Level Agreement (SLA).

Where can I find all the compliance documents?

You can find these in our Trust Center at trust.askui.com.

Do you provide on-premise inference?

Yes, please contact our team for detailed information.