Frequently Asked Questions (FAQ)
1. General Questions
What is AskUI?
AskUI is an AI-powered UI automation tool that interacts with applications visually - like a human would. It enables automation across any platform, UI, or application without relying on code-level selectors or APIs.
What can I automate with AskUI?
You can automate:
- Web, desktop, and hybrid applications
- UI testing and QA workflows
- Native app testing with hardware integration
- Document-based processes
- Repetitive operational tasks
Do I need to write code?
Some coding is currently required: AskUI provides a developer-friendly TypeScript SDK and a Python Vision Agent, and we're working on more intuitive low-code/no-code options for non-developer users.
How does AskUI detect UI elements?
AskUI uses computer vision and OCR to recognize elements visually based on:
- Text labels
- Icons or logos
- Relative layout
- Element shapes and positions
This makes it highly flexible - even across dynamic or custom UIs. See the core concepts section for more details.
Can I use AskUI for automated testing?
Yes! AskUI is great for:
- UI regression testing
- End-to-end test cases
- Visual verification
- Cross-platform QA (including Android)
What platforms are supported?
- OS: Windows, macOS, Linux, Android, iOS (coming soon)
- Apps: Any web or desktop app
- Mobile: Android (via emulator/screen mirroring)
It's framework-independent: you can automate Electron, JavaFX, Java Swing, .NET apps, and more.
Where are the examples and templates?
You can find:
2. Model Questions
What models is AskUI using?
AskUI uses a layered system of AI models, each with a distinct role in understanding, executing, and interacting with user interfaces. Here's how we classify and use them:
- Grounding Models (Locators): Identify and interact with UI elements on the screen.
- Query Models (Asks): Answer user queries and generate intelligent responses.
- Large Action Models (act command, multi-step): Plan toward a goal, delegate to Grounding and Query models, and reflect on errors. Examples include UI-Tars and Computer-Use.

Model Type | Model Name | Purpose | Teachable | Online Trainable |
---|---|---|---|---|
Grounding | UIDT-1 | Locate elements & understand screen | No | Partial |
Grounding | PTA-1 | Convert prompts into one-click actions | No | Yes |
Query | GPT-4 | Understand & respond to user queries | Yes | No |
Query | Computer Use | Understand & respond to user queries | Yes | No |
Large Action (act) | Computer Use | Plan and execute full workflows | Yes | No |
Large Action (act) | UI-Tars | Plan and execute full workflows | Yes | No |
What exactly is the AskUI UIDT-1 Model?
A powerful model composed of multiple specialized sub-models:
- Element Detector: Trained to detect 9 key UI element types (like buttons, text fields, checkboxes, etc.).
- End-to-End OCR (Optical Character Recognition): Used to read and understand text in the UI.
  - Text Recognition: A teachable model that learns from user corrections.
  - Text Detection: Locates where text appears on the screen.
- DSL (Domain-Specific Language): Allows precise descriptions of UI actions.
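As a conceptual illustration of how such sub-models can work together (this is a stubbed sketch, not the actual UIDT-1 internals - the detections, boxes, and function names are invented for the example), element-detector output and OCR text can be merged into DSL-like descriptions:

```python
# Illustrative sketch (stubbed, not the real UIDT-1 pipeline): merge
# element-detector boxes with OCR text into simple DSL-like descriptions
# such as "button with text 'Login'".

# Stubbed detector output: (element type, bounding box as x, y, w, h)
detections = [("button", (400, 600, 120, 40)), ("textfield", (400, 520, 200, 30))]
# Stubbed OCR output: bounding box -> recognized text
ocr = {(400, 600, 120, 40): "Login", (400, 520, 200, 30): "Username"}

def describe(detections, ocr) -> list[str]:
    # Pair each detected element with the text OCR read inside its box.
    return [f"{kind} with text {ocr.get(box, '?')!r}" for kind, box in detections]

print(describe(detections, ocr))
# -> ["button with text 'Login'", "textfield with text 'Username'"]
```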
How does the Prompt-to-Action (PTA-1) work? – Single-Step Execution
- Converts natural language prompts into direct UI actions.
- Built as a teachable model to continuously improve from user feedback.
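The single-step idea can be sketched as follows. This is a hedged, conceptual illustration only - `prompt_to_action`, the stubbed `screen` map, and the parsing logic are invented for the example and are not the AskUI SDK; in PTA-1 the grounding step is a vision model reading a screenshot, not a dictionary lookup:

```python
# Conceptual sketch of single-step prompt-to-action (illustrative names,
# not the actual AskUI API): a natural-language prompt is grounded to
# screen coordinates, then executed as exactly one UI action.

def prompt_to_action(prompt: str, screen: dict) -> tuple:
    """Map a prompt like 'Click on the login button' to one click action."""
    # Grounding step: look the target up in a stubbed element map that a
    # real model would produce from a screenshot.
    target = prompt.lower().replace("click on the ", "").strip()
    if target not in screen:
        raise LookupError(f"element not found: {target}")
    x, y = screen[target]
    return ("click", x, y)  # single-step: one prompt -> one action

# Stubbed screen model: element name -> center coordinates
screen = {"login button": (412, 630), "username field": (400, 480)}
print(prompt_to_action("Click on the login button", screen))
# -> ('click', 412, 630)
```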
What is a Large Action Model (LAM)? – Multi-Step Execution
These are higher-level models responsible for planning and executing more complex workflows.
Responsibilities:
- Planning: Understand a user goal and break it into steps.
- Delegation: Assign tasks to Grounding or Query models.
- Reflection: Analyze and correct errors during execution.
Includes:
- UI-Tars: Task agents specialized in certain UI flows.
- Computer-Use: Models that simulate a real user interacting with a full application or system.
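The plan-delegate-reflect loop described above can be sketched conceptually. This is a stdlib-only illustration with invented names and hard-coded stubs (a real LAM would decompose the goal with a model and delegate to actual grounding/query models), not the AskUI implementation:

```python
# Conceptual sketch of a Large Action Model loop (illustrative only):
# plan a goal into steps, delegate each step, and reflect on failures
# by retrying the step.

def plan(goal: str) -> list[str]:
    # Planning: a real LAM would decompose the goal via a model;
    # hard-coded here for illustration.
    return ["open settings", "click profile", "verify name shown"]

def delegate(step: str, attempt: int) -> bool:
    # Delegation: stand-in for a grounding (click) or query (verify) call.
    # Simulate a transient failure on the first try of one step.
    return not (step == "click profile" and attempt == 0)

def run(goal: str, max_retries: int = 2) -> list[str]:
    log = []
    for step in plan(goal):
        for attempt in range(max_retries + 1):
            if delegate(step, attempt):
                log.append(f"ok: {step}")
                break
            log.append(f"retry: {step}")  # reflection: error noted, step retried
        else:
            log.append(f"failed: {step}")
    return log

print(run("update my profile name"))
# -> ['ok: open settings', 'retry: click profile', 'ok: click profile', 'ok: verify name shown']
```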
What is the difference between Teaching vs. Training?
Teaching
Teaching is about helping the model improve or adapt without changing its underlying neural network weights.
Instead, you guide it using examples, rules, or context. It’s often about using prompts, memory, or interaction history to adjust model behavior in a targeted and efficient way.
Teaching in AskUI includes:
- Prompt Engineering: Giving clear, contextual instructions like “Click on the login button” helps models interpret intent more accurately.
- Teachable models: LLMs and LAMs, such as GPT-4 and Anthropic's Claude, are teachable models.
Training
Training involves changing the internal parameters of a model - its weights - through exposure to large datasets and feedback signals.
This is computationally expensive and typically happens during development, via online training, or in batch processes. OCR re-training is one example, and AskUI supports online training.
Online training in AskUI includes:
- OCR: Our OCR engine is a composite of teachable and trainable models. While you can "teach" it by correcting recognized text, deeper improvements (like recognizing new font types or edge cases) require training on more labeled images.
- Prompt-to-Action (e.g., PTA-1): These models incorporate feedback from user corrections. For example, if a button click fails and the user clarifies the intended target, the model updates its interpretation next time.
Why Use Both?
AskUI combines training and teaching because:
- Teaching is agile: It empowers users to refine behavior instantly.
- Training is foundational: It builds the core capabilities of the model.
Aspect | Teaching | Training |
---|---|---|
Speed | Immediate or near real-time | Takes time (minutes to hours) |
Who does it? | Users or automation | Developers/Engineers |
Affects Model Weights | ❌ No | ✅ Yes |
Flexibility | High - works with new scenarios | Medium - needs structured data |
Example in AskUI | PTA-1 prompt learning, OCR tweaks | UIDT-1 model expansion |
3. Set-Up Questions
How do I install AskUI?
Install AskUI Agent OS
Agent OS is a device controller that allows agents to take screenshots, move the mouse, click, and type on the keyboard across any operating system.
TypeScript: Install the CLI with:
Python: Install with:
Then follow our Getting Started Guide to set up your workspace.
4. Common Errors & Troubleshooting
5. Pricing Questions
What is an Ask?
An ask() refers to one action or inference call that AskUI performs. This involves taking a screenshot to localize an element on the screen, which is a key part of interacting with UI elements visually.
For example, consider a test case where a user browses through a menu:

click('nav bar')
click('home')
ask('is the home site visible?')

In this case, there are 3 asks or inference calls:
- The first two click() commands count as asks, since each requires a screenshot to localize its element (even if it's for interaction purposes).
- The third action, ask('is the home site visible?'), also counts as one ask/inference call, since it involves taking a screenshot to verify that the element is visible.

However, simple mouse actions such as scroll(), leftclick(), or other actions that do not require localization or element recognition (i.e., no screenshot is taken) do not count as asks.
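The counting rule above can be sketched in a few lines. This is an illustrative sketch based only on the rule stated in this FAQ (the command names and sets are invented for the example, not the AskUI SDK): any command that needs a screenshot to localize or verify an element counts as one ask, while raw input actions do not:

```python
# Hedged sketch of how billable "asks" accumulate (counting rule from the
# text above; names are illustrative, not the AskUI SDK).

NEEDS_SCREENSHOT = {"click", "ask"}       # require localization/verification
NO_SCREENSHOT = {"scroll", "leftclick"}   # raw input, no inference call

def count_asks(commands: list[str]) -> int:
    # Each screenshot-backed command is one inference call.
    return sum(1 for name in commands if name in NEEDS_SCREENSHOT)

workflow = ["click", "click", "ask", "scroll"]
print(count_asks(workflow))  # -> 3: two clicks + one ask; scroll is free
```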
How is Pricing for Ask Commands Calculated?
Each ask() call (which involves one screenshot) will be counted toward your usage, and the pricing is based on the total number of asks or inference calls made during the automation process.
How is the Act Command Priced?
The act() command is a more advanced action that involves multi-step workflows and typically requires interaction with external models such as Anthropic’s Claude Sonnet (or other models like OpenAI, etc.) for task execution. As such, act() is priced separately.
The pricing for act() commands is determined by the license agreement for the external model being used. For example, if you’re using Claude Sonnet by Anthropic, charges will be based on the number of calls made to the model during task execution. The pricing model can vary depending on the specific external model (e.g., OpenAI, Anthropic, or other AI providers).
6. Enterprise Questions
Is AskUI secure?
Yes. AskUI is built with security as a core principle. All automation processes run within your own infrastructure. Screenshots and input data remain local unless explicitly configured to be shared via cloud-based features.
What are all the domains used by AskUI so I can whitelist them?
hub.askui.com
prod.askui.com
inference.askui.com
files.askui.com
Is AskUI hosting within the European Union?
Yes, all of our hosting infrastructure is located within the European Union to ensure compliance with EU regulations. All data is also stored within the European Union.
Are you GDPR compliant?
Yes. AskUI complies with the General Data Protection Regulation (GDPR). If you utilize our native inference capabilities, all data processing aligns with GDPR standards. Please note that third-party models, such as those from Anthropic or other external LLM providers, are outside the scope of our Service Level Agreement (SLA).
Are you ISO-27001 compliant?
Yes, AskUI is ISO 27001 compliant. Our certificate is available at trust.askui.com. If you utilize our native inference capabilities, all data processing aligns with the ISO 27001 standard. Please note that third-party models, such as those from Anthropic or other external LLM providers, are outside the scope of our Service Level Agreement (SLA).
Where can I find all the compliance documents?
You can find these in our Trust Center at trust.askui.com.
Do you provide on-premise inference?
Yes, please contact our team for detailed information.