Frequently Asked Questions (FAQ)
1. General Questions
What is AskUI?
AskUI is an AI-powered UI automation tool that interacts with applications visually - like a human would. It enables automation across any platform, UI, or application without relying on code-level selectors or APIs.
What can I automate with AskUI?
You can automate:
- Web, desktop, and hybrid applications
- UI testing and QA workflows
- Native apps with hardware integration
- Document-based processes
- Repetitive operational tasks
Do I need to write code?
Some coding is helpful today: AskUI provides a developer-friendly TypeScript SDK and a Python Vision Agent, and we’re working on more intuitive low-code/no-code options for non-developers.
How does AskUI detect UI elements?
AskUI uses computer vision and OCR to recognize elements visually based on:
- Text labels
- Icons or logos
- Relative layout
- Element shapes and positions
This makes it highly flexible - even across dynamic or custom UIs. See more in the core concept.
Can I use AskUI for automated testing?
Yes! AskUI is great for:
- UI regression testing
- End-to-end test cases
- Visual verification
- Cross-platform QA (including Android)
What platforms are supported?
- OS: Windows, macOS, Linux, Android, iOS (coming soon)
- Apps: Any web or desktop app
- Mobile: Android (via emulator/screen mirroring)
It’s framework-independent: you can automate Electron, JavaFX, Java Swing, .NET apps, and more.
Where are the examples and templates?
You can find example projects and templates in the AskUI documentation.
2. Model Questions
What models is AskUI using?
AskUI uses a layered system of AI models, each with a distinct role in understanding, executing, and interacting with user interfaces. Here’s how we classify and use them:
- Grounding Models (Locators): Identify and interact with UI elements on the screen.
- Query Models (Asks): Answer user queries or generate intelligent responses.
- Large Action Models (act command, multi-step): Plan toward a goal, delegate to Grounding and Query models, and reflect on errors. Examples: UI-Tars and Computer-Use.

Model Type | Model Name | Purpose | Teachable | Online Trainable |
---|---|---|---|---|
Grounding | UIDT-1 | Locate elements & understand screen | No | Partial |
Grounding | PTA-1 | Convert prompts into one-click actions | No | Yes |
Query | GPT-4 | Understand & respond to user queries | Yes | No |
Query | Computer Use | Understand & respond to user queries | Yes | No |
Large Action (act) | Computer Use | Plan and execute full workflows | Yes | No |
Large Action (act) | UI-Tars | Plan and execute full workflows | Yes | No |
What exactly is the AskUI UIDT-1 Model?
A powerful model composed of multiple specialized sub-models:
- Element Detector: Trained to detect 9 key UI element types (like buttons, text fields, checkboxes, etc.).
- End-to-End OCR (Optical Character Recognition): Used to read and understand text in the UI:
  - Text Recognition: A teachable model that learns from user corrections.
  - Text Detection: Locates where text appears on the screen.
- DSL (Domain-Specific Language): Allows precise descriptions of UI actions.
How does the Prompt-to-Action (PTA-1) work? – Single-Step Execution
- Converts natural language prompts into direct UI actions.
- Built as a teachable model to continuously improve from user feedback.
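As a toy illustration of the prompt-to-action idea, the sketch below maps a single natural-language instruction to a single UI action. PTA-1 itself is a trained model; this regex stand-in is purely hypothetical and only mimics the input/output shape of single-step execution:

```python
import re

# Hypothetical stand-in for a prompt-to-action model (NOT PTA-1 itself):
# map one natural-language prompt to one structured UI action.
def prompt_to_action(prompt):
    """Parse prompts like 'Click on the login button' into an action dict."""
    m = re.match(r"(?i)click (?:on )?(?:the )?(.+)", prompt.strip())
    if m:
        return {"action": "click", "target": m.group(1)}
    raise ValueError(f"unsupported prompt: {prompt!r}")

print(prompt_to_action("Click on the login button"))
# {'action': 'click', 'target': 'login button'}
```

A real model generalizes far beyond this pattern matching, and improves from user feedback, but the single prompt in, single action out contract is the same.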
What are Large Action Models (LAMs)? – Multi-Step Execution
These are higher-level models responsible for planning and executing more complex workflows.
Responsibilities:
- Planning: Understand a user goal and break it into steps.
- Delegation: Assign tasks to Grounding or Query models.
- Reflection: Analyze and correct errors during execution.
Includes:
- UI-Tars: Task agents specialized in certain UI flows.
- Computer-Use: Models that simulate a real user interacting with a full application or system.
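The plan, delegate, reflect loop described above can be sketched in a few lines. This is a conceptual illustration only, not the AskUI internals or API; the planner and grounding functions here are hypothetical stand-ins:

```python
# Conceptual sketch of a Large Action Model loop (not real AskUI code):
# plan a goal into steps, delegate each step, and retry on failure
# (a minimal form of reflection).
def run_goal(goal, planner, grounding, max_retries=1):
    """Execute a goal by delegating planned steps to a grounding function."""
    results = []
    for step in planner(goal):            # Planning: goal -> steps
        attempts = 0
        while True:
            try:
                results.append(grounding(step))  # Delegation
                break
            except RuntimeError:
                attempts += 1             # Reflection: retry on error
                if attempts > max_retries:
                    raise
    return results

# Toy stand-ins for demonstration:
plan = lambda goal: ["locate '%s'" % part for part in goal.split(", ")]
ground = lambda step: "clicked after " + step

print(run_goal("nav bar, home", plan, ground))

# Reflection in action: a grounding step that fails once, then succeeds.
calls = {"n": 0}
def flaky(step):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("element not found")
    return "ok: " + step

print(run_goal("home", plan, flaky))
```

Real LAMs plan with a language model and delegate to vision models rather than Python callables, but the control flow has the same shape.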
What is the difference between Teaching vs. Training?
Teaching
Teaching is about helping the model improve or adapt without changing its underlying neural network weights.
Instead, you guide it using examples, rules, or context. It’s often about using prompts, memory, or interaction history to adjust model behavior in a targeted and efficient way.
Teaching in AskUI includes:
- Prompt Engineering: Giving clear, contextual instructions like “Click on the login button” helps models interpret intent more accurately.
- LLMs and LAMs are teachable models, such as GPT-4, Anthropic Claude, etc.
Training
Training involves changing the internal parameters of a model - its weights - through exposure to large datasets and feedback signals.
This is computationally expensive and typically happens during development, through online training, or in batch processes. Re-training the OCR engine is one example, and AskUI supports online training for it.
Online training in AskUI includes:
- OCR: Our OCR engine is a composite of teachable and trainable models. While you can “teach” it by correcting recognized text, deeper improvements (like recognizing new font types or edge cases) require training on more labeled images.
- Prompt-to-Action (e.g., PTA-1): These models incorporate feedback from user corrections. For example, if a button click fails and the user clarifies the intended target, the model updates its interpretation next time.
Why Use Both?
AskUI combines training and teaching because:
- Teaching is agile: It empowers users to refine behavior instantly.
- Training is foundational: It builds the core capabilities of the model.
Aspect | Teaching | Training |
---|---|---|
Speed | Immediate or near real-time | Takes time (minutes to hours) |
Who does it? | Users or automation | Developers/Engineers |
Affects Model Weights | ❌ No | ✅ Yes |
Flexibility | High - works with new scenarios | Medium - needs structured data |
Example in AskUI | PTA-1 prompt learning, OCR tweaks | UIDT-1 model expansion |
3. Set-Up Questions
How do I install AskUI?
Install AskUI Agent OS
Agent OS is a device controller that allows agents to take screenshots, move the mouse, click, and type on the keyboard across any operating system.
TypeScript: Install the AskUI CLI with npm.
Python: Install the askui package with pip.
Then follow our Getting Started Guide to set up your workspace.
4. Common Errors & Troubleshooting
Python Vision Agent
How can I debug a “Session info doesn’t match” error?
If you are receiving a “Session info doesn’t match” error, try the following steps to unblock yourself:
- Open a terminal
- Enter askui-shell
- Import the debug commands with AskUI-ImportDebugCommands
- Stop all running AskUI Controllers with AskUI-StopProcess -All
- Try again
ADE / askui-shell
How can I use dotenv instead of askui-shell for the Access Token?
If you prefer not to use the AskUI Shell, you can expose your ASKUI_WORKSPACE_ID and ASKUI_TOKEN via a .env file using python-dotenv.
🛠 Setup Steps
- Install the python-dotenv package.
- Create a .env file in your project root containing ASKUI_WORKSPACE_ID and ASKUI_TOKEN. Need a token? See: Create Access Tokens
- Load the variables in Python before creating your agent.
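With python-dotenv installed, calling `load_dotenv()` from the `dotenv` package at the top of your script is all the loading step requires. If you would rather avoid the extra dependency, the stdlib-only sketch below does the same thing for simple KEY=VALUE lines (the real library additionally handles quoting, comments, and other edge cases); the file name and values here are placeholders:

```python
# Minimal stdlib-only sketch of what python-dotenv's load_dotenv() does
# for simple KEY=VALUE lines. Prefer python-dotenv in real projects.
import os

def load_env_file(path):
    """Read KEY=VALUE lines from a .env-style file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and malformed entries.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't overwrite variables already set in the environment.
            os.environ.setdefault(key.strip(), value.strip())

# Example: write a throwaway .env file and load it (placeholder values).
with open(".env.example", "w") as f:
    f.write("ASKUI_WORKSPACE_ID=your-workspace-id\n")
    f.write("ASKUI_TOKEN=your-token\n")

load_env_file(".env.example")
print(os.environ["ASKUI_WORKSPACE_ID"])  # your-workspace-id
```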
Agent OS
Send Error Report to AskUI
These steps describe how to send an error report to AskUI.
- Start a terminal (Terminal/PowerShell/Console)
- Enter askui-shell
- Enter AskUI-ImportExperimentalCommands
- Enter AskUI-NewErrorReport
- Press n for “Do you want to open the error report directory?”
- Press y for “Do you approve the error report content?”
- Send the ErrorReport-<uui>-<date>.zip from the line Info: Submitting error report archive '... to info@askui.com or an AskUI employee.
AskUI Shell
How can I set askui-shell as a custom terminal in VS Code?
To configure VS Code to use a custom terminal like askui-shell, you can set the shell path in the terminal settings. Here’s how to do it:
🔧 Steps to Set a Custom Terminal in VS Code
- Open VS Code.
- Go to File > Preferences > Settings (or press Ctrl+,).
- Click the Open Settings (JSON) icon in the top right corner.
- Add a terminal profile for askui-shell to your settings.json file.
- Save the file.
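For illustration, a Windows profile entry in settings.json might look like the sketch below. The profile name is arbitrary, and launching askui-shell through PowerShell is an assumption; adjust the path and args to match how askui-shell is started on your machine, and use the corresponding .osx or .linux profile keys on those platforms:

```json
{
  "terminal.integrated.profiles.windows": {
    "askui-shell": {
      "path": "powershell.exe",
      "args": ["-NoExit", "-Command", "askui-shell"]
    }
  },
  "terminal.integrated.defaultProfile.windows": "askui-shell"
}
```

Setting the default profile is optional; without it, askui-shell appears as an extra choice in the terminal dropdown.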
How can I set askui-shell as a custom terminal in PyCharm?
To configure PyCharm to use a custom terminal like askui-shell, you can set the shell path in the terminal settings. Here’s how to do it:
🔧 Steps to Set a Custom Terminal in PyCharm
- Open Settings: go to File > Settings (or PyCharm > Preferences on macOS).
- Navigate to Terminal Settings: go to Tools > Terminal.
- Set the Shell Path: in the Shell path field, enter the path to your askui-shell script. The exact path depends on your operating system and installation.
- Apply and Restart Terminal: click Apply and OK, then open a new terminal tab in PyCharm to see the changes.
5. Pricing Questions
What is an Ask?
An ask() refers to one action or inference call that AskUI performs. This involves taking a screenshot to localize an element on the screen, which is a key part of interacting with UI elements visually.
For example, if you have a test case where a user browses through a menu, like this:
click('nav bar')
click('home')
ask('is the home site visible?')
In this case, there are 3 asks or inference calls:
- The first two click() commands count as asks, as they require screenshots to localize the elements (even if it’s for interaction purposes).
- The third action, ask('is the home site visible?'), also counts as one ask/inference call, since it involves taking a screenshot to verify if the element is visible.
However, simple mouse actions such as scroll(), leftclick(), or other actions that do not require localization or element recognition (i.e., no screenshot taken) do not count as asks.
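The counting rule above can be sketched as a small function. The classification of commands comes from this FAQ; the function itself is purely illustrative and not part of the AskUI SDK:

```python
# Illustrative only - not part of the AskUI SDK.
# Per this FAQ: commands that take a screenshot to localize or verify an
# element (click, ask) count as asks; plain mouse actions (scroll,
# leftclick) do not, because no inference call is made.
BILLABLE = {"click", "ask"}

def count_asks(commands):
    """Count billable asks in a sequence of command names."""
    return sum(1 for name in commands if name in BILLABLE)

# The menu-browsing example from above: two clicks plus one ask.
workflow = ["click", "click", "ask", "scroll", "leftclick"]
print(count_asks(workflow))  # 3
```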
How is Pricing for Ask Commands Calculated?
Each ask() call (which involves one screenshot) will be counted toward your usage, and the pricing is based on the total number of asks or inference calls made during the automation process.
How is the Act Command Priced?
The act() command is a more advanced action that involves multi-step workflows and typically requires interaction with external models such as Anthropic’s Claude Sonnet (or other models like OpenAI, etc.) for task execution. As such, act() is priced separately.
The pricing for act() commands is determined by the license agreement for the external model being used. For example, if you’re using Claude Sonnet by Anthropic, charges will be based on the number of calls made to the model during task execution. The pricing model can vary depending on the specific external model (e.g., OpenAI, Anthropic, or other AI providers).
6. Enterprise Questions
Is AskUI secure?
Yes. AskUI is designed with a strong focus on security, privacy, and compliance. All automation is executed locally within your own infrastructure, ensuring full control over your environment. Screenshots and UI instructions are processed securely in the cloud by our AI model, but are never stored—all processing is transient and limited strictly to the duration of the request.
Our cloud infrastructure is hosted within the European Union, supporting compliance with EU data protection laws. AskUI is ISO 27001 certified, fully GDPR compliant, and regularly undergoes independent penetration testing to ensure a secure and robust platform.
What are all the domains used by AskUI so I can whitelist them?
- hub.askui.com
- prod.askui.com
- inference.askui.com
- files.askui.com
- tracking-cdn.askui.com
- tracking.askui.com
Is AskUI hosting within the European Union?
Yes, all our hosting infrastructure is located within the European Union to ensure compliance with EU regulations. All data is also stored within the European Union.
Are you GDPR compliant?
Yes. AskUI complies with the General Data Protection Regulation (GDPR). If you utilize our native inference capabilities, all data processing aligns with GDPR standards. Please note that third-party models, such as those from Anthropic or other external LLM providers, are outside the scope of our Service Level Agreement (SLA).
Are you ISO-27001 compliant?
Yes, AskUI is ISO 27001 compliant. Our certificate is available at trust.askui.com. If you utilize our native inference capabilities, all data processing aligns with the standard. Please note that third-party models, such as those from Anthropic or other external LLM providers, are outside the scope of our Service Level Agreement (SLA).
Where can I find all the compliance documents?
You can find these in our Trust Center at trust.askui.com.
Do you provide on-premise inference?
Yes, please contact our team for detailed information.