1. General Questions

What is AskUI?

AskUI is an AI-powered UI automation tool that interacts with applications visually - like a human would. It enables automation across any platform, UI, or application without relying on code-level selectors or APIs.


What can I automate with AskUI?

You can automate:

  • Web, desktop, and hybrid applications
  • UI testing and QA workflows
  • Native apps with hardware integration
  • Document-based processes
  • Repetitive operational tasks

Do I need to write code?

Not necessarily, but it helps today: AskUI provides a developer-friendly TypeScript SDK and a Python Vision Agent, and we’re working on more intuitive, low-code/no-code options for non-developer users.


How does AskUI detect UI elements?

AskUI uses computer vision and OCR to recognize elements visually based on:

  • Text labels
  • Icons or logos
  • Relative layout
  • Element shapes and positions

This makes it highly flexible - even across dynamic or custom UIs. See the core concepts for more details.
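As an illustration, matching an element by its text label and by relative layout can be sketched in plain Python. This is a toy model with hypothetical `Element` records, not the actual AskUI implementation:

```python
from dataclasses import dataclass

@dataclass
class Element:
    kind: str   # e.g. "button", "textfield"
    text: str   # OCR-recognized label
    x: int
    y: int

def find_by_text(elements, kind, label):
    """Match an element by its type and recognized text label."""
    return next(e for e in elements if e.kind == kind and label.lower() in e.text.lower())

def find_below(elements, anchor, kind):
    """Relative-layout match: nearest element of `kind` below `anchor`."""
    candidates = [e for e in elements if e.kind == kind and e.y > anchor.y]
    return min(candidates, key=lambda e: e.y - anchor.y)

screen = [
    Element("textfield", "Password", x=100, y=120),
    Element("button", "Sign up", x=100, y=240),
    Element("button", "Login", x=100, y=160),
]

pwd = find_by_text(screen, "textfield", "password")
login = find_below(screen, pwd, "button")  # nearest button below the password field
```

Because matching is visual, the same logic works whether the UI is native, web-based, or custom-drawn.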


Can I use AskUI for automated testing?

Yes! AskUI is great for:

  • UI regression testing
  • End-to-end test cases
  • Visual verification
  • Cross-platform QA (including Android)

What platforms are supported?

  • OS: Windows, macOS, Linux, Android, iOS (coming soon)
  • Apps: Any web or desktop app
  • Mobile: Android (via emulator/screen mirroring)

It's framework-independent: you can automate Electron, JavaFX, Java Swing, .NET apps, and more.

Where are the examples and templates?

You can find:

2. Model Questions

What models does AskUI use?

AskUI uses a layered system of AI models, each with a distinct role in understanding, executing, and interacting with user interfaces. Here’s how we classify and use them:

  1. Grounding Models (Locators)

     • Identify and interact with UI elements on the screen.

  2. Query Models (Asks)

     • Answer user queries or generate intelligent responses.

  3. Large Action Models (act command, Multi-Step)

     • Responsibilities:
       • Planning: break a user goal into steps
       • Delegate to Grounding Models
       • Delegate to Query Models
       • Reflect on errors
     • Examples: UI-Tars, Computer-Use

  Model Type         | Model Name   | Purpose                                | Teachable | Online Trainable
  Grounding          | UIDT-1       | Locate elements & understand screen    | No        | Partial
  Grounding          | PTA-1        | Convert prompts into one-click actions | No        | Yes
  Query              | GPT-4        | Understand & respond to user queries   | Yes       | No
  Query              | Computer Use | Understand & respond to user queries   | Yes       | No
  Large Action (act) | Computer Use | Plan and execute full workflows        | Yes       | No
  Large Action (act) | UI-Tars      | Plan and execute full workflows        | Yes       | No

What exactly is the AskUI UIDT-1 Model?

A powerful model composed of multiple specialized sub-models:

  • Element Detector

    Trained to detect 9 key UI element types (like buttons, text fields, checkboxes, etc.).

  • End-to-End OCR (Optical Character Recognition)

    Used to read and understand text in the UI:

    • Text Recognition: A teachable model that learns from user corrections.
    • Text Detection: Locates where text appears on the screen.
  • DSL (Domain-Specific Language)

    Allows precise descriptions of UI actions.
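The interplay of these sub-models can be sketched roughly as a pipeline: detect elements, locate the text inside each element, then read it. All functions below are illustrative stubs with hypothetical return values, not the real UIDT-1 internals:

```python
def detect_elements(screenshot):
    """Element Detector stub: bounding boxes tagged with a UI element type."""
    return [{"type": "button", "box": (100, 160, 180, 190)},
            {"type": "textfield", "box": (100, 100, 300, 130)}]

def detect_text(screenshot, box):
    """Text Detection stub: locate text regions inside an element's box."""
    return [box]

def recognize_text(region):
    """Text Recognition stub: the teachable OCR step (learns from corrections)."""
    return {(100, 160, 180, 190): "Login", (100, 100, 300, 130): "Email"}[region]

def annotate(screenshot):
    """Pipeline: detect elements, locate their text regions, then read them."""
    annotated = []
    for el in detect_elements(screenshot):
        labels = [recognize_text(r) for r in detect_text(screenshot, el["box"])]
        annotated.append({**el, "text": " ".join(labels)})
    return annotated
```

The separation matters: the detector and text-detection steps are trained offline, while text recognition can additionally be refined by user corrections.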

How does the Prompt-to-Action (PTA-1) work? – Single-Step Execution

  • Converts natural language prompts into direct UI actions.
  • Built as a teachable model to continuously improve from user feedback.
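Conceptually, a prompt-to-action step maps one natural-language sentence to exactly one UI action. In this toy sketch a regex stands in for the actual model inference, and all names are hypothetical:

```python
import re

def prompt_to_action(prompt):
    """Hypothetical sketch of single-step prompt-to-action mapping.
    A real PTA-1 call is a model inference; this regex only stands in for it."""
    m = re.match(r"(?i)(click|type|select)\s+(?:on\s+)?(.+)", prompt)
    if not m:
        raise ValueError(f"cannot interpret: {prompt}")
    verb, target = m.group(1).lower(), m.group(2).strip()
    return {"action": verb, "target": target}

prompt_to_action("Click on the login button")
```

The teachable part is that when such a mapping is wrong, the user's correction feeds back into the model rather than requiring a code change.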

What are Large Action Models (LAMs)? – Multi-Step Execution

These are higher-level models responsible for planning and executing more complex workflows.

Responsibilities:

  • Planning: Understand a user goal and break it into steps.
  • Delegation: Assign tasks to Grounding or Query models.
  • Reflection: Analyze and correct errors during execution.

Includes:

  • UI-Tars: Task agents specialized in certain UI flows.
  • Computer-Use: Models that simulate a real user interacting with a full application or system.
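The plan–delegate–reflect loop described above can be sketched as follows. All the stubs are hypothetical; in a real LAM each one is a model inference call, not a Python function:

```python
def plan(goal):
    """Planning stub: break a user goal into single-step actions."""
    return [("click", "search field"), ("type", "hello"), ("ask", "are results visible?")]

def grounding_model(action, target):
    """Grounding stub: would locate `target` on screen and perform `action`."""
    return f"{action} '{target}': done"

def query_model(question):
    """Query stub: would answer a question about the current screen."""
    return "yes"

def act(goal, retries=1):
    """LAM loop: plan, delegate each step, reflect on errors by retrying."""
    log = []
    for step, target in plan(goal):
        for attempt in range(retries + 1):
            try:
                if step == "ask":
                    log.append(query_model(target))            # delegate: Query model
                else:
                    log.append(grounding_model(step, target))  # delegate: Grounding model
                break
            except Exception:
                if attempt == retries:  # reflection exhausted: surface the error
                    raise
    return log
```

The key design point is the layering: the LAM only plans and delegates, while localization and answering stay with the specialized models.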

What is the difference between Teaching and Training?

Teaching

Teaching is about helping the model improve or adapt without changing its underlying neural network weights.

Instead, you guide it using examples, rules, or context. It’s often about using prompts, memory, or interaction history to adjust model behavior in a targeted and efficient way.

Teaching in AskUI includes:

  • Prompt Engineering: Giving clear, contextual instructions like “Click on the login button” helps models interpret intent more accurately.
  • LLMs and LAMs, such as GPT-4 and Anthropic Claude, are teachable models.

Training

Training involves changing the internal parameters of a model - its weights - through exposure to large datasets and feedback signals.

This is computationally expensive and typically happens during development, in online training, or in batch processes. OCR re-training is one example of training, and AskUI supports online training.

Online-Training in AskUI includes:

  • OCR Re-Training (UIDT-1):

    Our OCR engine is a composite of teachable and trainable models. While you can “teach” it by correcting recognized text, deeper improvements (like recognizing new font types or edge cases) require training on more labeled images.

  • Prompt-to-Action (e.g., PTA-1):

    These models incorporate feedback from user corrections. For example, if a button click fails, and the user clarifies the intended target, the model updates its interpretation next time.
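The defining property of teaching, that behavior changes through stored corrections rather than weight updates, can be illustrated with a toy memory. This is a hypothetical class, not the AskUI API:

```python
class TeachableLocator:
    """Teaching sketch: user corrections adjust behavior via memory, not weights."""

    def __init__(self):
        self.corrections = {}  # prompt -> corrected target label

    def resolve(self, prompt):
        # A stored user correction overrides the raw interpretation.
        return self.corrections.get(prompt, prompt)

    def teach(self, prompt, corrected_target):
        self.corrections[prompt] = corrected_target

loc = TeachableLocator()
loc.teach("submit button", "Send")  # user clarifies the intended target
loc.resolve("submit button")        # now resolves to "Send"
```

Training, by contrast, would change how `resolve` itself behaves on inputs it has never seen corrected.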


Why Use Both?

AskUI combines training and teaching because:

  • Teaching is agile: It empowers users to refine behavior instantly.
  • Training is foundational: It builds the core capabilities of the model.

  Aspect                | Teaching                          | Training
  Speed                 | Immediate or near real-time       | Takes time (minutes to hours)
  Who does it?          | Users or automation               | Developers/Engineers
  Affects model weights | ❌ No                              | ✅ Yes
  Flexibility           | High - works with new scenarios   | Medium - needs structured data
  Example in AskUI      | PTA-1 prompt learning, OCR tweaks | UIDT-1 model expansion

3. Set-Up Questions

How do I install AskUI?

Install AskUI Agent OS

Agent OS is a device controller that allows agents to take screenshots, move the mouse, click, and type on the keyboard across any operating system.

TypeScript: Install the CLI with:

npm install -g @askui/cli

Python: Install with:

pip install askui

Then follow our Getting Started Guide to set up your workspace.

4. Common Errors & Troubleshooting


5. Pricing Questions

What is an Ask?

An ask() refers to one action or inference call that AskUI performs. This involves taking a screenshot to localize an element on the screen, which is a key part of interacting with UI elements visually.

For example, if you have a test case where a user browses through a menu, like this:

  1. click('nav bar')
  2. click('home')
  3. ask('is the home site visible?')

In this case, there are 3 asks or inference calls:

  • The first two click() commands do count as asks, as they require screenshots to localize the elements (even if it’s for interaction purposes).
  • The third action, ask('is the home site visible?'), also counts as one ask/inference call, since it involves taking a screenshot to verify if the element is visible.

However, simple mouse actions such as scroll(), leftclick(), or other actions that do not require localization or element recognition (i.e., no screenshot taken) do not count as asks.
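The counting rule above can be sketched with a small helper. The command sets here are illustrative only, not an official billing list:

```python
# Billable "asks": commands that need a screenshot to localize or verify an
# element. Pure input events are free. These sets are illustrative assumptions.
BILLABLE = {"click", "ask"}
FREE = {"scroll", "leftclick", "type"}

def count_asks(commands):
    """Count inference calls for a list of (command, argument) pairs."""
    return sum(1 for cmd, _ in commands if cmd in BILLABLE)

workflow = [
    ("click", "nav bar"),
    ("click", "home"),
    ("ask", "is the home site visible?"),
    ("scroll", "down"),  # no localization needed: not billed
]
count_asks(workflow)  # → 3
```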

How is Pricing for Ask Commands Calculated?


Each ask() call (which involves one screenshot) will be counted toward your usage, and the pricing is based on the total number of asks or inference calls made during the automation process.

How is the Act Command Priced?

The act() command is a more advanced action that involves multi-step workflows and typically requires interaction with external models such as Anthropic’s Claude Sonnet (or other models like OpenAI, etc.) for task execution. As such, act() is priced separately.
The pricing for act() commands is determined by the license agreement for the external model being used. For example, if you’re using Claude Sonnet by Anthropic, charges will be based on the number of calls made to the model during task execution. The pricing model can vary depending on the specific external model (e.g., OpenAI, Anthropic, or other AI providers).

6. Enterprise Questions

Is AskUI secure?

Yes. AskUI is built with security as a core principle. All automation processes run within your own infrastructure. Screenshots and input data remain local unless explicitly configured to be shared via cloud-based features.

What are all the domains used by AskUI so I can whitelist them?

  • hub.askui.com
  • prod.askui.com
  • inference.askui.com
  • files.askui.com

Is AskUI hosted within the European Union?

Yes, all our hosting infrastructure is located within the European Union to ensure compliance with EU regulations, and all data is stored within the European Union as well.

Are you GDPR compliant?

Yes. AskUI complies with the General Data Protection Regulation (GDPR). If you utilize our native inference capabilities, all data processing aligns with GDPR standards. Please note that third-party models, such as those from Anthropic or other external LLM providers, are outside the scope of our Service Level Agreement (SLA).

Are you ISO-27001 compliant?

Yes, AskUI is ISO 27001 compliant. Our certificate is available at trust.askui.com. If you utilize our native inference capabilities, all data processing aligns with the standard. Please note that third-party models, such as those from Anthropic or other external LLM providers, are outside the scope of our Service Level Agreement (SLA).

Where can I find all the compliance documents?

You can find these in our Trust Center at trust.askui.com.

Do you provide on-premise inference?

Yes, please contact our team for detailed information.