Quick Reference

| Platform | Available Models |
| --- | --- |
| AskUI | askui-combo, askui-pta, askui-ocr, askui-ai-element |
| Anthropic | anthropic-claude-3-5-sonnet-20241022 |
| Hugging Face | AskUI/PTA-1, OS-Copilot/OS-Atlas-Base-7B, showlab/ShowUI-2B, Qwen/Qwen2-VL-2B-Instruct, Qwen/Qwen2-VL-7B-Instruct |
| Self-Hosted | UI-Tars |

Using Different Models

AskUI allows you to specify which model to use for each command by passing the model_name parameter. This gives you flexibility to choose the most appropriate model for each specific task.

Basic Usage

To use a specific model for a command, add the model_name parameter:

agent.click("search field", model_name="OS-Copilot/OS-Atlas-Base-7B")
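Not every model supports every command, so it can help to validate a model/command pair before dispatching it. The helper below is purely illustrative (it is not part of the AskUI SDK); its compatibility data is transcribed from the tables on this page:

```python
# Illustrative helper (not part of the AskUI SDK): check whether a model
# supports a command before passing it as model_name.
# Compatibility data transcribed from the tables on this page.
SUPPORTED_COMMANDS = {
    "askui-pta": {"click", "type", "mouse_move"},
    "askui-ocr": {"click", "type", "mouse_move"},
    "askui-combo": {"click", "type", "mouse_move"},
    "askui-ai-element": {"click", "type", "mouse_move"},
    "anthropic-claude-3-5-sonnet-20241022": {"click", "type", "mouse_move", "get", "act"},
    "AskUI/PTA-1": {"click", "type", "mouse_move"},
    "OS-Copilot/OS-Atlas-Base-7B": {"click", "type", "mouse_move"},
    "showlab/ShowUI-2B": {"click", "type", "mouse_move"},
    "Qwen/Qwen2-VL-2B-Instruct": {"click", "type", "mouse_move"},
    "Qwen/Qwen2-VL-7B-Instruct": {"click", "type", "mouse_move"},
    "UI-Tars": {"click", "type", "mouse_move", "get", "act"},
}

def supports(model_name: str, command: str) -> bool:
    """Return True if the given model supports the given command."""
    return command in SUPPORTED_COMMANDS.get(model_name, set())
```

For example, `supports("askui-pta", "act")` is `False`, so you would know to pick a Large Action Model such as UI-Tars for `act()` workflows instead.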

Authenticate with an AI Model Provider

Before you can use different models, you need to authenticate with an AI Model Provider.

| Provider | AskUI | Anthropic |
| --- | --- | --- |
| ENV Variables | ASKUI_WORKSPACE_ID, ASKUI_TOKEN | ANTHROPIC_API_KEY |
| Supported Commands | click() | click(), get(), act() |
| Description | Faster inference, European servers, enterprise-ready | Supports complex actions |

To get started, set the environment variables required to authenticate with your chosen model provider.
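Scripts fail with confusing errors when credentials are missing, so it can be worth checking the environment up front. The sketch below is a hypothetical helper (not part of the AskUI SDK); the required variable names are taken from the table above:

```python
import os

# Required environment variables per provider, as listed in the table above.
REQUIRED_ENV_VARS = {
    "askui": ["ASKUI_WORKSPACE_ID", "ASKUI_TOKEN"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}

def missing_credentials(provider: str, env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [var for var in REQUIRED_ENV_VARS[provider] if not env.get(var)]
```

Calling `missing_credentials("anthropic")` before constructing the agent lets you raise a clear error naming exactly which variables still need to be exported.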

Setting Environment Variables

Environment variables are used to securely store API keys and other sensitive information. Here’s how to set them on different operating systems:

Linux & macOS

Use the export command in your terminal:

export ANTHROPIC_API_KEY=<your-api-key-here>

Windows PowerShell

Set an environment variable using the $env: prefix:

$env:ANTHROPIC_API_KEY="<your-api-key-here>"

Anthropic AI Models

Supported commands are: click(), type(), mouse_move(), get(), act()

| Model Name | Info | Execution Speed | Security | Cost | Reliability |
| --- | --- | --- | --- | --- | --- |
| anthropic-claude-3-5-sonnet-20241022 | The Computer Use model from Anthropic is a Large Action Model (LAM) that can autonomously achieve goals, e.g. “Book me a flight from Berlin to Rome” | Slow, 1 s per step | Model hosting by Anthropic | High, up to $1.50 per act | Not recommended for production usage |

AskUI AI Models

Supported commands are: click(), type(), mouse_move()

| Model Name | Info | Execution Speed | Security | Cost | Reliability |
| --- | --- | --- | --- | --- | --- |
| askui-pta | PTA-1 (Prompt-to-Automation) is a vision language model (VLM) trained by AskUI to locate all kinds of UI elements from a textual description, e.g. “Login button”, “Text login” | Fast, 500 ms per step | Secure hosting by AskUI or on-premise | Low, $0.05 per step | Recommended for production usage, can be retrained |
| askui-ocr | AskUI OCR is an OCR model trained to locate text on UI screens, e.g. “Login”, “Search” | Fast, 500 ms per step | Secure hosting by AskUI or on-premise | Low, $0.05 per step | Recommended for production usage, can be retrained |
| askui-combo | AskUI Combo combines the askui-pta and askui-ocr models to improve accuracy. | Fast, 500 ms per step | Secure hosting by AskUI or on-premise | Low, $0.05 per step | Recommended for production usage, can be retrained |
| askui-ai-element | AskUI AI Element lets you address visual elements such as icons or images by demonstrating what you are looking for: crop out the element and give it a name. | Very fast, 5 ms per step | Secure hosting by AskUI or on-premise | Low, $0.05 per step | Recommended for production usage, deterministic behaviour |
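The per-step prices above allow a rough budget estimate for an automation workflow. The helper below is hypothetical (not part of the AskUI SDK) and simply multiplies the listed $0.05-per-step figure:

```python
# Rough cost estimate for a workflow, using the $0.05-per-step price
# listed for the AskUI models (askui-pta, askui-ocr, askui-combo).
ASKUI_COST_PER_STEP = 0.05  # USD, from the table above

def workflow_cost(steps: int, cost_per_step: float = ASKUI_COST_PER_STEP) -> float:
    """Total cost in USD for a workflow with the given number of steps."""
    return steps * cost_per_step
```

For instance, a 40-step workflow with an AskUI model costs roughly 40 × $0.05 = $2.00, whereas the same workflow via Anthropic's Computer Use model could reach 40 × $1.50 = $60.00.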

Hugging Face AI Models (Spaces API)

Supported commands are: click(), type(), mouse_move()

| Model Name | Info | Execution Speed | Security | Cost | Reliability |
| --- | --- | --- | --- | --- | --- |
| AskUI/PTA-1 | PTA-1 (Prompt-to-Automation) is a vision language model (VLM) trained by AskUI to locate all kinds of UI elements from a textual description, e.g. “Login button”, “Text login” | Fast, 500 ms per step | Hugging Face hosted | Hugging Face hosting prices | Not recommended for production usage |
| OS-Copilot/OS-Atlas-Base-7B | OS-Atlas-Base-7B is a Large Action Model (LAM) that can autonomously achieve goals, e.g. “Please help me modify VS Code settings to hide all folders in the explorer view”. Not available for the act() command | Slow, 1 s per step | Hugging Face hosted | Hugging Face hosting prices | Not recommended for production usage |
| showlab/ShowUI-2B | ShowUI-2B is a Large Action Model (LAM) that can autonomously achieve goals, e.g. “Search in Google Maps for Nahant”. Not available for the act() command | Slow, 1 s per step | Hugging Face hosted | Hugging Face hosting prices | Not recommended for production usage |
| Qwen/Qwen2-VL-2B-Instruct | Qwen2-VL-2B-Instruct is a vision language model (VLM) pre-trained on multiple datasets including UI data. Not available for the act() command | Slow, 1 s per step | Hugging Face hosted | Hugging Face hosting prices | Not recommended for production usage |
| Qwen/Qwen2-VL-7B-Instruct | Qwen2-VL-7B-Instruct is a vision language model (VLM) pre-trained on multiple datasets including UI data. Not available for the act() command | Slow, 1 s per step | Hugging Face hosted | Hugging Face hosting prices | Not recommended for production usage |

Note: No authentication is required, but requests are rate-limited.
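Because the Spaces API is rate-limited, long-running scripts benefit from retrying failed calls with exponential backoff. This is a generic sketch, not an AskUI or Hugging Face API; the callable you pass in and the exception types you consider retryable are up to you:

```python
import time

def with_backoff(call, retries: int = 3, base_delay: float = 1.0,
                 retryable=(Exception,)):
    """Invoke call(); on a retryable error, wait and try again.

    The delay doubles after each failed attempt (1 s, 2 s, 4 s, ...).
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(retries):
        try:
            return call()
        except retryable:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

A rate-limited command could then be wrapped as `with_backoff(lambda: agent.click("search field", model_name="AskUI/PTA-1"))`.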

Self Hosted UI Models

Supported commands are: click(), type(), mouse_move(), get(), act()

| Model Name | Info | Execution Speed | Security | Cost | Reliability |
| --- | --- | --- | --- | --- | --- |
| UI-Tars | UI-Tars is a Large Action Model (LAM) based on Qwen2 and fine-tuned by ByteDance on UI data, e.g. “Book me a flight to Rome” | Slow, 1 s per step | Self-hosted | Depending on infrastructure | Not recommended for production usage out of the box |

Note: You must host these models yourself.