Prerequisites: Make sure you’ve completed the installation and Your First Agent tutorial before starting this one.
What You’ll Build
You’ll create an Android agent that:
- Connects to an Android device or emulator
- Launches and interacts with Android applications
- Performs touch interactions like tapping and swiping
- Extracts data from the Android UI
- Uses Android-specific shell commands and key events
Android Setup Requirements
Before building your agent, ensure you have:
- Android Debug Bridge (ADB) installed and configured in your system’s PATH. ADB is the command-line tool that lets your computer communicate with an Android device. You can download it from the Android SDK Platform Tools.
- An Android device with USB debugging enabled (learn how to enable it in Developer Options) or an Android emulator running
- The python-ppadb library, which is automatically installed as a dependency of askui
Building Your Android Agent
1
Verify Device Connection
First, verify that your Android device is connected and accessible via ADB. Open your terminal and run adb devices.
You should see your device listed with its serial number. If not, check your USB connection, ensure USB debugging is enabled on the device, or make sure your emulator is running.
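The same check can be scripted. The sketch below parses the text that adb devices prints and keeps only devices in the "device" state (ready for use). The function names are illustrative and not part of askui; the sample serial numbers in the usage note are made up.

```python
import subprocess


def parse_adb_devices(output: str) -> dict[str, str]:
    """Map each serial number to its state from `adb devices` output."""
    devices: dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the "List of devices attached" header,
        # and daemon notices such as "* daemon started successfully".
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_serials(output: str) -> list[str]:
    """Return serials whose state is "device" (connected and authorized)."""
    return [serial for serial, state in parse_adb_devices(output).items() if state == "device"]


def list_ready_devices() -> list[str]:
    """Run `adb devices` (requires adb on PATH) and return the ready serials."""
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    return ready_serials(result.stdout)
```

A device listed as unauthorized or offline is filtered out, which usually means the USB debugging prompt on the device has not been accepted yet or the connection dropped.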
2
Create Your Android Agent Script
Create a new Python file named android_automation.py containing a script that opens the Android Settings app, navigates to the network settings, and then returns to the home screen.
Run the script from your terminal, for example with python android_automation.py. The agent will perform the steps on your connected Android device, and you’ll see the status printed to your console. A detailed HTML report will be generated in a reports folder.
3
Understanding AndroidVisionAgent
Let’s break down the key API methods used in the script:
Android Agent Initialization
- Instantiating AndroidVisionAgent creates an instance of the agent specifically for Android
- Using the with statement ensures that the connection to the device is automatically managed (connected on entry, disconnected on exit)
- The AndroidVisionAgent is a specialized agent type designed for mobile automation
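To illustrate what "connected on entry, disconnected on exit" means, here is a minimal sketch of the context-manager pattern using a hypothetical stand-in class. It is not the askui implementation; it only shows why the with statement guarantees cleanup even when a step fails.

```python
class FakeDeviceSession:
    """Hypothetical stand-in for an agent; NOT the askui implementation."""

    def __init__(self) -> None:
        self.connected = False
        self.log: list[str] = []

    def __enter__(self) -> "FakeDeviceSession":
        # Runs when the with block is entered.
        self.connected = True
        self.log.append("connect")
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs when the with block is left, including after an exception.
        self.connected = False
        self.log.append("disconnect")
        return False  # do not swallow the exception


session = FakeDeviceSession()
with session:
    session.log.append("work")
```

After the block finishes, session.connected is False again, and the same holds if the body raises.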
Shell Commands
- The shell() method is a powerful tool for executing any ADB shell command. This is the primary way to manage applications, such as starting them with am start or stopping them with am force-stop
Touch Interactions
- tap() is the method for performing touch clicks on the screen. It can take a text description, a locator, or coordinates
Hardware Key Events
- key_tap() simulates pressing hardware or special keys. It accepts any key from the ANDROID_KEY literal type, including BACK, HOME, VOLUME_UP, and ENTER
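Putting the three methods above together, the flow described in step 2 (open Settings, go to the network settings, return home) might look roughly like the sketch below. It is written against any object that offers shell(), tap(), and key_tap(), so it runs without a device. The exact argument forms, the "Network & internet" label (which varies by device and Android version), and the import shown in the comment are assumptions rather than confirmed askui API details.

```python
from typing import Protocol


class AndroidAgentLike(Protocol):
    """The subset of agent methods this sketch relies on (signatures assumed)."""

    def shell(self, command: str) -> object: ...
    def tap(self, target: str) -> None: ...
    def key_tap(self, key: str) -> None: ...


def open_network_settings(agent: AndroidAgentLike) -> None:
    # Launch the Settings app through the activity manager.
    agent.shell("am start -a android.settings.SETTINGS")
    # Tap the entry by its visible text (assumed label; adjust for your device).
    agent.tap("Network & internet")
    # Return to the home screen with a hardware key event.
    agent.key_tap("HOME")


# Assumed usage with the real agent (import path is an assumption):
# from askui import AndroidVisionAgent
# with AndroidVisionAgent() as agent:
#     open_network_settings(agent)
```

Keeping the steps in a function that accepts the agent as a parameter also makes the sequence easy to exercise with a recording stub before running it on hardware.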
Platform-Specific Features
Device Selection
If you have multiple Android devices connected, you can target a specific one by its serial number. You can get the serial number from the adb devices command.
Application Management via Shell
While there are no direct app.* methods, you can manage apps effectively using agent.shell().
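As a sketch of what such shell-based app management could look like, the helpers below build the command strings for three standard Android commands: am start -n, am force-stop, and pm clear. The helper names are illustrative, and passing the resulting string directly to agent.shell() is an assumption about how that method takes its argument.

```python
import shlex


def start_activity_command(package: str, activity: str) -> str:
    """Build `am start -n <package>/<activity>` to launch a specific activity."""
    component = f"{package}/{activity}"
    return f"am start -n {shlex.quote(component)}"


def force_stop_command(package: str) -> str:
    """Build `am force-stop <package>` to stop an app and its processes."""
    return f"am force-stop {shlex.quote(package)}"


def clear_data_command(package: str) -> str:
    """Build `pm clear <package>` to reset an app's stored data."""
    return f"pm clear {shlex.quote(package)}"


# Assumed usage (the Settings activity name may differ across devices):
# agent.shell(start_activity_command("com.android.settings", ".Settings"))
# agent.shell(force_stop_command("com.android.settings"))
```

Quoting the package and component with shlex.quote keeps an unexpected character in a name from being interpreted by the device shell.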
Android Gestures
The agent provides built-in methods for common gestures, such as tap() and swipe().
Troubleshooting Android Automation
Device not detected
- Ensure USB debugging is enabled in your device’s Developer Options.
- Check if ADB drivers are installed correctly on your computer.
- Run adb kill-server and then adb start-server in your terminal.
- Verify your device appears in the output of adb devices.
App won't launch
- Double-check that the package name (e.g., com.android.settings) is correct.
- Ensure the app is installed on the target device.
- Make sure the device screen is unlocked.
Elements not found
- Open the report generated by SimpleHTMLReporter and check whether the element is visible in the screenshot.
- Add agent.wait() calls to give the UI time to load or animate.
- If an element is off-screen, use agent.swipe() to scroll it into view.
- Make your text prompts more specific. For example, instead of "button", try "blue login button".
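One way to apply the waiting advice above without scattering fixed pauses through a script is a small retry helper that pauses between attempts. This is a generic Python sketch, not an askui feature; it assumes the wrapped call raises an exception when the element cannot be found, which may not match the library's actual behavior.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds, pausing between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            # Re-raise the error from the final attempt; otherwise wait and retry.
            if attempt == attempts:
                raise
            sleep(delay_seconds)
    raise AssertionError("unreachable")


# Assumed usage: retry(lambda: agent.tap("blue login button"), attempts=5)
```

The sleep parameter exists so the helper can be exercised quickly with a stub instead of real delays.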
What You’ve Learned
Congratulations! You’ve successfully learned to:
- ✅ Set up Android automation with AndroidVisionAgent
- ✅ Connect to and target specific Android devices
- ✅ Control Android applications using shell commands
- ✅ Perform taps, swipes, and other gestures
- ✅ Use hardware key events for navigation
Next Steps
Locating Elements
Dive deeper into different ways of selecting UI elements.
Extracting Information
Learn how to extract structured and unstructured data from the screen.
Agentic Automation
Discover how to give the agent a high-level goal and let it figure out the steps.
Troubleshooting Guide
Find solutions to common automation challenges.