In this tutorial, you’ll learn how to automate Android applications using AskUI’s specialized AndroidVisionAgent. You’ll build an agent that can interact with Android apps, manage device connections, and perform platform-specific operations.
Prerequisites: Make sure you’ve completed the Installation guide and the Your First Agent tutorial before starting this one.

What You’ll Build

You’ll create an Android agent that:
  1. Connects to an Android device or emulator
  2. Launches and interacts with Android applications
  3. Performs touch interactions like tapping and swiping
  4. Extracts data from the Android UI
  5. Uses Android-specific shell commands and key events

Android Setup Requirements

Before building your agent, ensure you have:
  • Android Debug Bridge (ADB) installed and configured in your system’s PATH. ADB is the command-line tool that lets your computer communicate with an Android device. You can download it from the Android SDK Platform Tools.
  • An Android device with USB debugging enabled (you can turn it on under Developer Options), or a running Android emulator
  • The python-ppadb library, which is automatically installed as a dependency of askui

Building Your Android Agent

Step 1: Verify Device Connection

First, verify that your Android device is connected and accessible via ADB. Open your terminal and run:
adb devices
You should see your device listed with its serial number. If not, check your USB connection, ensure USB debugging is enabled on the device, or make sure your emulator is running.
# Example output
List of devices attached
emulator-5554	device
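You can also run this check from Python before starting an agent. The sketch below runs `adb devices` through `subprocess` and keeps only the serials whose state is `device`. Entries marked `offline` or `unauthorized` cannot be automated yet. The function names are illustrative and not part of AskUI, and the sketch assumes `adb` is on your PATH.

```python
import subprocess


def parse_adb_devices(output: str) -> list[str]:
    """Return serials from `adb devices` output whose state is 'device'.

    Illustrative helper, not an AskUI API.
    """
    serials = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and daemon messages such as "* daemon started successfully"
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        # Device lines look like "<serial><whitespace><state>"
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_ready_devices() -> list[str]:
    """Run `adb devices` (assumes adb is on PATH) and return ready serials."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

An empty list means no device is ready, and you should fix the connection before creating the agent.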
Step 2: Create Your Android Agent Script

Create a new Python file named android_automation.py and add the following code. This script will open the Android Settings app, navigate to the network settings, and then return to the home screen.
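import logging

from askui import AndroidVisionAgent
from askui.reporting import SimpleHtmlReporter

# Print log messages to the console
logging.basicConfig(level=logging.INFO)

# Initialize your Android agent with HTML reporting
with AndroidVisionAgent(
    reporters=[SimpleHtmlReporter()]
) as agent:
    # The agent automatically connects to the first available device.
    # To select a specific device if multiple are connected, use:
    # agent.set_device_by_serial_number("<your-device-serial>")

    # Open the Settings app with an ADB shell command
    agent.shell("am start -n com.android.settings/.Settings")

    # Navigate to the network settings
    agent.tap("Network & internet")

    # Go back, then return to the home screen
    agent.key_tap('BACK')
    agent.key_tap('HOME')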
Run the script from your terminal:
python android_automation.py
The agent will perform the steps on your connected Android device, and you’ll see the status printed to your console. A detailed HTML report will be generated in a reports folder.
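If you want to open the newest report without browsing the folder by hand, a small helper can pick the most recently modified HTML file. This sketch assumes the reports are `.html` files in a `reports` directory relative to where you ran the script. Adjust the path if your setup differs. The helper name is illustrative.

```python
from pathlib import Path
from typing import Optional


def latest_report(report_dir: str = "reports") -> Optional[Path]:
    """Return the newest .html file in report_dir, or None if there is none.

    Illustrative helper; assumes reports are written as .html files.
    """
    directory = Path(report_dir)
    if not directory.is_dir():
        return None
    reports = list(directory.glob("*.html"))
    if not reports:
        return None
    # Pick the file with the most recent modification time
    return max(reports, key=lambda p: p.stat().st_mtime)
```

You can then open the file in a browser with `webbrowser.open(path.resolve().as_uri())`.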
Step 3: Understanding AndroidVisionAgent

Let’s break down the key API methods used in the script:

Android Agent Initialization

with AndroidVisionAgent(...) as agent:
  • This creates an instance of the agent specifically for Android
  • Using the with statement ensures that the connection to the device is automatically managed (connected on entry, disconnected on exit)
  • The AndroidVisionAgent is a specialized agent type designed for mobile automation

Shell Commands

agent.shell("am start -n com.android.settings/.Settings")
  • shell() runs any ADB shell command on the device and returns its output. It is the primary way to manage applications, for example starting them with am start or stopping them with am force-stop.
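Because shell() takes a plain command string, it can help to build those strings in one place. The helpers below are hypothetical conveniences, not AskUI APIs. Each one quotes its arguments with `shlex.quote` and wraps a standard Android command:

  • `am start -n <package>/<activity>` launches a specific activity.
  • `monkey -p <package> -c android.intent.category.LAUNCHER 1` launches an app's default activity when you don't know the activity name.
  • `am force-stop <package>` stops an app.

```python
import shlex

# Hypothetical helpers, not part of AskUI: they only build command strings.


def start_activity_cmd(package: str, activity: str) -> str:
    """Build an `am start` command for an explicit package/activity component."""
    component = f"{package}/{activity}"
    return f"am start -n {shlex.quote(component)}"


def launch_app_cmd(package: str) -> str:
    """Launch an app's launcher activity without knowing its class name."""
    return (
        f"monkey -p {shlex.quote(package)} "
        "-c android.intent.category.LAUNCHER 1"
    )


def force_stop_cmd(package: str) -> str:
    """Build an `am force-stop` command for a package."""
    return f"am force-stop {shlex.quote(package)}"
```

Usage with the agent might look like `agent.shell(start_activity_cmd("com.android.settings", ".Settings"))`.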

Touch Interactions

agent.tap("Network & internet")
  • tap() performs a touch tap on the screen. It accepts a text description, a locator, or coordinates.

Hardware Key Events

agent.key_tap('BACK')
agent.key_tap('HOME')
  • key_tap() simulates pressing hardware or special keys. It accepts any key from the ANDROID_KEY literal type, including BACK, HOME, VOLUME_UP, and ENTER.
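ANDROID_KEY covers the common keys. If you need a key it does not include, one option is Android's own `input keyevent` shell command. That command accepts `KEYCODE_*` names from Android's `KeyEvent` class. The helper below is a hypothetical convenience, not part of AskUI. It only builds and sanity-checks the command string you would pass to agent.shell().

```python
import re

# Hypothetical helper, not part of AskUI.
# Android key names use only uppercase letters, digits, and underscores.
_KEY_NAME = re.compile(r"[A-Z0-9_]+")


def keyevent_cmd(key: str) -> str:
    """Build an `input keyevent` command for one Android key name."""
    name = key.strip().upper()
    if name.startswith("KEYCODE_"):
        name = name[len("KEYCODE_"):]
    # Reject anything that is not a plain key name (e.g. injected shell syntax)
    if not _KEY_NAME.fullmatch(name):
        raise ValueError(f"not a valid key name: {key!r}")
    return f"input keyevent KEYCODE_{name}"
```

For example, `agent.shell(keyevent_cmd("MEDIA_PLAY_PAUSE"))` would toggle media playback.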

Platform-Specific Features

Device Selection

If you have multiple Android devices connected, you can target a specific one by its serial number. The serial number is the first column of the adb devices output.
with AndroidVisionAgent() as agent:
    # List devices with `adb devices` first to get the serial
    agent.set_device_by_serial_number("emulator-5554")
    
    # All subsequent commands will target this device
    agent.key_tap("HOME")
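Hard-coding a serial ties the script to one device. A common alternative is to read it from the `ANDROID_SERIAL` environment variable, which `adb` itself uses to choose a default device. The sketch below is a hypothetical helper, not an AskUI API. It checks the requested serial against the serials reported by `adb devices`, which you pass in as a list. If the variable is unset, it falls back to the only connected device.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_serial(
    available: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the device serial to automate (hypothetical helper, not AskUI).

    Uses ANDROID_SERIAL when set; otherwise requires exactly one device.
    """
    env = os.environ if env is None else env
    requested = env.get("ANDROID_SERIAL", "").strip()
    if requested:
        if requested not in available:
            raise RuntimeError(f"ANDROID_SERIAL={requested!r} is not connected")
        return requested
    if len(available) == 1:
        return available[0]
    raise RuntimeError(
        f"expected exactly one device, found {len(available)}; set ANDROID_SERIAL"
    )
```

Inside the with block, you would then call `agent.set_device_by_serial_number(resolve_serial(serials))`.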

Application Management via Shell

AndroidVisionAgent has no dedicated app.* methods, but you can manage apps effectively using agent.shell():
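# Check if an app is installed (pm filters the package list by the given text)
output = agent.shell("pm list packages com.example.app")
# Each line reads "package:<name>"; compare whole entries to avoid partial matches
is_installed = "package:com.example.app" in output.split()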

# Force stop an app
agent.shell("am force-stop com.example.app")

# Clear app data (use with caution!)
agent.shell("pm clear com.example.app")
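`pm list packages` prints one `package:<name>` line per installed app. Parsing that output into a set gives exact name matches. A plain substring test would also match, for example, `com.example.app.debug`. This is a minimal sketch that assumes shell() returns the command's text output, as in the example above. The helper name is illustrative.

```python
def parse_packages(output: str) -> set[str]:
    """Turn plain `pm list packages` output into a set of package names.

    Illustrative helper, not part of AskUI.
    """
    packages = set()
    for line in output.splitlines():
        line = line.strip()  # also drops a trailing "\r" from device output
        if line.startswith("package:"):
            packages.add(line[len("package:"):])
    return packages
```

For example, `installed = parse_packages(agent.shell("pm list packages"))` followed by `"com.example.app" in installed`.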

Android Gestures

The agent provides built-in methods for common gestures:
# Swipe upward from (x1, y1) = (100, 800) to (x2, y2) = (100, 200) over 1 second
agent.swipe(100, 800, 100, 200, duration_in_ms=1000)

# Drag an element from one point to another
agent.drag_and_drop(200, 300, 600, 300, duration_in_ms=1500)
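The coordinates in these calls are absolute pixels, so values tuned for one screen may land in the wrong place on another. One way around this is to express positions as fractions of the screen size. The sketch below reads the resolution from the standard `wm size` shell command and converts those fractions into the four integers swipe() takes. `wm size` prints `Physical size: WxH`, plus an `Override size: WxH` line when the resolution has been overridden. The helper names are hypothetical.

```python
import re

# Hypothetical helpers, not part of AskUI.
_SIZE = re.compile(r"(Physical|Override) size:\s*(\d+)x(\d+)")


def parse_wm_size(output: str) -> tuple[int, int]:
    """Return (width, height) from `wm size` output, preferring an override."""
    sizes = {kind: (int(w), int(h)) for kind, w, h in _SIZE.findall(output)}
    if "Override" in sizes:
        return sizes["Override"]
    if "Physical" in sizes:
        return sizes["Physical"]
    raise ValueError(f"unrecognized `wm size` output: {output!r}")


def vertical_swipe(
    width: int, height: int, start: float = 0.8, end: float = 0.2
) -> tuple[int, int, int, int]:
    """Swipe coordinates along the horizontal center of the screen.

    The default (80% -> 20% of the height) moves the finger upward.
    """
    x = width // 2
    return x, round(height * start), x, round(height * end)
```

Usage might look like `agent.swipe(*vertical_swipe(*parse_wm_size(agent.shell("wm size"))), duration_in_ms=1000)`. `wm size` typically reports the display's natural orientation, so swap width and height if the device is in landscape.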

Troubleshooting Android Automation

What You’ve Learned

Congratulations! You’ve successfully learned to:
  • ✅ Set up Android automation with AndroidVisionAgent
  • ✅ Connect to and target specific Android devices
  • ✅ Control Android applications using shell commands
  • ✅ Perform taps, swipes, and other gestures
  • ✅ Use hardware key events for navigation

Next Steps