Android Screen Auto-Screenshot with uiautomator2 MCP

Introduction

If you're managing multiple Android apps as a solo developer, you regularly run into tasks like these:

  • Re-capturing screenshots for the Play Store
  • UI regression checks
  • Dark mode verification
  • Multi-language layout breakage checks

Done manually, these require dozens of screenshots per app.

With uiautomator2 MCP, you can automate most of this work: Claude Code or Cursor controls an Android device, navigates through the app's screens, and captures each one automatically.


Core Strategy: UI Tree + DeepLink

There are two main approaches to automated screen traversal:

Approach                        | Stability           | Implementation cost
------------------------------- | ------------------- | -------------------
Traverse by coordinate taps     | Low (breaks easily) | Low
Traverse by UI tree analysis    | Medium              | Medium
Navigate directly via DeepLink  | High (stable)       | Medium–High

Why coordinate taps and image recognition alone are unreliable:

  • LazyColumn scroll position drifts
  • Tap misses during animations
  • Coordinates change on foldables and tablets
  • WebView internals aren't accessible

Many current Android automation agents follow the same pattern: read the UI tree first, and fall back to image analysis only when needed.


What Makes uiautomator2 MCP Stand Out

Unlike a simple ADB wrapper, uiautomator2 MCP can read the UI tree, and that is its biggest advantage. A simplified view of what it sees:

<TextView text="Settings" resource-id="btn_settings" clickable="true"/>
<Button text="Save" resource-id="btn_save"/>

This lets the AI act on natural-language instructions such as:

  • "Tap Settings"
  • "Find Save"
  • "List children of RecyclerView"
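To see why targeting elements beats tapping coordinates, it helps to look at the raw data. A uiautomator hierarchy dump is XML in which each node element carries attributes such as text, resource-id, and bounds (written as "[left,top][right,bottom]"). The standard-library sketch below finds a node by text or resource-id and returns the center of its bounds, which is roughly what "Tap Settings" resolves to. The function name and the sample dump are illustrative; in practice the XML would come from the device, for example via uiautomator2's d.dump_hierarchy().

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# uiautomator dumps encode bounds as "[left,top][right,bottom]"
_BOUNDS = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")

# Illustrative dump; a real one has many more attributes per node
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node text="" resource-id="" class="android.widget.FrameLayout" bounds="[0,0][1080,2400]">
    <node text="Settings" resource-id="com.example.myapp:id/btn_settings"
          class="android.widget.TextView" clickable="true" bounds="[40,200][440,320]"/>
    <node text="Save" resource-id="com.example.myapp:id/btn_save"
          class="android.widget.Button" bounds="[600,2200][1040,2340]"/>
  </node>
</hierarchy>"""


def find_center(hierarchy_xml: str, *, text: Optional[str] = None,
                resource_id: Optional[str] = None) -> Optional[Tuple[int, int]]:
    """Return the center of the first node matching text and/or resource-id, or None."""
    root = ET.fromstring(hierarchy_xml)
    for node in root.iter("node"):  # walks every nested <node>, depth-first
        if text is not None and node.get("text") != text:
            continue
        if resource_id is not None and node.get("resource-id") != resource_id:
            continue
        match = _BOUNDS.fullmatch(node.get("bounds", ""))
        if match:
            left, top, right, bottom = map(int, match.groups())
            return ((left + right) // 2, (top + bottom) // 2)
    return None
```

The coordinates are derived from the tree at the moment of the lookup, so they stay correct when the layout shifts on a tablet or foldable.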

Why tanbro/uiautomator2-mcp-server Is a Strong Choice

It is among the most complete Android automation MCP servers currently available.

Supported features:

  • Screenshot
  • UI hierarchy retrieval
  • XPath search
  • App launch
  • DeepLink launch
  • Back key
  • Swipe
  • Tool filtering (expose only needed tools)

Tool filtering matters more than it looks. Exposing too many MCP tools tends to confuse LLMs, so being able to limit the server to the tools a task actually needs is a practical design choice.


Capture Implementation Pattern

Ideal Architecture

App side: Screen Registry + DeepLink
  ↓
uiautomator2 MCP: start / capture / back
  ↓
Output: screens/SettingsScreen.png, etc.

In pseudocode:

for screen in registry:
    open_deeplink(screen.url)   # adb shell am start...
    wait_idle()                  # wait for UI to stabilize
    screenshot(screen.name)      # capture and save
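A runnable version of the loop above, as a sketch: the Driver protocol and its three method names are hypothetical stand-ins for whatever the MCP server, uiautomator2, or raw adb provides, and ScreenEntry mirrors the Kotlin registry shown later in this article. Keeping the driver abstract lets the same loop run against a fake in tests and against a real device in production.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class ScreenEntry:
    name: str        # e.g. "settings"
    deep_link: str   # e.g. "myapp://debug/settings"


class Driver(Protocol):
    """Hypothetical interface; back it with uiautomator2, adb, or an MCP client."""
    def open_deeplink(self, url: str) -> None: ...
    def wait_idle(self) -> None: ...
    def screenshot(self, path: Path) -> None: ...


def capture_all(driver: Driver, registry: Iterable[ScreenEntry], out_dir: Path) -> List[Path]:
    """Open each registered screen by deep link, wait for it to settle, and save a PNG."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for screen in registry:
        driver.open_deeplink(screen.deep_link)  # e.g. adb shell am start -a VIEW -d <url>
        driver.wait_idle()                      # let the UI stabilize before capturing
        path = out_dir / f"{screen.name}.png"
        driver.screenshot(path)
        saved.append(path)
    return saved
```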

Naming Convention for Captured Files

screens/
├── SettingsScreen.png
├── SettingsScreen_dark.png
├── SettingsScreen_ja.png
├── ProfileScreen.png
└── PurchaseDialog.png

Including screen name, theme, and locale in the filename makes comparison easy.
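A small helper keeps the convention consistent across scripts. This is a sketch under two assumptions not stated above: the light theme is the default and gets no suffix, and when both are present the theme comes before the locale (SettingsScreen_dark_ja.png).

```python
from typing import Optional


def capture_filename(screen: str, theme: Optional[str] = None,
                     locale: Optional[str] = None) -> str:
    """Build names like SettingsScreen.png, SettingsScreen_dark.png, SettingsScreen_ja.png."""
    parts = [screen]
    if theme and theme != "light":  # assumption: light is the default, so no suffix
        parts.append(theme)
    if locale:
        parts.append(locale)
    return "_".join(parts) + ".png"
```

A fixed suffix order means files for the same screen sort next to each other, which makes side-by-side comparison easier.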


The Most Important Thing: Build a Screen Registry in Your App

Relying on MCP alone breaks down on scroll positions, animations, and LazyColumn virtualization.

The critical investment is building a "dedicated capture path" inside the app itself.

Screen Registry Example (Kotlin)

object DebugScreens {
    val all = listOf(
        ScreenEntry("settings",  "myapp://debug/settings"),
        ScreenEntry("profile",   "myapp://debug/profile"),
        ScreenEntry("billing",   "myapp://debug/billing"),
    )
}

data class ScreenEntry(val name: String, val deepLink: String)

Use this Registry for three purposes:

  • Displaying the Debug Menu
  • Defining DeepLinks
  • Driving screenshot automation

Sharing it across all three keeps maintenance simple.
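If the automation side gets its own copy of the registry, for example as a JSON export from a debug build (a hypothetical mechanism), a quick consistency check before each run catches duplicate names, duplicate deep links, and scheme typos early. The JSON field names below simply mirror the Kotlin ScreenEntry, and the default scheme "myapp" is taken from the example deep links.

```python
import json
from dataclasses import dataclass
from typing import List, Sequence
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ScreenEntry:
    name: str
    deep_link: str


def load_registry(json_text: str) -> List[ScreenEntry]:
    """Parse a hypothetical export like [{"name": "settings", "deepLink": "myapp://debug/settings"}]."""
    return [ScreenEntry(item["name"], item["deepLink"]) for item in json.loads(json_text)]


def validate_registry(entries: Sequence[ScreenEntry], scheme: str = "myapp") -> List[str]:
    """Return human-readable problems; an empty list means the registry looks consistent."""
    problems: List[str] = []
    seen_names, seen_links = set(), set()
    for entry in entries:
        if entry.name in seen_names:
            problems.append(f"duplicate name: {entry.name}")
        if entry.deep_link in seen_links:
            problems.append(f"duplicate deep link: {entry.deep_link}")
        if urlsplit(entry.deep_link).scheme != scheme:
            problems.append(f"unexpected scheme in {entry.deep_link}")
        seen_names.add(entry.name)
        seen_links.add(entry.deep_link)
    return problems
```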


Why DeepLink Navigation Is Stable

adb shell am start \
  -a android.intent.action.VIEW \
  -d "myapp://debug/settings"

Navigating directly via DeepLink means:

  • No dependence on scroll position
  • No coordinate drift from tap misses
  • Works the same on foldables and tablets
  • Minimal animation wait time

This lets MCP focus purely on capture / wait / state verification.
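When the capture loop runs from a script rather than through MCP, the same adb command can be built programmatically. A sketch with illustrative helper names: it adds -s to pick a specific device and am start's -W flag to block until the launch completes, and it quotes the URL because adb shell joins its arguments into a single command line on the device, where an unquoted ? or & in a query string would be interpreted by the shell.

```python
import shlex
import subprocess
from typing import List, Optional


def deeplink_command(url: str, serial: Optional[str] = None, wait: bool = True) -> List[str]:
    """Build the adb argv that opens `url` with an ACTION_VIEW intent."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]        # target a specific device or emulator
    cmd += ["shell", "am", "start"]
    if wait:
        cmd.append("-W")             # wait for the launch to complete
    # adb shell re-joins these args on the device, so shell-quote the URL
    cmd += ["-a", "android.intent.action.VIEW", "-d", shlex.quote(url)]
    return cmd


def open_deeplink(url: str, serial: Optional[str] = None) -> None:
    """Run the command; raises CalledProcessError if adb exits non-zero."""
    subprocess.run(deeplink_command(url, serial), check=True, capture_output=True)
```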


Leveraging Compose testTag

For screens where DeepLink is difficult, you can tag elements with testTag and let uiautomator2 find them:

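// Tag the capture target.
// testTagsAsResourceId exposes these tags to UiAutomator as resource-id
// (imports: androidx.compose.ui.semantics.semantics and testTagsAsResourceId;
// older Compose versions require @OptIn(ExperimentalComposeUiApi::class))
LazyColumn(
    modifier = Modifier
        .semantics { testTagsAsResourceId = true }
        .testTag("debug_screen_list")
) {
    items(DebugScreens.all) { screen ->
        Text(
            text = screen.name,
            modifier = Modifier.testTag("debug_item_${screen.name}")
        )
    }
}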

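uiautomator2 reads resource-id and accessibility attributes such as text and content-desc. A Compose testTag is visible to it only when testTagsAsResourceId is enabled on the tagged node or one of its ancestors; the tag then appears as the node's resource-id. With that in place, standardizing testTag values makes element lookup far more reliable.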
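On the automation side, a tagged item can then be looked up through uiautomator2's Python selector API. A minimal sketch, assuming testTagsAsResourceId is enabled so the tag string appears as the resource-id; d is a device from uiautomator2.connect(), typed loosely so the hypothetical helper can also be exercised with a fake. UiObject.wait() returns a boolean, which lets the helper fail with a clear error instead of silently tapping nothing.

```python
from typing import Any


def tap_debug_item(d: Any, screen_name: str, timeout: float = 5.0) -> None:
    """Tap the Compose item tagged debug_item_<screen_name> on a uiautomator2 device."""
    tag = f"debug_item_{screen_name}"   # must match the Kotlin testTag format
    element = d(resourceId=tag)         # selector on the tag exposed as resource-id
    if not element.wait(timeout=timeout):  # True once the node exists
        raise LookupError(f"{tag} not found within {timeout}s")
    element.click()
```

Usage on a real device would look like: import uiautomator2 as u2; d = u2.connect(); tap_debug_item(d, "settings").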


Caveats

Where uiautomator2 Struggles

UI tree retrieval can be unreliable in these situations:

Situation                    | Problem
---------------------------- | -------------------------------------
LazyColumn                   | Virtualization hides off-screen nodes
WebView                      | Internal tree is inaccessible
Canvas-rendered UI           | No accessibility tree
During animations            | State is unstable
RecyclerView virtualization  | Off-screen elements can't be retrieved

Mitigation: confirm the screen has stabilized before acting (wait_idle() or activity_wait_appear).
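wait_idle() is left abstract here. One way to approximate it, sketched below, is to poll the hierarchy dump (for example uiautomator2's d.dump_hierarchy()) until it comes back unchanged several times in a row. This is a heuristic chosen for illustration, not a documented feature of any MCP server. Content that keeps changing, such as a running timer, can prevent the dump from ever settling, which is why the function gives up after a timeout. The sleep and clock parameters exist only so the logic can be tested without a device.

```python
import time
from typing import Callable


def wait_until_stable(
    dump: Callable[[], str],
    timeout: float = 10.0,
    interval: float = 0.5,
    required_matches: int = 2,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `dump` until it returns the same hierarchy `required_matches` times in a row.

    Returns True once stable, or False if `timeout` seconds pass first.
    """
    deadline = clock() + timeout
    previous = dump()
    matches = 1
    while matches < required_matches:
        if clock() >= deadline:
            return False
        sleep(interval)
        current = dump()
        if current == previous:
            matches += 1
        else:
            previous, matches = current, 1   # tree changed; start counting again
    return True
```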

Security Considerations

Some uiautomator2 MCP implementations can run adb shell commands and access files.

  • Use locally only — don't expose to external networks
  • Use an emulator
  • Isolate with a Work Profile

What Becomes Possible

With Screen Registry + DeepLink + uiautomator2 MCP in place:

  • Full-screen auto-capture
  • Dark mode comparison
  • Multi-language layout regression
  • Tablet verification
  • Play Store asset generation
  • Claude-driven UI review
  • UI regression diff detection
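For the dark mode and multi-language comparisons, it is usually cheaper to switch a device setting once and then capture every screen than to toggle it per screen. A sketch of that ordering: the adb shell command cmd uimode night yes / no toggles system night mode on recent Android versions, while the plan function and its default themes and locales are illustrative. Locale switching is left out because the right mechanism depends on the app and Android version.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple


def night_mode_command(dark: bool) -> List[str]:
    """adb argv that switches system night mode via `cmd uimode`."""
    return ["adb", "shell", "cmd", "uimode", "night", "yes" if dark else "no"]


def capture_plan(
    screens: Sequence[str],
    themes: Sequence[str] = ("light", "dark"),
    locales: Sequence[str] = ("en",),
) -> Iterator[Tuple[str, str, str]]:
    """Yield (theme, locale, screen): each theme/locale pair once, then every screen under it."""
    for theme, locale in product(themes, locales):
        for screen in screens:
            yield theme, locale, screen
```

A driver loop would apply the setting whenever the (theme, locale) pair changes, then run the normal capture for each screen.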

Summary

Key point                              | Detail
-------------------------------------- | ---------------------------------------------------------
MCP-only exploration is unstable       | Combine with DeepLink navigation for stability
App-side preparation is critical       | Screen Registry + DeepLink is the most important investment
UI tree reading is the real advantage  | Targeting elements beats coordinate tapping
Standardize testTag                    | Dramatically improves Compose operation accuracy
Keep MCP in the capture layer          | Its role is capture / wait / state verification only

"Trying hard to explore screens through MCP" is far less effective than "building a Screen Registry + DeepLink in your app." MCP works best as the operation layer on top of that foundation.