Android Screen Auto-Screenshot with uiautomator2 MCP

Introduction

If you're managing multiple Android apps as a solo developer, you regularly run into tasks like these:

Re-capturing screenshots for the Play Store
UI regression checks
Dark mode verification
Multi-language layout breakage checks

Done manually, these require dozens of screenshots per app.

With uiautomator2 MCP, you can automate the entire process: Claude Code or Cursor controls an Android device, visits every screen, and captures it automatically.

Core Strategy: UI Tree + DeepLink

There are three main approaches to automated screen traversal:

Approach	Stability	Implementation Cost
Traverse by coordinate taps	Low (breaks easily)	Low
Traverse by UI tree analysis	Medium	Medium
Navigate directly via DeepLink	High (stable)	Medium–High

Why coordinate taps and image recognition alone are unreliable:

LazyColumn scroll position drifts
Tap misses during animations
Coordinates change on foldables and tablets
WebView internals aren't accessible

Modern Android automation agents follow the same pattern: "read the UI tree first, fall back to image analysis only when needed."

What Makes uiautomator2 MCP Stand Out

Unlike a simple ADB wrapper, uiautomator2 MCP can read the UI tree, and that's its biggest advantage.

<TextView text="Settings" resource-id="btn_settings" clickable="true"/>
<Button text="Save" resource-id="btn_save"/>

This lets the AI act on natural-language instructions like:

"Tap Settings"
"Find Save"
"List children of RecyclerView"

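To make the idea concrete, here is a minimal sketch of how an instruction such as "Tap Settings" could be resolved against a hierarchy dump, using only Python's standard library. The XML string is a hand-written stand-in for a real dump, and `find_by_text` is a hypothetical helper, not part of uiautomator2 or any MCP server.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a UI hierarchy dump (real dumps carry more attributes).
SAMPLE_DUMP = """
<hierarchy>
  <node class="android.widget.TextView" text="Settings"
        resource-id="btn_settings" clickable="true" />
  <node class="android.widget.Button" text="Save"
        resource-id="btn_save" clickable="true" />
</hierarchy>
"""


def find_by_text(dump_xml: str, text: str) -> list[dict[str, str]]:
    """Return the attributes of every node whose text matches exactly."""
    root = ET.fromstring(dump_xml)
    return [dict(node.attrib) for node in root.iter("node") if node.get("text") == text]


matches = find_by_text(SAMPLE_DUMP, "Settings")
print(matches[0]["resource-id"])
```

Because the lookup targets an element rather than a pixel position, it should keep working when the layout shifts, as long as the text itself is unchanged.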
Why tanbro/uiautomator2-mcp-server Is a Strong Choice

It's one of the most complete Android automation MCP options available today.

Feature	Support
screenshot	✓
UI hierarchy retrieval	✓
XPath search	✓
App launch	✓
DeepLink launch	✓
Back key	✓
Swipe	✓
Tool filtering (expose only needed tools)	✓

Tool filtering is quietly important: too many MCP tools confuse LLMs, so exposing only what's needed is a practical design feature.

Capture Implementation Pattern

Ideal Architecture

App side: Screen Registry + DeepLink
  ↓
uiautomator2 MCP: start / capture / back
  ↓
Output: screens/SettingsScreen.png, etc.

In code:

for screen in registry:
    open_deeplink(screen.url)   # adb shell am start...
    wait_idle()                  # wait for UI to stabilize
    screenshot(screen.name)      # capture and save
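The loop above can be fleshed out roughly as follows. This is a sketch, not the MCP server's own implementation: the three callables are injected so the same loop can be driven by MCP tools, raw adb, or (as here) stand-ins that only log what would happen. `Screen` and `capture_all` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Screen:
    name: str
    url: str


def capture_all(
    registry: Iterable[Screen],
    open_deeplink: Callable[[str], None],
    wait_idle: Callable[[], None],
    screenshot: Callable[[str], None],
) -> list[str]:
    """Visit every screen in order and return the names that failed."""
    failed = []
    for screen in registry:
        try:
            open_deeplink(screen.url)
            wait_idle()
            screenshot(screen.name)
        except Exception as exc:  # keep going so one bad screen doesn't stop the run
            print(f"{screen.name}: {exc}")
            failed.append(screen.name)
    return failed


# Dry run with stand-in callables that only record what would happen.
log: list[str] = []
registry = [
    Screen("settings", "myapp://debug/settings"),
    Screen("profile", "myapp://debug/profile"),
]
failed = capture_all(
    registry,
    open_deeplink=lambda url: log.append(f"open {url}"),
    wait_idle=lambda: log.append("wait"),
    screenshot=lambda name: log.append(f"shot {name}"),
)
```

Collecting failures instead of aborting means a single broken DeepLink costs one screenshot rather than the whole run.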

Naming Convention for Captured Files

screens/
├── SettingsScreen.png
├── SettingsScreen_dark.png
├── SettingsScreen_ja.png
├── ProfileScreen.png
└── PurchaseDialog.png

Including screen name, theme, and locale in the filename makes comparison easy.
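One way to keep these names consistent is to generate them from a single helper. The sketch below is an assumption about how the convention extends: the default light theme adds no suffix, and when both theme and locale are present the theme comes first. The listing above doesn't show a combined case, so that ordering is a guess.

```python
from pathlib import Path
from typing import Optional


def screenshot_path(
    screen: str,
    theme: str = "light",
    locale: Optional[str] = None,
    out_dir: str = "screens",
) -> Path:
    """Build a path such as screens/SettingsScreen_dark.png; defaults add no suffix."""
    parts = [screen]
    if theme != "light":
        parts.append(theme)
    if locale:
        parts.append(locale)
    return Path(out_dir) / ("_".join(parts) + ".png")


print(screenshot_path("SettingsScreen", theme="dark").as_posix())
```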

The Most Important Thing: Build a Screen Registry in Your App

Relying on MCP alone breaks on scroll positions, animations, and LazyColumn virtualization.

The critical investment is building a "dedicated capture path" inside the app itself.

Screen Registry Example (Kotlin)

object DebugScreens {
    val all = listOf(
        ScreenEntry("settings",  "myapp://debug/settings"),
        ScreenEntry("profile",   "myapp://debug/profile"),
        ScreenEntry("billing",   "myapp://debug/billing"),
    )
}

data class ScreenEntry(val name: String, val deepLink: String)

Use this Registry for three purposes:

Displaying the Debug Menu
Defining DeepLinks
Driving screenshot automation

Sharing it across all three keeps maintenance simple.

Why DeepLink Navigation Is Stable

adb shell am start \
  -a android.intent.action.VIEW \
  -d "myapp://debug/settings"

Navigating directly via DeepLink means:

No dependence on scroll position
No tap misses from coordinate drift
Works the same on foldables and tablets
Minimal animation wait time

This lets MCP focus purely on capture / wait / state verification.
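If you drive adb from a script rather than through an MCP tool, the command above can be assembled as an argument list. This is a sketch under a few assumptions: `deeplink_command` is an illustrative helper, `-W` is added so that `am start` waits for the launch to complete, and the URL is shell-quoted because `adb shell` hands its arguments to the device's shell, where an unquoted `&` in a query string would split the command.

```python
import shlex
from typing import Optional


def deeplink_command(url: str, serial: Optional[str] = None) -> list[str]:
    """Argument list for opening a deep link on a device via adb."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]  # target one device when several are attached
    cmd += [
        "shell", "am", "start", "-W",
        "-a", "android.intent.action.VIEW",
        "-d", shlex.quote(url),  # protect ?, & and similar from the device shell
    ]
    return cmd


print(" ".join(deeplink_command("myapp://debug/settings")))
```

The list can be passed to `subprocess.run(cmd, check=True)` once a device or emulator is attached.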

Leveraging Compose testTag

For screens where DeepLink is difficult, you can tag elements with testTag and let uiautomator2 find them:

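// Tag the capture target.
// testTagsAsResourceId exposes testTag values as resource-id to UI Automator;
// depending on the Compose version it may need @OptIn(ExperimentalComposeUiApi::class).
LazyColumn(
    modifier = Modifier
        .semantics { testTagsAsResourceId = true }
        .testTag("debug_screen_list")
) {
    items(DebugScreens.all) { screen ->
        Text(
            text = screen.name,
            modifier = Modifier.testTag("debug_item_${screen.name}")
        )
    }
}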

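Most uiautomator2 MCPs can read resource-id, text, and content descriptions. A Compose testTag appears as a resource-id only when testTagsAsResourceId is enabled in semantics, which needs to be set just once high in the Compose hierarchy. With that in place, standardizing testTag values makes operations far more stable.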
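Assuming the tags do surface as resource-id in the hierarchy dump (for Compose this depends on how semantics are configured), a standardized prefix makes it easy to see which registry entries are currently on screen. The sketch below uses a hand-written dump; `visible_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a dump of the debug menu; "billing" is scrolled off screen.
SAMPLE_DUMP = """
<hierarchy>
  <node resource-id="debug_screen_list">
    <node resource-id="debug_item_settings" text="settings" />
    <node resource-id="debug_item_profile" text="profile" />
  </node>
</hierarchy>
"""


def visible_items(dump_xml: str, prefix: str) -> set[str]:
    """Return the tag suffixes of all nodes whose resource-id starts with prefix."""
    found = set()
    for node in ET.fromstring(dump_xml).iter("node"):
        rid = node.get("resource-id", "")
        if rid.startswith(prefix):
            found.add(rid[len(prefix):])
    return found


expected = {"settings", "profile", "billing"}
visible = visible_items(SAMPLE_DUMP, "debug_item_")
missing = expected - visible  # entries that need a scroll before they can be tapped
print(sorted(missing))
```

Comparing the visible set against the registry tells the automation when it has to scroll, which is exactly the case LazyColumn virtualization creates.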

Caveats

Where uiautomator2 Struggles

UI tree retrieval can be unreliable in these situations:

Situation	Problem
LazyColumn	Virtualization hides off-screen nodes
WebView	Internal tree is inaccessible
Canvas-rendered UI	No accessibility tree
During animations	State is unstable
RecyclerView virtualization	Off-screen elements can't be retrieved

Mitigation: confirm the screen has stabilized before acting, for example with wait_idle() or activity_wait_appear.
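When no built-in wait fits, one fallback is to poll the hierarchy until two consecutive dumps are identical. The following is a sketch of that idea, not something the MCP server provides: `wait_until_stable` is a hypothetical helper, and `dump` stands for whatever callable returns the current hierarchy as a string. Screens with constantly changing content (clocks, looping animations) would need the dump normalized first.

```python
import time
from typing import Callable


def wait_until_stable(
    dump: Callable[[], str],
    interval: float = 0.3,
    timeout: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll dump() until two consecutive results match; return False on timeout."""
    deadline = time.monotonic() + timeout
    previous = dump()
    while time.monotonic() < deadline:
        sleep(interval)
        current = dump()
        if current == previous:
            return True
        previous = current
    return False


# Stand-in dump source: the tree changes twice, then settles.
frames = iter(["<a/>", "<b/>", "<c/>", "<c/>"])
stable = wait_until_stable(lambda: next(frames), sleep=lambda _: None)
```

Returning a boolean rather than raising lets the caller decide whether an unstable screen should be skipped or captured anyway.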

Security Considerations

Some uiautomator2 MCP implementations can run adb shell commands and access files.

Use locally only; don't expose it to external networks
Use an emulator
Isolate with a Work Profile
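As a small guard for the "use an emulator" rule, a script can inspect the output of `adb devices` before starting and refuse to run if a physical device is attached. This sketch relies on the convention that local emulators get serials beginning with `emulator-`; emulators connected over TCP would not match, so treat it as a heuristic. The serial `R58M123ABC` is made up for the example.

```python
def non_emulator_serials(adb_devices_output: str) -> list[str]:
    """Return serials of attached devices that do not look like local emulators."""
    serials = []
    for line in adb_devices_output.splitlines():
        fields = line.split()
        # Device rows have exactly two fields: serial and state.
        if len(fields) == 2 and fields[1] == "device":
            if not fields[0].startswith("emulator-"):
                serials.append(fields[0])
    return serials


SAMPLE_OUTPUT = "List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tdevice\n"
print(non_emulator_serials(SAMPLE_OUTPUT))
```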

What Becomes Possible

With Screen Registry + DeepLink + uiautomator2 MCP in place:

Use case	Feasible?
Auto-capture of every screen	✓
Dark mode comparison	✓
Multi-language layout regression	✓
Tablet verification	✓
Play Store asset generation	✓
Claude-driven UI review	✓
UI regression diff detection	✓
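For the regression-diff use case, the simplest possible check is to compare two capture directories file by file. The sketch below hashes each PNG and reports what changed; `changed_screens` is an illustrative helper. Byte-exact hashing flags any pixel difference at all, including a ticking status-bar clock, so a real pipeline would probably swap in a tolerance-based image comparison.

```python
import hashlib
from pathlib import Path


def changed_screens(baseline_dir: str, current_dir: str) -> dict[str, str]:
    """Compare PNGs by SHA-256 and report 'changed', 'added', or 'removed' per file."""

    def digests(folder: str) -> dict[str, str]:
        return {
            path.name: hashlib.sha256(path.read_bytes()).hexdigest()
            for path in Path(folder).glob("*.png")
        }

    old, new = digests(baseline_dir), digests(current_dir)
    report = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            report[name] = "removed"
        elif name not in old:
            report[name] = "added"
        elif old[name] != new[name]:
            report[name] = "changed"
    return report
```

Calling `changed_screens("screens_baseline", "screens")` (both directory names are placeholders) returns an empty dict when nothing differs, which makes it easy to use as a pass/fail gate.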

Summary

Key Point	Detail
MCP-only exploration is unstable	Combine with DeepLink navigation for stability
App-side preparation is critical	Screen Registry + DeepLink is the most important investment
UI tree reading is the real advantage	Targeting elements beats coordinate tapping
Standardize testTag	Dramatically improves Compose operation accuracy
Keep MCP in the capture layer	Its role is capture / wait / state verification only

"Trying hard to explore screens through MCP" is far less effective than "building a Screen Registry + DeepLink in your app." MCP works best as the operation layer on top of that foundation.