Introduction
If you're managing multiple Android apps as a solo developer, you regularly run into tasks like these:
- Re-capturing screenshots for the Play Store
- UI regression checks
- Dark mode verification
- Multi-language layout breakage checks
Done manually, these require dozens of screenshots per app.
With uiautomator2 MCP, you can automate the entire process: Claude Code or Cursor controls an Android device, visits every screen, and captures each one automatically.
Core Strategy: UI Tree + DeepLink
There are three main approaches to automated screen traversal:
| Approach | Stability | Implementation Cost |
|---|---|---|
| Traverse by coordinate taps | Low (breaks easily) | Low |
| Traverse by UI tree analysis | Medium | Medium |
| Navigate directly via DeepLink | High (stable) | Medium–High |
Why coordinate taps and image recognition alone are unreliable:
- LazyColumn scroll position drifts
- Tap misses during animations
- Coordinates change on foldables and tablets
- WebView internals aren't accessible
Modern Android automation agents follow the same pattern: "read the UI tree first, fall back to image analysis only when needed."
What Makes uiautomator2 MCP Stand Out
Unlike a simple ADB wrapper, uiautomator2 MCP can read the UI tree, and that's its biggest advantage.
<TextView text="Settings" resource-id="btn_settings" clickable="true"/>
<Button text="Save" resource-id="btn_save"/>
This lets the AI resolve natural-language instructions against real elements, for example:
- "Tap Settings"
- "Find Save"
- "List children of RecyclerView"
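As a rough illustration of why the tree helps, the sketch below parses a hierarchy dump and resolves an instruction like "Tap Settings" to a concrete element. The XML mirrors the simplified sample above (the `<hierarchy>` root is an assumption, and real dumps usually use `<node>` elements), and `find_by_text` and `clickable_ids` are hypothetical helpers, not part of any MCP server.

```python
import xml.etree.ElementTree as ET

# Simplified dump modeled on the sample above; the <hierarchy> root is an assumption.
SAMPLE_DUMP = """
<hierarchy>
  <TextView text="Settings" resource-id="btn_settings" clickable="true"/>
  <Button text="Save" resource-id="btn_save"/>
</hierarchy>
"""


def find_by_text(dump_xml, text):
    """Return the attributes of every element whose text attribute equals `text`."""
    root = ET.fromstring(dump_xml.strip())
    return [dict(el.attrib) for el in root.iter() if el.get("text") == text]


def clickable_ids(dump_xml):
    """List the resource-ids of elements explicitly marked clickable."""
    root = ET.fromstring(dump_xml.strip())
    return [
        el.get("resource-id")
        for el in root.iter()
        if el.get("clickable") == "true"
    ]


matches = find_by_text(SAMPLE_DUMP, "Settings")
print(matches[0]["resource-id"])
print(clickable_ids(SAMPLE_DUMP))
```

Because the lookup keys on text and resource-id rather than pixels, the same instruction keeps working when the layout shifts.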
Why tanbro/uiautomator2-mcp-server Is a Strong Choice
It's one of the most complete Android automation MCP options available today.
| Feature | Support |
|---|---|
| screenshot | ✓ |
| UI hierarchy retrieval | ✓ |
| XPath search | ✓ |
| App launch | ✓ |
| DeepLink launch | ✓ |
| Back key | ✓ |
| Swipe | ✓ |
| Tool filtering (expose only needed tools) | ✓ |
Tool filtering matters more than it looks: too many MCP tools confuse LLMs, so being able to expose only what's needed is a practical design feature.
Capture Implementation Pattern
Ideal Architecture
App side: Screen Registry + DeepLink
↓
uiautomator2 MCP: start / capture / back
↓
Output: screens/SettingsScreen.png, etc.
In code:
for screen in registry:
    open_deeplink(screen.url)   # adb shell am start...
    wait_idle()                 # wait for UI to stabilize
    screenshot(screen.name)     # capture and save
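The loop can be fleshed out a little further. The following is a minimal sketch, not the MCP server's API: `Screen` and `capture_all` are hypothetical names, and the three steps are passed in as plain callables so the flow can be dry-run without a device. It also assumes you want the run to continue when a single screen fails.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Screen:
    name: str  # used as the output file stem, e.g. "SettingsScreen"
    url: str   # debug deep link, e.g. "myapp://debug/settings"


def capture_all(
    registry: Iterable[Screen],
    open_deeplink: Callable[[str], None],
    wait_idle: Callable[[], None],
    screenshot: Callable[[str], None],
) -> List[str]:
    """Visit every screen in order; return the names of screens that failed."""
    failed = []
    for screen in registry:
        try:
            open_deeplink(screen.url)
            wait_idle()
            screenshot(f"screens/{screen.name}.png")
        except Exception as exc:  # keep going so one broken screen does not stop the run
            print(f"skipped {screen.name}: {exc}")
            failed.append(screen.name)
    return failed


# Dry run with stand-in callables that only record what would happen.
log = []
registry = [
    Screen("SettingsScreen", "myapp://debug/settings"),
    Screen("ProfileScreen", "myapp://debug/profile"),
]
failed = capture_all(
    registry,
    open_deeplink=lambda url: log.append(("open", url)),
    wait_idle=lambda: log.append(("wait",)),
    screenshot=lambda path: log.append(("shot", path)),
)
print(failed, log)
```

In a real run the three callables would wrap whatever your MCP server or adb setup provides; injecting them keeps the traversal logic independent of that choice.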
Naming Convention for Captured Files
screens/
├── SettingsScreen.png
├── SettingsScreen_dark.png
├── SettingsScreen_ja.png
├── ProfileScreen.png
└── PurchaseDialog.png
Including screen name, theme, and locale in the filename makes comparison easy.
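One way to keep the names consistent is to generate them from a single function. This sketch uses a hypothetical `capture_path` helper and assumes that the light theme and the default locale are left out of the name, which matches the listing above; the combined `_dark_ja` form is an extrapolation, not something the listing shows.

```python
from pathlib import PurePosixPath
from typing import Optional


def capture_path(
    screen: str,
    theme: str = "light",
    locale: Optional[str] = None,
    root: str = "screens",
) -> str:
    """Build a file path like screens/SettingsScreen_dark.png."""
    parts = [screen]
    if theme != "light":  # assumption: the default theme adds no suffix
        parts.append(theme)
    if locale:            # assumption: the default locale adds no suffix
        parts.append(locale)
    return str(PurePosixPath(root) / ("_".join(parts) + ".png"))


print(capture_path("SettingsScreen"))
print(capture_path("SettingsScreen", theme="dark"))
print(capture_path("SettingsScreen", locale="ja"))
print(capture_path("SettingsScreen", theme="dark", locale="ja"))
```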
The Most Important Thing: Build a Screen Registry in Your App
Relying on MCP alone breaks on scroll positions, animations, and LazyColumn virtualization.
The critical investment is building a "dedicated capture path" inside the app itself.
Screen Registry Example (Kotlin)
object DebugScreens {
    val all = listOf(
        ScreenEntry("settings", "myapp://debug/settings"),
        ScreenEntry("profile", "myapp://debug/profile"),
        ScreenEntry("billing", "myapp://debug/billing"),
    )
}
data class ScreenEntry(val name: String, val deepLink: String)
Use this Registry for three purposes:
- Displaying the Debug Menu
- Defining DeepLinks
- Driving screenshot automation
Sharing it across all three keeps maintenance simple.
Why DeepLink Navigation Is Stable
adb shell am start \
  -a android.intent.action.VIEW \
  -d "myapp://debug/settings"
Navigating directly via DeepLink means:
- No dependence on scroll position
- No missed taps from coordinate drift
- Works the same on foldables and tablets
- Minimal animation wait time
This lets MCP focus purely on capture / wait / state verification.
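If you drive adb from a script rather than through MCP, the same command can be assembled programmatically. This is a sketch under assumptions: `deeplink_argv` and `open_deeplink` are hypothetical helpers, the `-W` flag (wait for the launch to complete) is an addition to the command shown above, and the URL is shell-quoted because adb passes its arguments to a shell on the device.

```python
import shlex
import subprocess
from typing import Callable, List, Optional


def deeplink_argv(url: str, serial: Optional[str] = None) -> List[str]:
    """Build the adb command that opens `url` through a VIEW intent."""
    argv = ["adb"]
    if serial:
        argv += ["-s", serial]  # target one device when several are attached
    # -W (wait for launch to finish) is an addition to the command shown above.
    # The URL is quoted because adb hands the arguments to a shell on the device.
    argv += [
        "shell", "am", "start", "-W",
        "-a", "android.intent.action.VIEW",
        "-d", shlex.quote(url),
    ]
    return argv


def open_deeplink(url: str, serial: Optional[str] = None,
                  run: Callable = subprocess.run):
    """Run the command; `run` is injectable so the sketch can be dry-run."""
    return run(deeplink_argv(url, serial), check=True)


print(" ".join(deeplink_argv("myapp://debug/settings")))
```

Quoting matters once a deep link carries query parameters, since an unquoted `&` would be interpreted by the device shell.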
Leveraging Compose testTag
For screens where DeepLink is difficult, you can tag elements with testTag and let uiautomator2 find them:
// Tag the capture target
LazyColumn(
    modifier = Modifier.testTag("debug_screen_list")
) {
    items(DebugScreens.all) { screen ->
        Text(
            text = screen.name,
            modifier = Modifier.testTag("debug_item_${screen.name}")
        )
    }
}
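Most uiautomator2 MCPs can match on resource-id, content description, and text. A Compose testTag is not visible to UI Automator by default; it appears as the resource-id only when an ancestor composable sets testTagsAsResourceId = true in its semantics. With that enabled, standardizing testTag values dramatically improves operation stability.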
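To show what a tag-based lookup buys you, here is a sketch that finds a node by resource-id in a hierarchy dump and computes the center of its bounds. It assumes the app exposes testTag values as resource-id (Compose's testTagsAsResourceId semantics property); the sample dump is hand-written for illustration, and `center_of` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# UI Automator dumps encode bounds as "[left,top][right,bottom]".
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")

# Hand-written sample; the values are made up for illustration.
SAMPLE = """
<hierarchy>
  <node resource-id="debug_screen_list" bounds="[0,200][1080,2000]">
    <node text="settings" resource-id="debug_item_settings" bounds="[0,200][1080,320]"/>
    <node text="profile" resource-id="debug_item_profile" bounds="[0,320][1080,440]"/>
  </node>
</hierarchy>
"""


def center_of(dump_xml: str, resource_id: str) -> Optional[Tuple[int, int]]:
    """Return the center point of the first node with this resource-id, if any."""
    root = ET.fromstring(dump_xml.strip())
    for node in root.iter("node"):
        if node.get("resource-id") == resource_id:
            match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
            if match:
                left, top, right, bottom = map(int, match.groups())
                return ((left + right) // 2, (top + bottom) // 2)
    return None


print(center_of(SAMPLE, "debug_item_profile"))
```

The coordinates are derived from the current tree on every run, so they follow the element across screen sizes instead of being hard-coded.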
Caveats
Where uiautomator2 Struggles
UI tree retrieval can be unreliable in these situations:
| Situation | Problem |
|---|---|
| LazyColumn | Virtualization hides off-screen nodes |
| WebView | Internal tree is inaccessible |
| Canvas-rendered UI | No accessibility tree |
| During animations | State is unstable |
| RecyclerView virtualization | Off-screen elements can't be retrieved |
Mitigation: confirm the screen has stabilized before acting (wait_idle() or activity_wait_appear).
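If your setup has no ready-made idle wait, one crude substitute is to poll the hierarchy until two consecutive dumps are identical. This is a sketch of that idea only: `wait_until_stable` is a hypothetical helper, `dump` stands for whatever returns the current hierarchy as a string, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def wait_until_stable(
    dump: Callable[[], str],
    interval: float = 0.3,
    timeout: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once two consecutive dumps match, False on timeout."""
    deadline = time.monotonic() + timeout
    previous = dump()
    while time.monotonic() < deadline:
        sleep(interval)
        current = dump()
        if current == previous:
            return True
        previous = current
    return False


# Dry run: the fake hierarchy changes twice, then settles.
frames = iter(["loading", "half", "done", "done"])
settled = wait_until_stable(lambda: next(frames), sleep=lambda _: None)
print(settled)
```

Screens with looping animations or ticking clocks may never produce two identical dumps, so treat a False result as "capture anyway and flag it" rather than as a hard failure.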
Security Considerations
Some uiautomator2 MCP implementations can run adb shell commands and access files.
- Use locally only — don't expose to external networks
- Use an emulator
- Isolate with a Work Profile
What Becomes Possible
With Screen Registry + DeepLink + uiautomator2 MCP in place:
| Use case | Feasible? |
|---|---|
| Auto-capture of every screen | ✓ |
| Dark mode comparison | ✓ |
| Multi-language layout regression | ✓ |
| Tablet verification | ✓ |
| Play Store asset generation | ✓ |
| Claude-driven UI review | ✓ |
| UI regression diff detection | ✓ |
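For the regression diff use case, the simplest possible check is to compare file hashes between a baseline run and the current run. The sketch below does exactly that and nothing more; `digest_dir` and `changed_screens` are hypothetical helpers, and the demo writes placeholder bytes instead of real screenshots.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def digest_dir(directory: Path) -> Dict[str, str]:
    """Map each PNG file name in `directory` to the SHA-256 of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(directory.glob("*.png"))
    }


def changed_screens(baseline: Path, current: Path) -> List[str]:
    """Names that were added, removed, or whose bytes differ between runs."""
    old, new = digest_dir(baseline), digest_dir(current)
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )


# Demo with placeholder bytes standing in for real screenshots.
with tempfile.TemporaryDirectory() as tmp:
    before, after = Path(tmp, "before"), Path(tmp, "after")
    before.mkdir()
    after.mkdir()
    (before / "SettingsScreen.png").write_bytes(b"v1")
    (after / "SettingsScreen.png").write_bytes(b"v2")
    (before / "ProfileScreen.png").write_bytes(b"same")
    (after / "ProfileScreen.png").write_bytes(b"same")
    demo_changes = changed_screens(before, after)

print(demo_changes)
```

A byte-exact comparison flags any pixel change, including the status bar clock, so in practice you would likely pair it with a tolerance-based image diff or a visual review step for the flagged screens.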
Summary
| Key Point | Detail |
|---|---|
| MCP-only exploration is unstable | Combine with DeepLink navigation for stability |
| App-side preparation is critical | Screen Registry + DeepLink is the most important investment |
| UI tree reading is the real advantage | Targeting elements beats coordinate tapping |
| Standardize testTag | Dramatically improves Compose operation accuracy |
| Keep MCP in the capture layer | Its role is capture / wait / state verification only |
"Trying hard to explore screens through MCP" is far less effective than "building a Screen Registry + DeepLink in your app." MCP works best as the operation layer on top of that foundation.