Introduction
If you're doing Android development with Claude Code or Cursor, you'll hit this wall fast:
- Raw logcat dumps can be thousands of lines
- A UIAutomator XML for a single screen can be hundreds of KB
- `adb dumpsys` output can run to tens of thousands of lines
If all you wanted was to shrink the byte count, regular gzip would be enough. But an LLM reads tokens, not compressed bytes, so data sent to it needs "semantic compression", and the goals are fundamentally different.
| | Regular compression | Semantic compression for LLMs |
|---|---|---|
| Goal | Reduce bytes | Reduce reasoning noise |
| Criterion | Reproducibility | AI comprehensibility |
| Output | Equivalent to original | Semantically equivalent |
Multi-Stage Pipeline Architecture
The basic architecture for semantic compression looks like this:
```text
input
↓ parser (structuring)
↓ normalizer (normalize formatting variations)
↓ dedupe (remove duplicates)
↓ semantic reduction (remove irrelevant information)
↓ structuring (organize)
↓ encoder (TOON/DSL encoding)
↓ LLM
```
Let's walk through each step.
1. Parser — Structuring First
The first job is structuring the raw input.
Logcat example:
```text
05-28 10:00:00 E MyApp: NullPointerException
↓
{
  "time": "05-28 10:00:00",
  "level": "E",
  "tag": "MyApp",
  "message": "NullPointerException"
}
```
Structuring enables filtering in downstream stages.
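A minimal parser sketch for the simplified `MM-DD HH:MM:SS LEVEL TAG: message` format used in the example above. Real `adb logcat -v threadtime` output also carries milliseconds, PID, and TID, so the regex here is an assumption you would adapt to your actual format; `parse_line` and `parse` are hypothetical names.

```python
import re
from typing import Optional

# Matches the simplified format shown above: "05-28 10:00:00 E MyApp: message".
# Assumption: real logcat formats (e.g. -v threadtime) add milliseconds, PID and TID.
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[VDIWEFA])\s+"
    r"(?P<tag>[^:]+):\s*"
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw logcat line into a dict, or None if it doesn't match."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None


def parse(text: str) -> list[dict]:
    """Parse every recognizable line; unparseable lines are skipped."""
    return [rec for rec in map(parse_line, text.splitlines()) if rec]
```

Skipping unparseable lines keeps the sketch short; in practice you may want to pass them through untouched so nothing important is silently lost.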
2. Normalizer — Eliminating Formatting Variations
Unify information that conveys the same meaning in different formats.
Shortening package names:
```text
com.example.myapp.feature.login.LoginViewModel
↓
LoginViewModel
```
Normalizing exception class names:
```text
java.lang.NullPointerException
↓
NPE
```
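Both normalizations can be sketched as one regex substitution: find a dotted lowercase package prefix followed by a capitalized class name, keep only the simple name, and swap in an abbreviation when one is known. The abbreviation table below is an illustrative assumption, not a standard list, and `normalize` is a hypothetical name.

```python
import re

# Assumption: an illustrative abbreviation table; extend it for the exceptions you see.
EXCEPTION_ABBREVIATIONS = {
    "NullPointerException": "NPE",
    "IllegalStateException": "ISE",
    "IllegalArgumentException": "IAE",
}

# A dotted, lowercase package prefix followed by a capitalized simple class name.
QUALIFIED_NAME = re.compile(r"\b(?:[a-z_][a-z0-9_]*\.)+([A-Z]\w*)")


def normalize(text: str) -> str:
    """Shorten fully qualified class names and abbreviate known exceptions."""

    def shorten(match) -> str:
        simple = match.group(1)
        return EXCEPTION_ABBREVIATIONS.get(simple, simple)

    return QUALIFIED_NAME.sub(shorten, text)
```

For example, `normalize("java.lang.NullPointerException at com.example.myapp.feature.login.LoginViewModel")` gives `"NPE at LoginViewModel"`. Dropping the package is lossy, so if two features share a class name you may want to keep the last package segment.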
3. Dedupe — Removing Duplicates
Repetition almost never helps an LLM: each repeated line costs tokens without adding new information.
Input:
```text
Loading...
Loading...
Loading...
Loading...
```
After compression:
```text
Loading... x4
```
A more significant example — the same stack trace repeated 100 times:
```text
EXCEPTION repeated x100 { type=NPE source=LoginViewModel.kt line=42 }
```
Replacing 100 copies of a multi-line stack trace with a single summary line saves an enormous number of tokens on its own.
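A minimal dedupe sketch using only the standard library: `itertools.groupby` collapses consecutive identical lines, and `collections.Counter` summarizes repeated exceptions once each has been reduced to a `(type, source, line)` signature. Extracting that signature from a multi-line stack trace is assumed to happen upstream; `collapse_runs` and `summarize_exceptions` are hypothetical names.

```python
from collections import Counter
from itertools import groupby


def collapse_runs(lines: list[str]) -> list[str]:
    """Collapse consecutive identical lines into 'line xN'."""
    out = []
    for line, group in groupby(lines):
        count = sum(1 for _ in group)
        out.append(line if count == 1 else f"{line} x{count}")
    return out


def summarize_exceptions(signatures: list[tuple[str, str, int]]) -> list[str]:
    """Count identical (type, source, line) signatures, wherever they occur in the log."""
    summary = []
    for (exc_type, source, line), n in Counter(signatures).items():
        prefix = f"EXCEPTION repeated x{n}" if n > 1 else "EXCEPTION"
        summary.append(f"{prefix} {{ type={exc_type} source={source} line={line} }}")
    return summary
```

`groupby` only merges adjacent duplicates, which preserves the timeline; `Counter` merges duplicates anywhere, which is what you want for a crash that keeps recurring between other log lines.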
4. Semantic Reduction — Dropping What AI Doesn't Need
This is the core of LLM compression.
Android-Specific Noise Logs to Exclude
These tags log framework internals that rarely help when debugging app code with AI:
| Tag | What it logs |
|---|---|
| BufferQueue | Graphics system internals |
| OpenGLRenderer | Rendering engine |
| libEGL | OpenGL ES |
| AudioTrack | Audio internals |
| TrafficStats | Network statistics |
| chatty | Log suppression messages |
| GC_ | Garbage collection |
Filter example:
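```python
import re

# Tags from the table above; a line mentioning any of them is treated as noise.
NOISE_PATTERNS = [
    r"BufferQueue",
    r"OpenGLRenderer",
    r"libEGL",
    r"AudioTrack",
    r"TrafficStats",
    r"chatty",
    r"GC_",
]


def filter_noise(line: str) -> bool:
    """Return True if the line should be kept (it matches no noise pattern)."""
    return not any(re.search(p, line) for p in NOISE_PATTERNS)
```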
Important Logs to Keep
- Exceptions
- ANRs
- Activity lifecycle events
- Network errors
- Firebase
- WorkManager
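One way to combine the two lists is to let a "keep" match win over a "noise" match, so an exception logged under a noisy tag is never dropped. This precedence rule and the exact patterns below are assumptions for illustration; `should_keep` is a hypothetical name.

```python
import re

# Assumption: illustrative patterns. "Exception" also covers network errors such as
# HttpException or SocketTimeoutException.
KEEP = re.compile(
    r"Exception|FATAL|\bANR\b|onCreate|onResume|onPause|onDestroy"
    r"|Firebase|WorkManager"
)
NOISE = re.compile(r"BufferQueue|OpenGLRenderer|libEGL|AudioTrack|TrafficStats|chatty|GC_")


def should_keep(line: str) -> bool:
    """Keep important lines unconditionally; otherwise drop known noise."""
    if KEEP.search(line):
        return True
    return NOISE.search(line) is None
```

Lines that match neither list are kept, which errs on the side of giving the model too much rather than silently losing an unknown signal.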
5. Structuring — Organizing for LLM Comprehension
LLMs handle organized information better.
Before:
```text
MainActivity created
Fragment onResume called
API request started
```
After:
```text
ACTIVITY{ name=MainActivity state=created }
FRAGMENT{ state=onResume }
API{ state=start }
```
Explicit labels tell the model what kind of event each line describes, so it no longer has to infer that from free-form text.
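A minimal rule-based sketch of this step: each rule is a regex with named groups, and the first match becomes a labeled record. The three rules only cover the example messages above and are assumptions; a real app would need its own rule set. `label` and `RULES` are hypothetical names.

```python
import re

# Assumption: illustrative rules covering only the example messages above.
RULES = [
    (re.compile(r"^(?P<name>\w+Activity) (?P<state>created|started|resumed|paused|destroyed)$"), "ACTIVITY"),
    (re.compile(r"^Fragment (?P<state>on\w+) called$"), "FRAGMENT"),
    (re.compile(r"^API request (?P<state>start)ed$"), "API"),
]


def label(message: str) -> str:
    """Return 'KIND{ key=value ... }' for the first matching rule."""
    for pattern, kind in RULES:
        m = pattern.match(message)
        if m:
            fields = " ".join(f"{k}={v}" for k, v in m.groupdict().items())
            return f"{kind}{{ {fields} }}"
    # Unrecognized messages are kept verbatim under a generic label.
    return f'MSG{{ text="{message}" }}'
```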
6. Encoder — TOON/DSL Encoding
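Finally, encode the records in a compact tabular format such as TOON (Token-Oriented Object Notation), which declares the field names once in a header and then lists one row per record:
```text
events[3]{type,target,state}:
  ACT,MainActivity,created
  API,LoginApi,start
  ERR,Auth,401
```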
Because the field names appear once in the header instead of on every record, each line carries more information and more events fit in the context window.
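A minimal encoder sketch for uniform records. It follows TOON's tabular form (a `name[N]{fields}:` header, then one comma-delimited row per record indented by two spaces), but the quoting rule here is a simplified assumption rather than the full TOON specification; `encode_toon_table` is a hypothetical name.

```python
import json


def _cell(value) -> str:
    text = str(value)
    # Simplified quoting (assumption): quote anything that could break the row syntax.
    if text == "" or any(c in text for c in ',:"\n') or text != text.strip():
        return json.dumps(text)
    return text


def encode_toon_table(name: str, fields: list[str], rows: list[dict]) -> str:
    """Encode records that share the same fields as a TOON-style tabular array."""
    lines = [f"{name}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        lines.append("  " + ",".join(_cell(row[f]) for f in fields))
    return "\n".join(lines)
```

Usage: `encode_toon_table("events", ["type", "target", "state"], rows)` with the three events above produces the same header and rows as the example. The format only pays off when records share the same fields; heterogeneous records are better grouped by kind first.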
UIAutomator and UI Tree Compression
When using Android MCP, UIAutomator XML is even more verbose.
A single node from the original tree (excerpt, shown in JSON form):
```json
{
  "x": 120,
  "y": 400,
  "width": 200,
  "height": 48,
  "padding": 0,
  "alpha": 1.0,
  "focusable": false,
  "clickable": true,
  "text": "Login"
}
```
After compression:
```text
BTN[text=Login]
```
Layout attributes such as padding and alpha are essentially useless to an LLM. Keeping only elements where `clickable=true` and text is present, and dropping every other attribute, reduces the same screen to under 1/20 of the token count.
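A minimal sketch of that filter over a raw `uiautomator dump` file, where each element is a `<node>` with string attributes such as `class`, `text`, and `clickable`. The `BTN`/`CLICK` labels and the rule "class name ends with Button" are assumptions for illustration; `compress_ui` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def compress_ui(xml_text: str) -> list[str]:
    """Keep clickable nodes that have visible text; drop every other attribute."""
    root = ET.fromstring(xml_text)
    out = []
    for node in root.iter("node"):
        text = node.get("text", "").strip()
        if node.get("clickable") == "true" and text:
            # Assumption: a coarse label derived from the widget class name.
            kind = "BTN" if node.get("class", "").endswith("Button") else "CLICK"
            out.append(f"{kind}[text={text}]")
    return out
```

Icon-only buttons usually have an empty `text` but a `content-desc`, so a real filter would probably fall back to that attribute as well.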
Domain Specialization Matters Most
Android-specific compression logic outperforms generic compression by a wide margin, because it knows which tags, events, and UI attributes actually matter for debugging.
Automatic Log Category Classification
```python
CATEGORIES = {
    "lifecycle": ["onCreate", "onResume", "onPause", "onDestroy"],
    "crash": ["Exception", "ANR", "Fatal"],
    "network": ["Retrofit", "OkHttp", "HttpException"],
    "database": ["Room", "SQLite"],
    "compose": ["Recomposition", "Composition"],
    "firebase": ["Firebase", "Firestore", "FCM"],
    "workmanager": ["WorkManager", "Worker"],
}
```
Passing logs organized by category to AI groups related information together and improves debugging accuracy.
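A minimal classifier sketch built on that keyword table. It assigns each line to the first category whose keyword appears in it, so the order of the dict matters; here `crash` is moved to the front (an assumption) so that a line mentioning both `onCreate` and an exception is treated as a crash. `categorize` and `group_by_category` are hypothetical names.

```python
from collections import defaultdict

# Same keywords as above, with "crash" first so crashes win over other matches.
CATEGORIES = {
    "crash": ["Exception", "ANR", "Fatal"],
    "lifecycle": ["onCreate", "onResume", "onPause", "onDestroy"],
    "network": ["Retrofit", "OkHttp", "HttpException"],
    "database": ["Room", "SQLite"],
    "compose": ["Recomposition", "Composition"],
    "firebase": ["Firebase", "Firestore", "FCM"],
    "workmanager": ["WorkManager", "Worker"],
}


def categorize(line: str) -> str:
    """Return the first category whose keyword occurs in the line, else 'other'."""
    for category, keywords in CATEGORIES.items():
        if any(k in line for k in keywords):
            return category
    return "other"


def group_by_category(lines: list[str]) -> dict[str, list[str]]:
    """Group lines by category, preserving their original order within each group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        groups[categorize(line)].append(line)
    return dict(groups)
```

Plain substring matching is crude (for example, "Worker" also matches unrelated class names), but it is cheap and easy to tune.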
Priority Scoring for Context Control
```text
priority=10 EXCEPTION / ANR
priority=8  Network error
priority=5  Lifecycle event
priority=3  Debug log
priority=1  Verbose
```
When context is limited, fill from highest priority first.
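A minimal greedy sketch of "fill from highest priority first" under a token budget. The ~4 characters per token estimate is a rough assumption (use a real tokenizer if you have one), and `fill_context` is a hypothetical name. Selected entries are returned in their original log order so the model still sees events in sequence.

```python
def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token for English-like log text.
    return max(1, len(text) // 4)


def fill_context(entries: list[tuple[int, str]], budget: int) -> list[str]:
    """Pick (priority, text) entries by descending priority until the budget is used.

    Entries that don't fit are skipped, and smaller lower-priority ones may still be
    added. The result keeps the original log order.
    """
    # sorted() is stable, so equal priorities keep their log order.
    by_priority = sorted(range(len(entries)), key=lambda i: entries[i][0], reverse=True)
    chosen: set[int] = set()
    used = 0
    for i in by_priority:
        cost = estimate_tokens(entries[i][1])
        if used + cost <= budget:
            chosen.add(i)
            used += cost
    return [entries[i][1] for i in sorted(chosen)]
```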
Command-Based Interface by Use Case
Slash commands create a natural interface for AI integration:
```text
/logcat     → logcat-specific compression
/uiauto     → UIAutomator XML-specific compression
/json       → large JSON-specific compression
/stacktrace → extract stacktrace only
```
Example:
```text
/logcat
05-28 10:00:00 D MyApp: Loading
05-28 10:00:01 D MyApp: Loading
05-28 10:00:02 E MyApp: NullPointerException at LoginViewModel.kt:42
↓
LOG{
  repeated[ "Loading" x2 ]
  error{ type=NPE source=LoginViewModel.kt line=42 }
}
```
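A minimal sketch of the routing behind these commands: handlers register under a slash command, and `dispatch` sends the text after the first line to the matching handler. Only `/logcat` is sketched, and it assumes the simplified log format and `.kt` source locations from the example; `/uiauto`, `/json`, and `/stacktrace` would register the same way. All names here are hypothetical.

```python
import re
from typing import Callable

HANDLERS: dict[str, Callable[[str], str]] = {}


def command(name: str):
    """Register a compressor under a slash command such as "/logcat"."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        HANDLERS[name] = fn
        return fn
    return register


# Assumption: simplified "MM-DD HH:MM:SS L Tag: message" lines, Kotlin sources only.
LINE = re.compile(r"^\S+ \S+ [VDIWEF] [^:]+: (.*)$")
ERROR = re.compile(r"(\w+Exception) at (\w+\.kt):(\d+)")
ABBREV = {"NullPointerException": "NPE"}


@command("/logcat")
def compress_logcat(body: str) -> str:
    counts: dict[str, int] = {}
    errors: list[str] = []
    for raw in body.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        message = m.group(1)
        err = ERROR.search(message)
        if err:
            kind, source, line_no = err.groups()
            errors.append(f"error{{ type={ABBREV.get(kind, kind)} source={source} line={line_no} }}")
        else:
            counts[message] = counts.get(message, 0) + 1
    # Repeated messages are counted; one-off messages are kept as plain strings.
    items = [f'repeated[ "{msg}" x{n} ]' if n > 1 else f'"{msg}"' for msg, n in counts.items()]
    return "\n".join(["LOG{", *("  " + s for s in items + errors), "}"])


def dispatch(request: str) -> str:
    """Route '/command\\n<payload>' to the registered handler."""
    command_line, _, body = request.partition("\n")
    handler = HANDLERS.get(command_line.strip())
    if handler is None:
        raise ValueError(f"unknown command: {command_line.strip()}")
    return handler(body)
```

A registry keeps each compressor independent, so the same functions can later be exposed as MCP tools without touching the routing code.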
Going Further: MCP Server Implementation
This pipeline can be implemented as an MCP server:
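```python
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

mcp = FastMCP("android-compressor")  # server name is arbitrary

def run_pipeline(text: str, mode: str) -> str:  # placeholder for the pipeline above
    raise NotImplementedError(mode)

@mcp.tool()
def compress_logcat(text: str) -> str:
    return run_pipeline(text, mode="logcat")

@mcp.tool()
def compress_uiauto(xml: str) -> str:
    return run_pipeline(xml, mode="uiauto")

@mcp.tool()
def compress_json(data: str) -> str:
    return run_pipeline(data, mode="json")

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```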
Once the server is registered with Claude Desktop or Claude Code, the model can call these tools on its own. The ideal is compression that happens transparently, without the user needing to think about it.
Summary
| Step | Effect |
|---|---|
| Parser | Enables downstream filtering |
| Normalizer | Eliminates duplicate representations of the same information |
| Dedupe | Collapses repeated log entries |
| Semantic Reduction | Removes noise irrelevant to AI |
| Structuring | Organizes data for better AI comprehension |
| Encoder (TOON) | Increases information density |
The key insight is knowing what AI doesn't need. Regular compression reduces bytes. Semantic compression for LLMs reduces reasoning noise. Keeping that distinction in mind makes the Android MCP + AI combination dramatically more powerful.