引言
DroidRun 是一個強大的框架,用于通過 LLM 代理控制 Android 設備。它允許您使用自然語言命令自動化 Android 設備交互。
特點
-
使用自然語言命令控制 Android 設備
-
支持多個 LLM 提供商(OpenAI、Anthropic、Gemini)
-
易于使用的 CLI
-
用于自定義自動化的可擴展 Python API
-
屏幕截圖分析,直觀了解設備
開始使用
前提條件
-
系統已安裝adb工具
-
通過 USB 或 ADB 通過 TCP/IP 連接 Android 設備或模擬器
-
Android或模擬器上已安裝DroidRun Portal 應用程序
-
至少一個受支持的 LLM 提供程序的 API 密鑰(OpenAI、Anthropic、Gemini)
新建項目
打開Pycharm新建項目droidrun-demo
設置環境
安裝依賴
在Pycharm里打開終端輸入命令:
pip install droidrun
安裝DroidRun Portal
下載link:https://github.com/droidrun/droidrun-portal
下載完成后使用adb命令安裝
adb install /Users/wan/Downloads/App/droidrun.apk
設置API key
在droidrun-demo目錄打開終端輸入命令
touch .env
我用的是Gemini,配置如下:
# Choose at least one of these based on your preferred provider
export GEMINI_API_KEY="***"
然后保存,輸入命令
source .env
編寫腳本
# coding=utf-8
import asyncio
import os
from droidrun.agent.react_agent import ReActAgent
from droidrun.agent.llm_reasoning import LLMReasoner
from dotenv import load_dotenv# Load environment variables from .env file
load_dotenv()async def main():# Create an LLM instancellm = LLMReasoner(llm_provider="gemini", # Can be "openai", "anthropic", or "gemini"model_name="gemini-2.5-pro-preview-03-25", # Choose appropriate model for your providerapi_key=os.environ.get("GEMINI_API_KEY"), # Get API key from environmenttemperature=0.2,vision=True # Enable vision capabilities)# Create and run the agentagent = ReActAgent(task="打開Chrome,搜索droidrun",llm=llm,vision=True # Enable screenshot analysis)steps = await agent.run()print(f"Execution completed with {len(steps)} steps")if __name__ == "__main__":asyncio.run(main())
運行腳本
輸出結果
INFO:droidrun:Vision capabilities enabled: screenshot tool available
INFO:droidrun:Using provided LLM reasoner: provider=gemini, model=gemini-2.5-pro-preview-03-25
INFO:droidrun:Using first available device: emulator-5554
INFO:droidrun:Connected to device: emulator-5554
INFO:droidrun:🎯 GOAL: 打開Chrome,搜索droidrun
INFO:droidrun:Token usage before API call: {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0, 'api_calls': 0}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #1
INFO:droidrun:This call: 713 prompt + 225 completion = 938 tokens
INFO:droidrun:Cumulative: 713 prompt + 225 completion = 938 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 713, 'completion_tokens': 225, 'total_tokens': 938, 'api_calls': 1}
INFO:droidrun:🤔 THOUGHT: I need to open the Chrome browser first. I will check the clickable elements on the current screen to see if the Chrome icon is present.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁? OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 713, 'completion_tokens': 225, 'total_tokens': 938, 'api_calls': 1}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #2
INFO:droidrun:This call: 785 prompt + 174 completion = 959 tokens
INFO:droidrun:Cumulative: 1498 prompt + 399 completion = 1897 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 1498, 'completion_tokens': 399, 'total_tokens': 1897, 'api_calls': 2}
INFO:droidrun:🤔 THOUGHT: The previous attempt to get clickable elements failed. This could be due to various reasons, possibly a temporary issue with the UI automation framework or the device state. To get a better understanding of the current screen and confirm if Chrome is visible, I will take a screenshot.
INFO:droidrun:🔄 ACTION: take_screenshot()
INFO:droidrun:Screenshot compressed successfully: 588.2KB → 56.8KB (90.3% reduction)
INFO:droidrun:👁? OBSERVATION: Screenshot captured and available for analysis
INFO:droidrun:Token usage before API call: {'prompt_tokens': 1498, 'completion_tokens': 399, 'total_tokens': 1897, 'api_calls': 2}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #3
INFO:droidrun:This call: 1139 prompt + 196 completion = 1335 tokens
INFO:droidrun:Cumulative: 2637 prompt + 595 completion = 3232 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 2637, 'completion_tokens': 595, 'total_tokens': 3232, 'api_calls': 3}
INFO:droidrun:🤔 THOUGHT: The screenshot confirms that the Chrome icon is visible on the home screen dock. I need to tap it to open the browser. I will attempt to get the clickable elements again to find the index for the Chrome icon.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁? OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 2637, 'completion_tokens': 595, 'total_tokens': 3232, 'api_calls': 3}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #4
INFO:droidrun:This call: 947 prompt + 312 completion = 1259 tokens
INFO:droidrun:Cumulative: 3584 prompt + 907 completion = 4491 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 3584, 'completion_tokens': 907, 'total_tokens': 4491, 'api_calls': 4}
INFO:droidrun:🤔 THOUGHT: The `get_clickables` tool is consistently failing, preventing me from identifying the Chrome icon's index for tapping. However, I know the goal is to open Chrome. A more robust way to achieve this, especially when UI interaction tools are unreliable, is to start the app directly using its package name. The package name for Google Chrome is 'com.android.chrome'. I will use the `start_app` tool to launch Chrome.
INFO:droidrun:🔄 ACTION: start_app(package=com.android.chrome)
INFO:droidrun:👁? OBSERVATION: Started com.android.chrome
INFO:droidrun:Token usage before API call: {'prompt_tokens': 3584, 'completion_tokens': 907, 'total_tokens': 4491, 'api_calls': 4}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #5
INFO:droidrun:This call: 1068 prompt + 144 completion = 1212 tokens
INFO:droidrun:Cumulative: 4652 prompt + 1051 completion = 5703 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 4652, 'completion_tokens': 1051, 'total_tokens': 5703, 'api_calls': 5}
INFO:droidrun:🤔 THOUGHT: Chrome has been successfully launched. Now I need to find the search/address bar to input the search query 'droidrun'. I will use `get_clickables` to identify the elements on the current Chrome screen.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁? OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 4652, 'completion_tokens': 1051, 'total_tokens': 5703, 'api_calls': 5}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #6
INFO:droidrun:This call: 1156 prompt + 980 completion = 2136 tokens
INFO:droidrun:Cumulative: 5808 prompt + 2031 completion = 7839 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 5808, 'completion_tokens': 2031, 'total_tokens': 7839, 'api_calls': 6}
INFO:droidrun:🤔 THOUGHT: The `get_clickables` tool is consistently failing, preventing me from identifying the search bar in Chrome. However, when Chrome starts, the address bar is often focused by default. I will attempt to directly input the search query 'droidrun' assuming the address bar is already focused.
INFO:droidrun:🔄 ACTION: input_text(text=droidrun)
INFO:droidrun:👁? OBSERVATION: Text input completed: droidrun
INFO:droidrun:Token usage before API call: {'prompt_tokens': 5808, 'completion_tokens': 2031, 'total_tokens': 7839, 'api_calls': 6}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #7
INFO:droidrun:This call: 1241 prompt + 277 completion = 1518 tokens
INFO:droidrun:Cumulative: 7049 prompt + 2308 completion = 9357 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 7049, 'completion_tokens': 2308, 'total_tokens': 9357, 'api_calls': 7}
INFO:droidrun:🤔 THOUGHT: I have successfully inputted the search query 'droidrun' into what I assume is the address/search bar in Chrome. The next logical step to perform the search is to press the Enter key. The keycode for Enter is 66. I will use the `press_key` tool to simulate this action.
INFO:droidrun:🔄 ACTION: press_key(keycode=66)
INFO:droidrun:👁? OBSERVATION: Pressed key 66
INFO:droidrun:Token usage before API call: {'prompt_tokens': 7049, 'completion_tokens': 2308, 'total_tokens': 9357, 'api_calls': 7}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #8
INFO:droidrun:This call: 1331 prompt + 300 completion = 1631 tokens
INFO:droidrun:Cumulative: 8380 prompt + 2608 completion = 10988 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 8380, 'completion_tokens': 2608, 'total_tokens': 10988, 'api_calls': 8}
INFO:droidrun:🤔 THOUGHT: The sequence of actions indicates that Chrome was opened, the search term 'droidrun' was entered, and the Enter key was pressed. This completes the user's request to open Chrome and search for 'droidrun'. The search results page should now be displayed in Chrome. Therefore, the goal has been achieved.
INFO:droidrun:🔄 ACTION: complete(result=Opened Chrome and initiated a search for 'droidrun'.)
INFO:droidrun:👁? OBSERVATION: Task completed: Opened Chrome and initiated a search for 'droidrun'.
INFO:droidrun:👁? OBSERVATION: Goal achieved in 8 steps.===== Final Token Usage and Cost =====
Total Tokens Used: 10,988
Total API Calls: 8
Estimated Cost: $0.0011
===================================Summary: Task completed: Opened Chrome and initiated a search for 'droidrun'.
Execution completed with 26 steps
更多使用文檔參考:https://www.droidrun.ai/
總結
體驗了一下droidrun,感覺一言難盡,相比傳統 App UI自動化來說:
1.執行測試的速度很慢,上面的腳本也才2個步驟
2.執行1次測試調用API會產生費用,Estimated Cost: $0.0011,而傳統UI自動化沒有這種成本,隨著腳本的復雜度提高,產生的成本也不便宜
3.從測試效果比較也不如傳統UI自動化測試
不知道其他LLM(OpenAI、Anthropic)的效果如何,支不支持DeepSeek,感興趣的同學可以自行測試效果!