AI自動化神器-DroidRun使用體驗

引言

DroidRun 是一個強大的框架，用于通過 LLM 代理控制 Android 設備。它允許您使用自然語言命令自動化 Android 設備交互。

特點

使用自然語言命令控制 Android 設備
支持多個 LLM 提供商(OpenAI、Anthropic、Gemini)
易于使用的 CLI
用于自定義自動化的可擴展 Python API
屏幕截圖分析，直觀了解設備

開始使用

前提條件

系統已安裝adb工具
通過 USB 或 ADB 通過 TCP/IP 連接 Android 設備或模擬器
Android或模擬器上已安裝DroidRun Portal 應用程序
至少一個受支持的 LLM 提供程序的 API 密鑰(OpenAI、Anthropic、Gemini)

新建項目

打開Pycharm新建項目droidrun-demo

設置環境

安裝依賴

在Pycharm里打開終端輸入命令：

pip install droidrun

安裝DroidRun Portal

下載link：https://github.com/droidrun/droidrun-portal

下載完成后使用adb命令安裝

adb install /Users/wan/Downloads/App/droidrun.apk

設置API key

在droidrun-demo目錄打開終端輸入命令

touch .env

我用的是Gemini，配置如下：

# Choose at least one of these based on your preferred provider
export GEMINI_API_KEY="***"

然后保存，輸入命令

source .env

編寫腳本

# coding=utf-8
import asyncio
import os
from droidrun.agent.react_agent import ReActAgent
from droidrun.agent.llm_reasoning import LLMReasoner
from dotenv import load_dotenv# Load environment variables from .env file
load_dotenv()async def main():# Create an LLM instancellm = LLMReasoner(llm_provider="gemini",  # Can be "openai", "anthropic", or "gemini"model_name="gemini-2.5-pro-preview-03-25",  # Choose appropriate model for your providerapi_key=os.environ.get("GEMINI_API_KEY"),  # Get API key from environmenttemperature=0.2,vision=True  # Enable vision capabilities)# Create and run the agentagent = ReActAgent(task="打開Chrome，搜索droidrun",llm=llm,vision=True  # Enable screenshot analysis)steps = await agent.run()print(f"Execution completed with {len(steps)} steps")if __name__ == "__main__":asyncio.run(main())

運行腳本

輸出結果

INFO:droidrun:Vision capabilities enabled: screenshot tool available
INFO:droidrun:Using provided LLM reasoner: provider=gemini, model=gemini-2.5-pro-preview-03-25
INFO:droidrun:Using first available device: emulator-5554
INFO:droidrun:Connected to device: emulator-5554
INFO:droidrun:🎯 GOAL: 打開Chrome，搜索droidrun
INFO:droidrun:Token usage before API call: {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0, 'api_calls': 0}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #1
INFO:droidrun:This call: 713 prompt + 225 completion = 938 tokens
INFO:droidrun:Cumulative: 713 prompt + 225 completion = 938 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 713, 'completion_tokens': 225, 'total_tokens': 938, 'api_calls': 1}
INFO:droidrun:🤔 THOUGHT: I need to open the Chrome browser first. I will check the clickable elements on the current screen to see if the Chrome icon is present.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁? OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 713, 'completion_tokens': 225, 'total_tokens': 938, 'api_calls': 1}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #2
INFO:droidrun:This call: 785 prompt + 174 completion = 959 tokens
INFO:droidrun:Cumulative: 1498 prompt + 399 completion = 1897 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 1498, 'completion_tokens': 399, 'total_tokens': 1897, 'api_calls': 2}
INFO:droidrun:🤔 THOUGHT: The previous attempt to get clickable elements failed. This could be due to various reasons, possibly a temporary issue with the UI automation framework or the device state. To get a better understanding of the current screen and confirm if Chrome is visible, I will take a screenshot.
INFO:droidrun:🔄 ACTION: take_screenshot()
INFO:droidrun:Screenshot compressed successfully: 588.2KB → 56.8KB (90.3% reduction)
INFO:droidrun:👁? OBSERVATION: Screenshot captured and available for analysis
INFO:droidrun:Token usage before API call: {'prompt_tokens': 1498, 'completion_tokens': 399, 'total_tokens': 1897, 'api_calls': 2}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #3
INFO:droidrun:This call: 1139 prompt + 196 completion = 1335 tokens
INFO:droidrun:Cumulative: 2637 prompt + 595 completion = 3232 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 2637, 'completion_tokens': 595, 'total_tokens': 3232, 'api_calls': 3}
INFO:droidrun:🤔 THOUGHT: The screenshot confirms that the Chrome icon is visible on the home screen dock. I need to tap it to open the browser. I will attempt to get the clickable elements again to find the index for the Chrome icon.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁? OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 2637, 'completion_tokens': 595, 'total_tokens': 3232, 'api_calls': 3}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #4
INFO:droidrun:This call: 947 prompt + 312 completion = 1259 tokens
INFO:droidrun:Cumulative: 3584 prompt + 907 completion = 4491 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 3584, 'completion_tokens': 907, 'total_tokens': 4491, 'api_calls': 4}
INFO:droidrun:🤔 THOUGHT: The `get_clickables` tool is consistently failing, preventing me from identifying the Chrome icon's index for tapping. However, I know the goal is to open Chrome. A more robust way to achieve this, especially when UI interaction tools are unreliable, is to start the app directly using its package name. The package name for Google Chrome is 'com.android.chrome'. I will use the `start_app` tool to launch Chrome.
INFO:droidrun:🔄 ACTION: start_app(package=com.android.chrome)
INFO:droidrun:👁? OBSERVATION: Started com.android.chrome
INFO:droidrun:Token usage before API call: {'prompt_tokens': 3584, 'completion_tokens': 907, 'total_tokens': 4491, 'api_calls': 4}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #5
INFO:droidrun:This call: 1068 prompt + 144 completion = 1212 tokens
INFO:droidrun:Cumulative: 4652 prompt + 1051 completion = 5703 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 4652, 'completion_tokens': 1051, 'total_tokens': 5703, 'api_calls': 5}
INFO:droidrun:🤔 THOUGHT: Chrome has been successfully launched. Now I need to find the search/address bar to input the search query 'droidrun'. I will use `get_clickables` to identify the elements on the current Chrome screen.
INFO:droidrun:🔄 ACTION: get_clickables()
ERROR:droidrun:Error executing tool get_clickables: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:👁? OBSERVATION: Error: Error getting clickable elements: Error retrieving clickable elements: Failed to find the JSON file path in logcat after 10 seconds
INFO:droidrun:Token usage before API call: {'prompt_tokens': 4652, 'completion_tokens': 1051, 'total_tokens': 5703, 'api_calls': 5}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #6
INFO:droidrun:This call: 1156 prompt + 980 completion = 2136 tokens
INFO:droidrun:Cumulative: 5808 prompt + 2031 completion = 7839 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 5808, 'completion_tokens': 2031, 'total_tokens': 7839, 'api_calls': 6}
INFO:droidrun:🤔 THOUGHT: The `get_clickables` tool is consistently failing, preventing me from identifying the search bar in Chrome. However, when Chrome starts, the address bar is often focused by default. I will attempt to directly input the search query 'droidrun' assuming the address bar is already focused.
INFO:droidrun:🔄 ACTION: input_text(text=droidrun)
INFO:droidrun:👁? OBSERVATION: Text input completed: droidrun
INFO:droidrun:Token usage before API call: {'prompt_tokens': 5808, 'completion_tokens': 2031, 'total_tokens': 7839, 'api_calls': 6}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #7
INFO:droidrun:This call: 1241 prompt + 277 completion = 1518 tokens
INFO:droidrun:Cumulative: 7049 prompt + 2308 completion = 9357 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 7049, 'completion_tokens': 2308, 'total_tokens': 9357, 'api_calls': 7}
INFO:droidrun:🤔 THOUGHT: I have successfully inputted the search query 'droidrun' into what I assume is the address/search bar in Chrome. The next logical step to perform the search is to press the Enter key. The keycode for Enter is 66. I will use the `press_key` tool to simulate this action.
INFO:droidrun:🔄 ACTION: press_key(keycode=66)
INFO:droidrun:👁? OBSERVATION: Pressed key 66
INFO:droidrun:Token usage before API call: {'prompt_tokens': 7049, 'completion_tokens': 2308, 'total_tokens': 9357, 'api_calls': 7}
INFO:httpx:HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/openai/chat/completions "HTTP/1.1 200 OK"
INFO:droidrun:===== Token Usage Statistics =====
INFO:droidrun:API Call #8
INFO:droidrun:This call: 1331 prompt + 300 completion = 1631 tokens
INFO:droidrun:Cumulative: 8380 prompt + 2608 completion = 10988 tokens
INFO:droidrun:=================================
INFO:droidrun:Token usage after API call: {'prompt_tokens': 8380, 'completion_tokens': 2608, 'total_tokens': 10988, 'api_calls': 8}
INFO:droidrun:🤔 THOUGHT: The sequence of actions indicates that Chrome was opened, the search term 'droidrun' was entered, and the Enter key was pressed. This completes the user's request to open Chrome and search for 'droidrun'. The search results page should now be displayed in Chrome. Therefore, the goal has been achieved.
INFO:droidrun:🔄 ACTION: complete(result=Opened Chrome and initiated a search for 'droidrun'.)
INFO:droidrun:👁? OBSERVATION: Task completed: Opened Chrome and initiated a search for 'droidrun'.
INFO:droidrun:👁? OBSERVATION: Goal achieved in 8 steps.===== Final Token Usage and Cost =====
Total Tokens Used: 10,988
Total API Calls: 8
Estimated Cost: $0.0011
===================================Summary: Task completed: Opened Chrome and initiated a search for 'droidrun'.
Execution completed with 26 steps

更多使用文檔參考：https://www.droidrun.ai/