[LLM-Agents]淺析Agent工具使用框架：MM-ReAct

上文LLM-Agents]詳解Agent中工具使用Workflow提到MM-ReAct框架，通過結合ChatGPT 與視覺專家模型來解決復雜的視覺理解任務的框架。通過設計文本提示（prompt design），使得語言模型能夠接受、關聯和處理多模態信息，如圖像和視頻。展示了 MM-REACT 在不同場景下處理高級視覺理解任務的有效性，如多圖像推理、多跳文檔理解、視頻摘要和事件定位等。今天我們嘗試安裝使用一下，了解一下在LLM中如何使用工具。

1. 安裝

1.1下載工程

git clone https://github.com/microsoft/MM-REACT

1.2 安裝依賴

MM-ReAct是使用Poetry解決依賴包，所以除了安裝poetry，還需要額外安裝pillow、imagesize 和openai。其中openai需要限制版本為0.28，否則會有兼容性問題。

bash
復制代碼
curl -sSL https://install.python-poetry.org | python3 -
subl ~/.zshrc
export PATH="/Users/xxxx/.local/bin:$PATH"
source ~/.zshrc
pip install pillow imagesize
pip install openai==0.28

1.3 設置環境變量

因為該Repo使用了大量的Microsoft的云端API，需要注冊運行，此處為了了解運行過程，就不注冊了。但為了能夠基本運行，依然需要設置一些無效的環境變量。

bash
復制代碼
BING_SEARCH_URL="https://api.bing.microsoft.com/v7.0/search";
BING_SUBSCRIPTION_KEY=xxxx;
IMUN_CELEB_PARAMS=xxxx;
IMUN_CELEB_URL="https://yourazureendpoint.cognitiveservices.azure.com/vision/v3.2/models/celebrities/analyze";
IMUN_OCR_BC_URL="https://yourazureendpoint.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-businessCard:analyze";
IMUN_OCR_INVOICE_URL="https://yourazureendpoint.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-invoice:analyze";
IMUN_OCR_LAYOUT_URL="https://yourazureendpoint.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze";
IMUN_OCR_PARAMS="api-version=2022-08-31";
IMUN_OCR_READ_URL="https://yourazureendpoint.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-read:analyze";
IMUN_OCR_RECEIPT_URL="https://yourazureendpoint.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-receipt:analyze";
IMUN_OCR_SUBSCRIPTION_KEY=xxx;
IMUN_PARAMS="visualFeatures=Tags,Objects,Faces";
IMUN_PARAMS2="api-version=2023-02-01-preview&model-version=latest&features=denseCaptions";
IMUN_SUBSCRIPTION_KEY=xxxx;
IMUN_SUBSCRIPTION_KEY2=xxxx;
IMUN_URL="https://yourazureendpoint.cognitiveservices.azure.com/vision/v3.2/analyze";
IMUN_URL2="https://yourazureendpoint.cognitiveservices.azure.com/computervision/imageanalysis:analyze"

2. 運行

為了使用本地安裝的大模型，需要修改兩個文件。

langchain/llms/openai.py
sample.py

2.1 修改sample.py

替換代碼中的AzureOpenAI為OpenAI，包括import。

python
復制代碼
llm = OpenAI(model_name="gpt-3.5-turbo", chat_completion=True,openai_api_base="http://localhost:11434/v1",openai_api_key="sk", temperature=0, max_tokens=MAX_TOKENS,openai_log="debug")

2.2 修改langchain/llms/openai.py

由于自帶的langchain中，可能版本比較老，不支持設置openai_api_base ，因此需要增加一點配置代碼。

bash
復制代碼
加一點配置代碼。
diff --git a/langchain/llms/openai.py b/langchain/llms/openai.py
index 4180165..70711c1 100644
--- a/langchain/llms/openai.py
+++ b/langchain/llms/openai.py
@@ -115,6 +115,8 @@ class BaseOpenAI(BaseLLM, BaseModel):"""Whether to stream the results or not."""chat_completion: bool = False"""Whether to use the chat client"""
+    openai_api_base: str = ""
+    openai_log: str = "debug"class Config:"""Configuration for this pydantic object."""
@@ -146,7 +148,9 @@ class BaseOpenAI(BaseLLM, BaseModel):openai_api_key = get_from_dict_or_env(values, "openai_api_key", "OPENAI_API_KEY")
-        openai_api_version = values.get("openai_api_version") or os.environ.get("OPENAI_API_VERSION") 
+        openai_api_version = values.get("openai_api_version") or os.environ.get("OPENAI_API_VERSION")
+        openai_api_base = values.get("openai_api_base") or os.environ.get("OPENAI_API_BASE")
+        openai_log = values.get("openai_log") or os.environ.get("OPENAI_LOG")chat_completion = values.get("chat_completion") or Falsevalues["chat_completion"] = chat_completiontry:
@@ -155,6 +159,10 @@ class BaseOpenAI(BaseLLM, BaseModel):openai.api_key = openai_api_keyif openai_api_version:openai.api_version = openai_api_version
+            if openai_api_base:
+                openai.api_base = openai_api_base
+            if openai_log:
+                openai.log = openai_logif chat_completion:values["client"] = openai.ChatCompletionelse:

2.3 運行

代碼運行入口為sample.py本身較為簡單，初始化OpenAI，Tool，Agent和開始對話。可以看到除了定義一堆Azure Cloud的工具之外，還自定義了一個edit_photo。

python
復制代碼
def edit_photo(query: str) -> str:....return "Here is the edited image " + endpoint + response.json()["edited_image"]# these tools should not step on each other's toes
tools = [...Tool(name = "Photo Editing",func=edit_photo,description=("A wrapper around photo editing. ""Useful to edit an image with a given instruction.""Input should be an image url, or path to an image file (e.g. .jpg, .png).")),
]

默認輸入圖像為一個表格，我們將圖像改為科比。開始運行 python sample.py 輸出，為了閱讀體驗，刪除中間的一些輸出。

arduino
復制代碼
> Entering new AgentExecutor chain...
message='Request to OpenAI API' method=post path=http://localhost:11434/v1/chat/completions
...1. There is a new image in the inputAssistant, please detect objects in this image: https://microsoft-cognitive-service-mm-react.hf.space/file=/tmp/b008c4062adec3b7295dc10fc04305813b2dec9e/celebrity.png
python-BaseException
xxx
...無法連接到Microsoft...

由于無法連接Microsoft云端服務，因此沒法繼續運行下去，如果連接上了會輸出

kotlin
復制代碼
AI: 1. There is an image in the input
AI: 1. This is an image of a basketball player in a yellow jersey holding a basketball
2. There are two faces of men detected in this image.
3. Facial recognition can detect celebrity names for these faces
AI: 1. The celebrities detected are Paul Pierce and Kobe Bryant
2. They are likely the basketball players in the image
To summerize, this is an image of basketball players Paul Pierce and Kobe Bryant in a game. Paul Pierce is in a yellow jersey holding a basketball.

總結

總的來說這篇文章中對工具的使用有點過時，收獲不是很大，有點浪費時間，尤其是Prompt設計沒有啥亮點，并且代碼有點繞。要是現在使用Function Calling ，那么就是將函數描述給到LLM，然后設計ReAct的Few Shot ，外加一個For Loop串起整個流程。后面分析了HuggingGPT，它對于工具使用好多了。

如何系統的去學習大模型LLM ？

作為一名熱心腸的互聯網老兵，我意識到有很多經驗和知識值得分享給大家，也可以通過我們的能力和經驗解答大家在人工智能學習中的很多困惑，所以在工作繁忙的情況下還是堅持各種整理和分享。

但苦于知識傳播途徑有限，很多互聯網行業朋友無法獲得正確的資料得到學習提升，故此將并將重要的 AI大模型資料 包括AI大模型入門學習思維導圖、精品AI大模型學習書籍手冊、視頻教程、實戰學習等錄播視頻免費分享出來。

😝有需要的小伙伴，可以V掃描下方二維碼免費領取🆓

在這里插入圖片描述

一、全套AGI大模型學習路線

AI大模型時代的學習之旅：從基礎到前沿，掌握人工智能的核心技能！

二、640套AI大模型報告合集

這套包含640份報告的合集，涵蓋了AI大模型的理論研究、技術實現、行業應用等多個方面。無論您是科研人員、工程師，還是對AI大模型感興趣的愛好者，這套報告合集都將為您提供寶貴的信息和啟示。

三、AI大模型經典PDF籍

隨著人工智能技術的飛速發展，AI大模型已經成為了當今科技領域的一大熱點。這些大型預訓練模型，如GPT-3、BERT、XLNet等，以其強大的語言理解和生成能力，正在改變我們對人工智能的認識。那以下這些PDF籍就是非常不錯的學習資源。

在這里插入圖片描述

四、AI大模型商業化落地方案

階段1：AI大模型時代的基礎理解

目標：了解AI大模型的基本概念、發展歷程和核心原理。
內容：
- L1.1 人工智能簡述與大模型起源
- L1.2 大模型與通用人工智能
- L1.3 GPT模型的發展歷程
- L1.4 模型工程
  - L1.4.1 知識大模型
  - L1.4.2 生產大模型
  - L1.4.3 模型工程方法論
  - L1.4.4 模型工程實踐
- L1.5 GPT應用案例

階段2：AI大模型API應用開發工程

目標：掌握AI大模型API的使用和開發，以及相關的編程技能。
內容：
- L2.1 API接口
  - L2.1.1 OpenAI API接口
  - L2.1.2 Python接口接入
  - L2.1.3 BOT工具類框架
  - L2.1.4 代碼示例
- L2.2 Prompt框架
  - L2.2.1 什么是Prompt
  - L2.2.2 Prompt框架應用現狀
  - L2.2.3 基于GPTAS的Prompt框架
  - L2.2.4 Prompt框架與Thought
  - L2.2.5 Prompt框架與提示詞
- L2.3 流水線工程
  - L2.3.1 流水線工程的概念
  - L2.3.2 流水線工程的優點
  - L2.3.3 流水線工程的應用
- L2.4 總結與展望

階段3：AI大模型應用架構實踐

目標：深入理解AI大模型的應用架構，并能夠進行私有化部署。
內容：
- L3.1 Agent模型框架
  - L3.1.1 Agent模型框架的設計理念
  - L3.1.2 Agent模型框架的核心組件
  - L3.1.3 Agent模型框架的實現細節
- L3.2 MetaGPT
  - L3.2.1 MetaGPT的基本概念
  - L3.2.2 MetaGPT的工作原理
  - L3.2.3 MetaGPT的應用場景
- L3.3 ChatGLM
  - L3.3.1 ChatGLM的特點
  - L3.3.2 ChatGLM的開發環境
  - L3.3.3 ChatGLM的使用示例
- L3.4 LLAMA
  - L3.4.1 LLAMA的特點
  - L3.4.2 LLAMA的開發環境
  - L3.4.3 LLAMA的使用示例
- L3.5 其他大模型介紹