Chapter 5: How to Use AI Services in LangChain4j

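This article describes the AI Services concept in LangChain4j and shows how a high-level abstraction simplifies interaction with large language models (LLMs). The core idea of AI Services is to hide low-level complexity so that developers can focus on business logic, while still supporting advanced features such as chat memory, tool calling, and RAG. Through examples and code snippets, it shows how to define and use AI Services and how to combine them into complex LLM-powered applications.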


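Introduction

So far, we have covered low-level components such as ChatLanguageModel, ChatMessage, and ChatMemory. Working at this level is very flexible and gives you complete freedom, but it also forces you to write a lot of boilerplate code. LLM-powered applications usually need not one component but several working together (for example, a prompt template, chat memory, an LLM, an output parser, and RAG components such as an embedding model and an embedding store), and they often involve multiple interactions, so orchestrating them all becomes even more tedious.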

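Solution

We want you to focus on business logic, not on low-level implementation details. LangChain4j therefore offers two high-level concepts to help with that: AI Services and Chains.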

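Chains (deprecated)
The concept of Chains originates from Python's LangChain (before the introduction of LCEL). The idea is to have a Chain for each common use case, such as a chatbot or RAG. Chains combine multiple low-level components and orchestrate the interactions between them. Their main problem is that they are too rigid if you need to customize something. LangChain4j currently implements only two Chains (ConversationalChain and ConversationalRetrievalChain), and there are no plans to add more.

AI Services
We propose another solution, called AI Services, tailored for Java. The idea is to hide the complexity of interacting with LLMs and other components behind a simple API.
This approach is similar to Spring Data JPA or Retrofit: you declaratively define an interface with the desired API, and LangChain4j provides an object (a proxy) that implements this interface. You can think of an AI Service as a component of the service layer of your application. It provides AI services, hence the name.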

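AI Services handle the most common operations:

Formatting inputs for the LLM.
Parsing outputs from the LLM.

They also support more advanced features:

Chat memory.
Tools.
RAG (Retrieval-Augmented Generation).

AI Services can be used to build stateful chatbots that support back-and-forth interactions, as well as to automate processes in which each call to the LLM is independent.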

A First Look at AI Services

The Simplest AI Service

First, we define an interface with a single method, chat, which takes a String as input and returns a String:

interface Assistant {

    String chat(String userMessage);
}

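Then we create the low-level components. These components are used under the hood of the AI Service. In this example, we only need a ChatLanguageModel:

// GPT_4_O_MINI is a static import from dev.langchain4j.model.openai.OpenAiChatModelName
ChatLanguageModel model = OpenAiChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName(GPT_4_O_MINI)
    .build();

Finally, we use the AiServices class to create an instance of the AI Service: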

Assistant assistant = AiServices.create(Assistant.class, model);

Note: in Quarkus and Spring Boot applications, autoconfiguration takes care of creating the Assistant. You do not need to call AiServices.create(...); you can simply inject or autowire the Assistant wherever it is needed.

Now we can use the Assistant:

String answer = assistant.chat("Hello");
System.out.println(answer); // Hello, how can I help you?

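How It Works

You provide the Class of your interface to AiServices along with the low-level components, and AiServices creates a proxy object that implements this interface. Currently it uses reflection, but we are considering alternatives as well. This proxy object handles all the conversions of inputs and outputs. In this example, the input is a single String, but we are using a ChatLanguageModel, which takes ChatMessage as input. The AI Service therefore automatically converts it into a UserMessage and invokes the ChatLanguageModel. Since the output type of the chat method is String, the AiMessage returned by the ChatLanguageModel is converted into a String before being returned from the chat method.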
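
To make the proxy mechanism described above more concrete, here is a minimal sketch built on the JDK's java.lang.reflect.Proxy. It is not LangChain4j's actual implementation: the UserMessage and AiMessage records, the create helper, and the Function-based model are hypothetical stand-ins, and only methods with a single String argument are handled.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.Function;

final class ProxySketch {

    // Same shape as the Assistant interface from the article.
    interface Assistant {
        String chat(String userMessage);
    }

    // Hypothetical stand-ins for the library's message types.
    record UserMessage(String text) {}
    record AiMessage(String text) {}

    // Creates a proxy that wraps the String argument into a UserMessage,
    // calls the model, and unwraps the AiMessage back into a String.
    static <T> T create(Class<T> serviceInterface, Function<UserMessage, AiMessage> model) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                // toString, hashCode, and equals are answered locally, not by the model
                return switch (method.getName()) {
                    case "toString" -> serviceInterface.getSimpleName() + " proxy";
                    case "hashCode" -> System.identityHashCode(proxy);
                    default -> proxy == args[0];
                };
            }
            UserMessage input = new UserMessage((String) args[0]);
            AiMessage output = model.apply(input);
            return output.text();
        };
        Object proxy = Proxy.newProxyInstance(
                serviceInterface.getClassLoader(),
                new Class<?>[] {serviceInterface},
                handler);
        return serviceInterface.cast(proxy);
    }

    public static void main(String[] args) {
        // A fake model that simply echoes its input
        Assistant assistant = create(Assistant.class, msg -> new AiMessage("echo: " + msg.text()));
        System.out.println(assistant.chat("Hello")); // prints "echo: Hello"
    }
}
```

The caller only ever sees the Assistant interface; all conversion logic lives in the invocation handler, which is what keeps the declared API so small.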

Using AI Services in Quarkus and Spring Boot Applications

LangChain4j provides a Quarkus extension and a Spring Boot starter that greatly simplify the use of AI Services in these frameworks.

@SystemMessage

Now let's look at a more complex example. We will force the LLM to reply using slang. This is usually achieved by providing instructions in a SystemMessage:

interface Friend {

    @SystemMessage("You are a good friend of mine. Answer using slang.")
    String chat(String userMessage);
}

Friend friend = AiServices.create(Friend.class, model);

String answer = friend.chat("Hello"); // Hey! What's up?

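In this example, we added the @SystemMessage annotation with the system prompt template we want to use. Behind the scenes, it is converted into a SystemMessage and sent to the LLM together with the UserMessage.

@SystemMessage can also load a prompt template from a resource file: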

@SystemMessage(fromResource = "my-prompt-template.txt")

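System Message Provider

The system message can also be defined dynamically with a system message provider:

Friend friend = AiServices.builder(Friend.class)
    .chatLanguageModel(model)
    .systemMessageProvider(chatMemoryId -> "You are a good friend of mine. Answer using slang.")
    .build();

You can provide a different system message depending on the chat memory ID (the user or the conversation).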
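
As an illustration of choosing the system message by chat memory ID, the sketch below builds a Function<Object, String> with the same shape as the lambda shown above. The memory IDs, the prompts, and the fallback prompt are all made up for the example, and the assumption is that such a function can be handed to systemMessageProvider.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class SystemMessageProviderSketch {

    // Hypothetical fallback used when no prompt is registered for a memory ID.
    static final String DEFAULT_PROMPT = "You are a helpful assistant.";

    // Returns a provider that looks up the system message by chat memory ID.
    static Function<Object, String> perMemoryId(Map<?, String> promptsByMemoryId) {
        Map<Object, String> prompts = new HashMap<>(promptsByMemoryId); // defensive copy
        return memoryId -> prompts.getOrDefault(memoryId, DEFAULT_PROMPT);
    }

    public static void main(String[] args) {
        Function<Object, String> provider = perMemoryId(Map.of(
                "klaus", "You are a good friend of mine. Answer using slang.",
                "francine", "You are a formal assistant. Answer politely."));
        System.out.println(provider.apply("klaus"));   // the slang prompt
        System.out.println(provider.apply("unknown")); // the default prompt
    }
}
```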

@UserMessage

Now suppose the model we use does not support system messages, or we simply want to use a UserMessage for this purpose:

interface Friend {

    @UserMessage("You are a good friend of mine. Answer using slang. {{it}}")
    String chat(String userMessage);
}

Friend friend = AiServices.create(Friend.class, model);

String answer = friend.chat("Hello"); // Hey! What's shakin'?

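We replaced the @SystemMessage annotation with @UserMessage and specified a prompt template containing the variable it, which refers to the only argument of the method.

You can also use the @V annotation to give the prompt template variable a custom name:

interface Friend {

    @UserMessage("You are a good friend of mine. Answer using slang. {{message}}")
    String chat(@V("message") String userMessage);
}

Note: the @V annotation is not necessary in Quarkus or Spring Boot applications that use LangChain4j. It is needed only if the -parameters option is not enabled during Java compilation.

@UserMessage can also load a prompt template from a resource file: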

@UserMessage(fromResource = "my-prompt-template.txt")
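
The {{it}} and {{message}} placeholders used in this section are filled in before the prompt is sent to the LLM. The following sketch shows one way such substitution could work with a regular expression. It is an illustration only; the render helper is hypothetical and does not reflect how LangChain4j implements its prompt templates.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateSketch {

    // Matches {{name}}, allowing optional whitespace inside the braces.
    private static final Pattern VARIABLE = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    // Replaces every {{name}} in the template with the matching value.
    static String render(String template, Map<String, ?> variables) {
        Matcher matcher = VARIABLE.matcher(template);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!variables.containsKey(name)) {
                throw new IllegalArgumentException("No value for template variable: " + name);
            }
            String value = String.valueOf(variables.get(name));
            // quoteReplacement keeps '$' and '\' in the value from being interpreted
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String template = "You are a good friend of mine. Answer using slang. {{it}}";
        System.out.println(render(template, Map.of("it", "Hello")));
        // prints "You are a good friend of mine. Answer using slang. Hello"
    }
}
```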

Examples of Valid AI Service Methods

Below are some examples of valid AI Service methods.

With UserMessage

String chat(String userMessage);

String chat(@UserMessage String userMessage);

String chat(@UserMessage String userMessage, @V("country") String country); // userMessage contains the "{{country}}" template variable

@UserMessage("What is the capital of Germany?")
String chat();

@UserMessage("What is the capital of {{it}}?")
String chat(String country);

@UserMessage("What is the capital of {{country}}?")
String chat(@V("country") String country);

@UserMessage("What is the {{something}} of {{country}}?")
String chat(@V("something") String something, @V("country") String country);

@UserMessage("What is the capital of {{country}}?")
String chat(String country); // works only in Quarkus and Spring Boot applications

With SystemMessage and UserMessage

@SystemMessage("Given a name of a country, answer with a name of its capital")
String chat(String userMessage);

@SystemMessage("Given a name of a country, answer with a name of its capital")
String chat(@UserMessage String userMessage);

@SystemMessage("Given a name of a country, {{answerInstructions}}")
String chat(@V("answerInstructions") String answerInstructions, @UserMessage String userMessage);

@SystemMessage("Given a name of a country, answer with a name of its capital")
String chat(@UserMessage String userMessage, @V("country") String country); // userMessage contains the "{{country}}" template variable

@SystemMessage("Given a name of a country, answer with a name of its capital")
@UserMessage("Germany")
String chat();

@SystemMessage("Given a name of a country, {{answerInstructions}}")
@UserMessage("Germany")
String chat(@V("answerInstructions") String answerInstructions);

@SystemMessage("Given a name of a country, answer with a name of its capital")
@UserMessage("{{country}}")
String chat(@V("country") String country);

@SystemMessage("Given a name of a country, {{answerInstructions}}")
@UserMessage("{{country}}")
String chat(@V("answerInstructions") String answerInstructions, @V("country") String country);

Multimodality

AI Services currently do not support multimodality. Use the low-level components and APIs for that.

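Structured Outputs

If you want to receive structured output from the LLM, change the return type of your AI Service method from String to something else. AI Services currently support the following return types:

String
AiMessage
Any custom POJO (Plain Old Java Object)
Any Enum, List<Enum>, or Set<Enum> (for classifying text, e.g. sentiment or user intent)
boolean/Boolean (for a "yes" or "no" answer)
byte/short/int/BigInteger/long/float/double/BigDecimal
Date/LocalDate/LocalTime/LocalDateTime
List<String>/Set<String> (for an answer in the form of a list of bullet points)
Map<K, V>
Result<T> (if you need access to TokenUsage, FinishReason, the sources (content retrieved during RAG), and the executed tools in addition to T, where T can be any of the types above; for example, Result<String> or Result<MyCustomPojo>)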

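Unless the return type is String, AiMessage, or Map<K, V>, the AI Service automatically appends instructions to the end of the UserMessage that tell the LLM how it should respond. Before the method returns, the AI Service parses the LLM's output into the desired type.
You can observe the appended instructions by enabling logging.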
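
The two steps described above (appending format instructions, then parsing the reply) can be sketched as follows for enum and boolean return types. The instruction wording and the helper names are hypothetical; the text that LangChain4j actually appends may differ, which is why enabling logging is the reliable way to see it.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

final class OutputParsingSketch {

    enum Priority { CRITICAL, HIGH, LOW }

    // Builds an instruction to append to the user message (wording is made up).
    static <E extends Enum<E>> String formatInstructions(Class<E> enumType) {
        String options = Arrays.stream(enumType.getEnumConstants())
                .map(Enum::name)
                .collect(Collectors.joining(", "));
        return "\nYou must answer strictly with one of these values: " + options;
    }

    // Parses the raw LLM reply into an enum constant, ignoring case and whitespace.
    static <E extends Enum<E>> E parseEnum(Class<E> enumType, String llmOutput) {
        return Enum.valueOf(enumType, llmOutput.trim().toUpperCase(Locale.ROOT));
    }

    // Parses a yes/no style reply into a boolean.
    static boolean parseBoolean(String llmOutput) {
        String normalized = llmOutput.trim().toLowerCase(Locale.ROOT);
        if (normalized.equals("true") || normalized.equals("yes")) {
            return true;
        }
        if (normalized.equals("false") || normalized.equals("no")) {
            return false;
        }
        throw new IllegalArgumentException("Cannot parse as boolean: " + llmOutput);
    }

    public static void main(String[] args) {
        String prompt = "Analyze the priority of the following issue: payment gateway is down"
                + formatInstructions(Priority.class);
        System.out.println(prompt);
        System.out.println(parseEnum(Priority.class, " critical\n")); // prints "CRITICAL"
        System.out.println(parseBoolean("Yes"));                      // prints "true"
    }
}
```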

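Note: some LLM providers (for example, OpenAI and Google Gemini) allow you to specify a JSON schema for the desired output. If this feature is supported and enabled, free-form text instructions are not appended to the end of the UserMessage. Instead, a JSON schema is generated automatically from your POJO and passed to the LLM, which guarantees that the LLM adheres to that schema.

Now let's look at some examples.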

1. Example with boolean as the return type

interface SentimentAnalyzer {

    @UserMessage("Does {{it}} have a positive sentiment?")
    boolean isPositive(String text);
}

SentimentAnalyzer sentimentAnalyzer = AiServices.create(SentimentAnalyzer.class, model);

boolean positive = sentimentAnalyzer.isPositive("It's wonderful!");
// true

2. Example with Enum as the return type

enum Priority {

    @Description("Critical issues such as payment gateway failures or security breaches.")
    CRITICAL,

    @Description("High-priority issues like major feature malfunctions or widespread outages.")
    HIGH,

    @Description("Low-priority issues such as minor bugs or cosmetic problems.")
    LOW
}

interface PriorityAnalyzer {

    @UserMessage("Analyze the priority of the following issue: {{it}}")
    Priority analyzePriority(String issueDescription);
}

PriorityAnalyzer priorityAnalyzer = AiServices.create(PriorityAnalyzer.class, model);

Priority priority = priorityAnalyzer.analyzePriority("The main payment gateway is down, and customers cannot process transactions.");
// CRITICAL

Note: the @Description annotation is optional. It is recommended when the enum names are not self-explanatory, to help the LLM understand them better.

3. Example with a POJO as the return type

class Person {

    @Description("first name of a person") // optional description to help the LLM understand the field
    String firstName;
    String lastName;
    LocalDate birthDate;
    Address address;
}

@Description("an address") // optional description to help the LLM understand the class
class Address {
    String street;
    Integer streetNumber;
    String city;
}

interface PersonExtractor {

    @UserMessage("Extract information about a person from {{it}}")
    Person extractPersonFrom(String text);
}

PersonExtractor personExtractor = AiServices.create(PersonExtractor.class, model);

String text = """
    In 1968, amidst the fading echoes of Independence Day,
    a child named John arrived under the calm evening sky.
    This newborn, bearing the surname Doe, marked the start of a new journey.
    He was welcomed into the world at 345 Whispering Pines Avenue
    a quaint street nestled in the heart of Springfield
    an abode that echoed with the gentle hum of suburban dreams and aspirations.
    """;

Person person = personExtractor.extractPersonFrom(text);

System.out.println(person);
// Person { firstName = "John", lastName = "Doe", birthDate = 1968-07-04, address = Address { ... } }

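JSON Mode

When extracting custom POJOs (which are really JSON that is then parsed into a POJO), it is recommended to enable "JSON mode" in the model configuration. This forces the LLM to respond with valid JSON.

Note: JSON mode and tools/function calling are similar features, but they have different APIs and serve different purposes.

JSON mode: useful when you always need a response from the LLM in a structured format (valid JSON). In addition, usually no state or memory is required, so each interaction with the LLM is independent of the others. For example, you might want to extract information from a text, such as the list of people mentioned in it, or convert a free-form product review into a structured form with fields such as String productName, Sentiment sentiment, List<String> claimedProblems, and so on.
Tools/function calling: useful when the LLM should be able to perform some actions (e.g. query a database, search the web, cancel a user's booking). In this case, the LLM is given a list of tools together with their expected JSON schemas, and it decides autonomously whether to call any of them to satisfy the user's request.
Previously, function calling was often used for structured data extraction, but now there is the JSON mode feature, which is better suited to this purpose.

Here is how to enable JSON mode: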

OpenAI

For newer models that support structured outputs (e.g. gpt-4o-mini, gpt-4o-2024-08-06):

OpenAiChatModel.builder()
    ...
    .responseFormat("json_schema")
    .strictJsonSchema(true)
    .build();

For older models (e.g. gpt-3.5-turbo, gpt-4):

OpenAiChatModel.builder()
    ...
    .responseFormat("json_object")
    .build();

Azure OpenAI

AzureOpenAiChatModel.builder()
    ...
    .responseFormat(new ChatCompletionsJsonResponseFormat())
    .build();

Vertex AI Gemini

VertexAiGeminiChatModel.builder()
    ...
    .responseMimeType("application/json")
    .build();

Or, with GoogleAiGeminiChatModel, by specifying an explicit schema derived from a Java class:

GoogleAiGeminiChatModel.builder()
    ...
    .responseFormat(ResponseFormat.builder()
        .type(JSON)
        .jsonSchema(JsonSchemas.jsonSchemaFrom(Person.class).get())
        .build())
    .build();

Or by specifying a JSON schema directly:

GoogleAiGeminiChatModel.builder()
    ...
    .responseFormat(ResponseFormat.builder()
        .type(JSON)
        .jsonSchema(JsonSchema.builder()...build())
        .build())
    .build();

Mistral AI

MistralAiChatModel.builder()
    ...
    .responseFormat(MistralAiResponseFormatType.JSON_OBJECT)
    .build();

Ollama

OllamaChatModel.builder()
    ...
    .responseFormat(JSON)
    .build();

Other model providers

If the underlying model provider does not support JSON mode, prompt engineering is your best bet. Also, try lowering the temperature for more deterministic results.

Streaming

An AI Service can stream the response token by token when its method uses the TokenStream return type:

interface Assistant {

    TokenStream chat(String message);
}

StreamingChatLanguageModel model = OpenAiStreamingChatModel.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .modelName(GPT_4_O_MINI)
    .build();

Assistant assistant = AiServices.create(Assistant.class, model);

TokenStream tokenStream = assistant.chat("Tell me a joke");

tokenStream.onNext((String token) -> System.out.println(token))
    .onRetrieved((List<Content> contents) -> System.out.println(contents))
    .onToolExecuted((ToolExecution toolExecution) -> System.out.println(toolExecution))
    .onComplete((Response<AiMessage> response) -> System.out.println(response))
    .onError((Throwable error) -> error.printStackTrace())
    .start();
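
Callback-style streaming is often bridged to a future when the caller also needs the complete text. The sketch below shows that pattern using only the JDK. TokenSource is a hypothetical stand-in for a token stream, not the LangChain4j TokenStream type, and the sketch assumes tokens are delivered sequentially.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

final class StreamingSketch {

    // Hypothetical stand-in for a token stream: pushes tokens, then signals completion or failure.
    interface TokenSource {
        void start(Consumer<String> onNext, Runnable onComplete, Consumer<Throwable> onError);
    }

    // Collects all streamed tokens into one String, exposed as a future.
    static CompletableFuture<String> collect(TokenSource source) {
        CompletableFuture<String> future = new CompletableFuture<>();
        StringBuilder text = new StringBuilder(); // assumes sequential token delivery
        source.start(
                text::append,                             // each token is appended as it arrives
                () -> future.complete(text.toString()),   // completion publishes the full text
                future::completeExceptionally);           // errors fail the future
        return future;
    }

    public static void main(String[] args) {
        // A fake source that emits three tokens and then completes
        TokenSource fake = (onNext, onComplete, onError) -> {
            onNext.accept("Why ");
            onNext.accept("not");
            onNext.accept("?");
            onComplete.run();
        };
        System.out.println(collect(fake).join()); // prints "Why not?"
    }
}
```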

Using Flux

You can also use Flux<String> instead of TokenStream. To do so, add the langchain4j-reactor module as a dependency:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-reactor</artifactId>
    <version>1.0.0-beta1</version>
</dependency>

Code example

interface Assistant {

    Flux<String> chat(String message);
}

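Chat Memory

An AI Service can use chat memory to "remember" previous interactions:

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(model)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .build();

In this scenario, the same ChatMemory instance is used for every invocation of the AI Service. This approach does not work with multiple users, because each user needs their own ChatMemory instance to keep their own conversation.
The solution is to use a ChatMemoryProvider: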

interface Assistant {

    String chat(@MemoryId int memoryId, @UserMessage String message);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(model)
    .chatMemoryProvider(memoryId -> MessageWindowChatMemory.withMaxMessages(10))
    .build();

String answerToKlaus = assistant.chat(1, "Hello, my name is Klaus");
String answerToFrancine = assistant.chat(2, "Hello, my name is Francine");

In this scenario, the ChatMemoryProvider provides two distinct ChatMemory instances, one for each memory ID.

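Notes:

If an AI Service method has no parameter annotated with @MemoryId, the memoryId passed to the ChatMemoryProvider defaults to the string "default".
Concurrent calls to an AI Service for the same @MemoryId are currently not supported, because they can corrupt the ChatMemory. The AI Service does not yet implement any mechanism to prevent concurrent calls for the same @MemoryId.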
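
To illustrate the idea of one sliding-window memory per memory ID, here is a small JDK-only sketch. WindowMemory is a hypothetical stand-in for MessageWindowChatMemory that stores plain strings, and, in line with the note above, nothing here is safe for concurrent use of the same ID.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MemoryProviderSketch {

    // Hypothetical stand-in for a message-window memory: keeps only the newest maxMessages entries.
    static final class WindowMemory {
        private final int maxMessages;
        private final Deque<String> messages = new ArrayDeque<>();

        WindowMemory(int maxMessages) {
            this.maxMessages = maxMessages;
        }

        void add(String message) {
            messages.addLast(message);
            while (messages.size() > maxMessages) {
                messages.removeFirst(); // evict the oldest message
            }
        }

        List<String> messages() {
            return List.copyOf(messages);
        }
    }

    private final int maxMessages;
    private final Map<Object, WindowMemory> memories = new HashMap<>();

    MemoryProviderSketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Returns the memory for this ID, creating it on first use.
    WindowMemory memoryFor(Object memoryId) {
        return memories.computeIfAbsent(memoryId, id -> new WindowMemory(maxMessages));
    }

    public static void main(String[] args) {
        MemoryProviderSketch provider = new MemoryProviderSketch(10);
        provider.memoryFor(1).add("Hello, my name is Klaus");
        provider.memoryFor(2).add("Hello, my name is Francine");
        System.out.println(provider.memoryFor(1).messages()); // prints "[Hello, my name is Klaus]"
    }
}
```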

Tools

An AI Service can be configured with tools that the LLM can use:

class Tools {

    @Tool
    int add(int a, int b) {
        return a + b;
    }

    @Tool
    int multiply(int a, int b) {
        return a * b;
    }
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(model)
    .tools(new Tools())
    .build();

String answer = assistant.chat("What is 1+2 and 3*4?");

In this scenario, the LLM requests the execution of add(1, 2) and multiply(3, 4) before providing the final answer. LangChain4j executes these methods automatically.
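
The "execute the requested method" step can be pictured as a reflective lookup by tool name. The sketch below is an illustration under simplifying assumptions: the @Tool annotation is defined locally so the example is self-contained, the execute helper is hypothetical, and the parts where tool specifications are sent to the LLM and JSON arguments are parsed are left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

final class ToolDispatchSketch {

    // Local, hypothetical marker annotation (not the LangChain4j @Tool).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Tool {}

    static class Calculator {
        @Tool
        public int add(int a, int b) {
            return a + b;
        }

        @Tool
        public int multiply(int a, int b) {
            return a * b;
        }
    }

    // Finds the @Tool method with the requested name and invokes it with the given arguments.
    static Object execute(Object tools, String toolName, Object... arguments) {
        for (Method method : tools.getClass().getDeclaredMethods()) {
            if (method.isAnnotationPresent(Tool.class) && method.getName().equals(toolName)) {
                try {
                    return method.invoke(tools, arguments);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Tool execution failed: " + toolName, e);
                }
            }
        }
        throw new IllegalArgumentException("Unknown tool: " + toolName);
    }

    public static void main(String[] args) {
        Calculator calculator = new Calculator();
        // Simulates the two tool requests from the example above
        System.out.println(execute(calculator, "add", 1, 2));      // prints "3"
        System.out.println(execute(calculator, "multiply", 3, 4)); // prints "12"
    }
}
```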

More information about tools can be found in the LangChain4j documentation.

RAG (Retrieval-Augmented Generation)

An AI Service can be configured with a ContentRetriever to enable simple RAG:

EmbeddingStore embeddingStore = ...;
EmbeddingModel embeddingModel = ...;

ContentRetriever contentRetriever = new EmbeddingStoreContentRetriever(embeddingStore, embeddingModel);

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(model)
    .contentRetriever(contentRetriever)
    .build();

Configuring a RetrievalAugmentor gives even more flexibility and enables advanced RAG capabilities such as query transformation and re-ranking:

RetrievalAugmentor retrievalAugmentor = DefaultRetrievalAugmentor.builder()
    .queryTransformer(...)
    .queryRouter(...)
    .contentAggregator(...)
    .contentInjector(...)
    .executor(...)
    .build();

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(model)
    .retrievalAugmentor(retrievalAugmentor)
    .build();

More information about RAG can be found in the LangChain4j documentation.

Auto-Moderation

(Example omitted.)

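Chaining Multiple AI Services

As the logic of your LLM-powered application grows more complex, it becomes crucial to break it down into smaller parts, as is common practice in software development.

For instance, stuffing lots of instructions into the system prompt to cover every possible scenario is error-prone and inefficient. If there are too many instructions, the LLM may overlook some of them. The order in which the instructions are presented also matters, which makes the process even more challenging.

This principle also applies to tools, RAG, and model parameters (such as temperature and maxTokens).
Your chatbot probably does not need to be aware of every tool you have at all times. For example, when a user merely greets the chatbot or says goodbye, giving the LLM access to dozens or hundreds of tools (each of which consumes a significant number of tokens) is costly and sometimes even dangerous, and it can lead to unintended results (the LLM can hallucinate or be manipulated into calling a tool with unintended inputs).

Regarding RAG: similarly, the LLM sometimes needs to be given some context, but not always, because context incurs additional cost (more context = more tokens) and increases response time (more context = higher latency).

Regarding model parameters: in some situations you need the LLM to be highly deterministic, so you set a low temperature. In other situations you might choose a higher temperature, and so on.

The point is that smaller and more specific components are easier and cheaper to develop, test, maintain, and understand.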

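Another aspect to consider involves two extremes:

Do you want your application to be highly deterministic, where the application controls the flow and the LLM is just one of its components?
Or do you want the LLM to have complete autonomy and drive your application?

Or perhaps a mix of both, depending on the situation? All of these options are possible when you decompose your application into smaller, more manageable parts.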

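AI Services can be used as regular (deterministic) software components and combined with other components:

You can call one AI Service after another (i.e. chaining).
You can use deterministic and LLM-powered if/else statements (AI Services can return a boolean).
You can use deterministic and LLM-powered switch statements (AI Services can return an enum).
You can use deterministic and LLM-powered for/while loops (AI Services can return int and other numeric types).
You can mock an AI Service (since it is an interface) in your unit tests.
You can integration-test each AI Service in isolation.
You can evaluate each AI Service separately and find the optimal parameters for each subtask.
And so on.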
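
As a small illustration of the LLM-powered switch statement mentioned in the list above, the sketch below routes an issue based on the enum returned by an analyzer interface. The interface mirrors the PriorityAnalyzer shown earlier but is redeclared without annotations so the example runs with the JDK alone; the queue names and the keyword-based fake are invented for the example.

```java
final class RoutingSketch {

    enum Priority { CRITICAL, HIGH, LOW }

    // Same shape as the PriorityAnalyzer AI Service; here it can be any implementation.
    interface PriorityAnalyzer {
        Priority analyzePriority(String issueDescription);
    }

    // Deterministic routing on top of a (possibly LLM-backed) classification.
    static String route(PriorityAnalyzer analyzer, String issue) {
        return switch (analyzer.analyzePriority(issue)) {
            case CRITICAL -> "page-on-call"; // hypothetical queue names
            case HIGH -> "next-sprint";
            case LOW -> "backlog";
        };
    }

    public static void main(String[] args) {
        // A keyword-based fake stands in for the LLM-backed analyzer
        PriorityAnalyzer fake = issue -> issue.contains("payment") ? Priority.CRITICAL : Priority.LOW;
        System.out.println(route(fake, "The main payment gateway is down")); // prints "page-on-call"
        System.out.println(route(fake, "Typo on the About page"));           // prints "backlog"
    }
}
```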
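Let's consider a simple example. I want to build a chatbot for my company. If a user greets the chatbot, I want it to respond with a predefined greeting without relying on the LLM to generate one. If a user asks a question, I want the LLM to generate the answer using the company's internal knowledge base (i.e. RAG).
Here is how this task can be decomposed into two separate AI Services: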

interface GreetingExpert {

    @UserMessage("Is the following text a greeting? Text: {{it}}")
    boolean isGreeting(String text);
}

interface ChatBot {

    @SystemMessage("You are a polite chatbot of a company called Miles of Smiles.")
    String reply(String userMessage);
}

class MilesOfSmiles {

    private final GreetingExpert greetingExpert;
    private final ChatBot chatBot;

    public MilesOfSmiles(GreetingExpert greetingExpert, ChatBot chatBot) {
        this.greetingExpert = greetingExpert;
        this.chatBot = chatBot;
    }

    public String handle(String userMessage) {
        if (greetingExpert.isGreeting(userMessage)) {
            return "Greetings from Miles of Smiles! How can I make your day better?";
        } else {
            return chatBot.reply(userMessage);
        }
    }
}

GreetingExpert greetingExpert = AiServices.create(GreetingExpert.class, llama2);

ChatBot chatBot = AiServices.builder(ChatBot.class)
    .chatLanguageModel(gpt4)
    .contentRetriever(milesOfSmilesContentRetriever)
    .build();

MilesOfSmiles milesOfSmiles = new MilesOfSmiles(greetingExpert, chatBot);

String greeting = milesOfSmiles.handle("Hello");
System.out.println(greeting); // Greetings from Miles of Smiles! How can I make your day better?

String answer = milesOfSmiles.handle("Which services do you provide?");
System.out.println(answer); // At Miles of Smiles, we provide a wide range of services ...

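Notice how we used the cheaper Llama2 for the simple task of identifying whether the text is a greeting, and the more expensive GPT-4 with a content retriever (RAG) for the more complex task.

This is a very simple and somewhat naive example, but hopefully it illustrates the idea.
Now I can mock both GreetingExpert and ChatBot and test MilesOfSmiles in isolation. I can also integration-test GreetingExpert and ChatBot separately, evaluate each of them separately, find the optimal parameters for each subtask, or, in the long run, even fine-tune a small specialized model for each specific subtask.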
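
Because both AI Services are plain interfaces with a single method, they can be replaced with lambdas in a test, with no mocking library and no LLM. The sketch below redeclares the interfaces without annotations so that it runs with the JDK alone; the stub behaviour is invented for the example.

```java
final class MilesOfSmilesTestSketch {

    // Same method signatures as the AI Services in the article, minus the annotations.
    interface GreetingExpert {
        boolean isGreeting(String text);
    }

    interface ChatBot {
        String reply(String userMessage);
    }

    static final class MilesOfSmiles {
        private final GreetingExpert greetingExpert;
        private final ChatBot chatBot;

        MilesOfSmiles(GreetingExpert greetingExpert, ChatBot chatBot) {
            this.greetingExpert = greetingExpert;
            this.chatBot = chatBot;
        }

        String handle(String userMessage) {
            if (greetingExpert.isGreeting(userMessage)) {
                return "Greetings from Miles of Smiles! How can I make your day better?";
            }
            return chatBot.reply(userMessage);
        }
    }

    public static void main(String[] args) {
        // Stubs: no model is called, so the test is fast and deterministic
        GreetingExpert fakeExpert = text -> text.equalsIgnoreCase("hello");
        ChatBot fakeBot = message -> "stubbed answer";

        MilesOfSmiles app = new MilesOfSmiles(fakeExpert, fakeBot);
        System.out.println(app.handle("Hello"));                          // prints the predefined greeting
        System.out.println(app.handle("Which services do you provide?")); // prints "stubbed answer"
    }
}
```

This checks only the deterministic control flow of MilesOfSmiles. Whether the real GreetingExpert classifies text correctly is a separate question for integration tests and evaluation.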

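Summary

This article described the AI Services concept in LangChain4j and showed how a high-level abstraction simplifies interaction with large language models (LLMs). The core idea of AI Services is to hide low-level complexity so that developers can focus on business logic, while still supporting advanced features such as chat memory, tool calling, and RAG. Through examples and code snippets, it showed how to define and use AI Services and how to combine them into complex LLM-powered applications.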