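Principle Overview
Text classification with a large language model (LLM) rests on one idea: a pre-trained model understands the semantics of the input text and maps it to one of a set of predefined classification labels. Here we use Spring AI Alibaba to call the large-model API of Alibaba Cloud's DashScope platform and perform the classification.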
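Architecture Design
The system is organized into the following layers:
- Front-end interface layer: exposes a RESTful API that receives the text submitted by the user.
- Model service layer: calls the DashScope LLM API for inference and returns the classification result.
- Database layer (optional): stores and manages classification labels and history records.
- Configuration layer: manages application parameters, model settings, and similar configuration.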
Technical Implementation
Maven Dependency Management
The pom.xml file includes spring-ai-alibaba-starter, the core Spring AI Alibaba dependency for integrating the DashScope model service:
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter</artifactId>
    <version>${spring-ai-alibaba.version}</version>
</dependency>
Classification Type Definition
ClassificationType.java defines the classification labels used by the structured-output endpoints:
public enum ClassificationType {
    BUSINESS,
    SPORT,
    TECHNOLOGY,
    OTHER
}
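The structured endpoints in the controller call ClassificationType.valueOf on raw model text, so stray whitespace, quotes, or an unexpected label make the request fail with an exception. A more forgiving lookup could look like the sketch below. fromModelOutput is a hypothetical helper, not part of the original project, and falling back to OTHER for unknown labels is an assumed policy.

```java
import java.util.Locale;

/**
 * Sketch: ClassificationType with a lenient parser for raw model output.
 * fromModelOutput is a hypothetical helper, not part of the original project.
 */
public enum ClassificationType {
    BUSINESS,
    SPORT,
    TECHNOLOGY,
    OTHER;

    /** Maps a raw LLM reply such as " sport\n" or "\"TECHNOLOGY\"." to an enum constant. */
    public static ClassificationType fromModelOutput(String raw) {
        if (raw == null) {
            return OTHER;
        }
        // Drop surrounding whitespace, leading quotes, and trailing quotes or a period
        String cleaned = raw.trim()
                .replaceAll("^[\"'`]+|[\"'`.]+$", "")
                .trim()
                .toUpperCase(Locale.ROOT);
        for (ClassificationType type : values()) {
            if (type.name().equals(cleaned)) {
                return type;
            }
        }
        // Assumed policy: unknown labels become OTHER instead of throwing like valueOf
        return OTHER;
    }

    public static void main(String[] args) {
        System.out.println(fromModelOutput(" sport\n"));
        System.out.println(fromModelOutput("\"TECHNOLOGY\"."));
        System.out.println(fromModelOutput("Military"));
    }
}
```

Whether unknown labels should map to OTHER or be rejected depends on the application; rejecting them makes prompt drift easier to spot.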
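Controller Implementation
ClassificationController.java implements several classification methods: by class name, by class description, with a few-shot prompt, and with few-shot history.
Example: the complete controller, starting with classification by class name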
package com.alibaba.example.textclassification.controller;

import java.util.List;

import com.alibaba.example.textclassification.ClassificationDto;
import com.alibaba.example.textclassification.ClassificationType;
import com.alibaba.fastjson.JSON;
import lombok.extern.slf4j.Slf4j;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.ChatOptions;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@Slf4j
@RestController
public class ClassificationController {

    private final ChatClient chatClient;

    ClassificationController(ChatClient.Builder chatClientBuilder) {
        // temperature 0.0 keeps the label choice as deterministic as possible
        this.chatClient = chatClientBuilder
                .defaultOptions(ChatOptions.builder().temperature(0.0).build())
                .build();
    }

    // 1. Class names only
    @PostMapping("/classify/class-names")
    String classifyClassNames(@RequestBody String text) {
        return chatClient.prompt()
                .system("""
                        requirement: Classify the provided text into one of these classes: BUSINESS, SPORT, TECHNOLOGY, MILITARY, CURRENT AFFAIRS, ENTERTAINMENT, OTHER.
                        format: Output JSON as plain text with no extra text of any kind, including markdown formatting.
                        outputExample: {"type": {type}}
                        """)
                .user(text)
                .call()
                .content();
    }

    // 2. Class names plus a short description of each class
    @PostMapping("/classify/class-descriptions")
    String classifyClassDescriptions(@RequestBody String text) {
        return chatClient.prompt()
                .system("""
                        requirement: Classify the provided text into one of the classes listed under type.
                        type: [
                        BUSINESS: Commerce, finance, markets, entrepreneurship, corporate developments.
                        SPORT: Athletic events, tournament outcomes, performances of athletes and teams.
                        TECHNOLOGY: Innovations and trends in software, artificial intelligence, cybersecurity.
                        MILITARY: Military news.
                        CURRENT AFFAIRS: The latest developments in current events.
                        ENTERTAINMENT: News from the entertainment industry.
                        OTHER: Anything that doesn't fit into the other categories.
                        ]
                        format: Output JSON as plain text with no extra text of any kind, including markdown formatting.
                        outputExample: {"type": {type}}
                        """)
                .user(text)
                .call()
                .content();
    }

    // 3. Class descriptions plus labelled examples inside the system prompt
    @PostMapping("/classify/few-shots-prompt")
    String classifyFewShotsPrompt(@RequestBody String text) {
        return chatClient.prompt()
                .system("""
                        Classify the provided text into one of these classes.
                        BUSINESS: Commerce, finance, markets, entrepreneurship, corporate developments.
                        SPORT: Athletic events, tournament outcomes, performances of athletes and teams.
                        TECHNOLOGY: Innovations and trends in software, artificial intelligence, cybersecurity.
                        OTHER: Anything that doesn't fit into the other categories.
                        ---
                        Text: Clean Energy Startups Make Waves in 2024, Fueling a Sustainable Future.
                        Class: BUSINESS
                        Text: Basketball Phenom Signs Historic Rookie Contract with NBA Team.
                        Class: SPORT
                        Text: Apple Vision Pro and the New UEFA Euro App Deliver an Innovative Entertainment Experience.
                        Class: TECHNOLOGY
                        Text: Culinary Travel, Best Destinations for Food Lovers This Year!
                        Class: OTHER
                        """)
                .user(text)
                .call()
                .content();
    }

    // 4. Labelled examples replayed as earlier chat turns
    @PostMapping("/classify/few-shots-history")
    String classifyFewShotsHistory(@RequestBody String text) {
        return chatClient.prompt()
                .messages(getPromptWithFewShotsHistory())
                .user(text)
                .call()
                .content();
    }

    // 5. Few-shot history, mapped to the ClassificationType enum
    @PostMapping("/classify/structured-output")
    ClassificationType classifyStructured(@RequestBody String text) {
        ClassificationDto dto = classifyStructuredDto(text);
        // valueOf throws IllegalArgumentException if the model returns an unknown label
        return ClassificationType.valueOf(dto.getClassificationType().trim());
    }

    // 6. Few-shot history, mapped to ClassificationDto
    @PostMapping("/classify/structured-output-dto")
    ClassificationDto classifyStructuredDto(@RequestBody String text) {
        String content = chatClient.prompt()
                .messages(getPromptWithFewShotsHistory())
                .user(text)
                .call()
                .content();
        String reply = content == null ? "" : content.trim();
        // The system prompt asks for {"classificationType": ...} but the example answers are
        // bare labels, so accept either shape. Alternative: .call().entity(ClassificationDto.class)
        if (reply.startsWith("{")) {
            return JSON.parseObject(reply, ClassificationDto.class);
        }
        ClassificationDto dto = new ClassificationDto();
        dto.setClassificationType(reply);
        return dto;
    }

    @PostMapping("/classify")
    ClassificationType classify(@RequestBody String text) {
        return classifyStructured(text);
    }

    private List<Message> getPromptWithFewShotsHistory() {
        return List.of(
                new SystemMessage("""
                        Classify the provided text into one of these classes.
                        BUSINESS: Commerce, finance, markets, entrepreneurship, corporate developments.
                        SPORT: Athletic events, tournament outcomes, performances of athletes and teams.
                        TECHNOLOGY: Innovations and trends in software, artificial intelligence, cybersecurity.
                        OTHER: Anything that doesn't fit into the other categories.
                        format: Output JSON as plain text with no extra text of any kind, including markdown formatting.
                        outputExample: {"classificationType": {classificationType}}
                        """),
                new UserMessage("Apple Vision Pro and the New UEFA Euro App Deliver an Innovative Entertainment Experience."),
                new AssistantMessage("TECHNOLOGY"),
                new UserMessage("Wall Street, Trading Volumes Reach All-Time Highs Amid Market Optimism."),
                new AssistantMessage("BUSINESS"),
                new UserMessage("Sony PlayStation 6 Launch, Next-Gen Gaming Experience Redefines Console Performance."),
                new AssistantMessage("TECHNOLOGY"),
                new UserMessage("Water Polo Star Secures Landmark Contract with Major League Team."),
                new AssistantMessage("SPORT"),
                new UserMessage("Culinary Travel, Best Destinations for Food Lovers This Year!"),
                new AssistantMessage("OTHER"),
                new UserMessage("UEFA Euro 2024, Memorable Matches and Record-Breaking Goals Define Tournament Highlights."),
                new AssistantMessage("SPORT"),
                new UserMessage("Rock Band Resurgence, Legendary Groups Return to the Stage with Iconic Performances."),
                new AssistantMessage("OTHER"));
    }
}
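The few-shot prompt in classifyFewShotsPrompt is a fixed string. If the class list or the examples came from configuration or the optional database layer, the same layout could be assembled in code. Below is a plain-Java sketch; FewShotPromptBuilder and Example are hypothetical names, and the resulting string is assumed to be passed to chatClient.prompt().system(...).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: assembles a few-shot system prompt in the same layout as the
 * /classify/few-shots-prompt endpoint. FewShotPromptBuilder and Example
 * are hypothetical names used only for illustration.
 */
public class FewShotPromptBuilder {

    /** One labelled example shown to the model. */
    public record Example(String text, String label) {}

    public static String build(Map<String, String> classDescriptions, List<Example> examples) {
        StringBuilder sb = new StringBuilder("Classify the provided text into one of these classes.\n");
        // Class list, one "LABEL: description" line per class
        classDescriptions.forEach((label, description) ->
                sb.append(label).append(": ").append(description).append('\n'));
        sb.append("---\n");
        // Labelled examples as Text/Class pairs
        for (Example e : examples) {
            sb.append("Text: ").append(e.text()).append('\n');
            sb.append("Class: ").append(e.label()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> classes = new LinkedHashMap<>(); // keeps insertion order
        classes.put("BUSINESS", "Commerce, finance, markets, entrepreneurship, corporate developments.");
        classes.put("SPORT", "Athletic events, tournament outcomes, performances of athletes and teams.");
        classes.put("TECHNOLOGY", "Innovations and trends in software, artificial intelligence, cybersecurity.");
        classes.put("OTHER", "Anything that doesn't fit into the other categories.");

        List<Example> examples = List.of(
                new Example("Basketball Phenom Signs Historic Rookie Contract with NBA Team.", "SPORT"),
                new Example("Culinary Travel, Best Destinations for Food Lovers This Year!", "OTHER"));

        System.out.print(build(classes, examples));
    }
}
```

A LinkedHashMap is used so the classes appear in a stable, predictable order in the prompt.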
Data Transfer Object
ClassificationDto.java defines the data format for structured output:
import lombok.Data;

@Data
public class ClassificationDto {

    // The predicted label, e.g. "SPORT"
    private String classificationType;
}
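Even when told not to, models sometimes wrap JSON in a markdown code fence or add a sentence around it, which makes JSON.parseObject fail. One defensive option is to cut out the outermost {...} before parsing. The sketch below assumes the reply contains at most one JSON object; JsonReplyExtractor is a hypothetical helper, not part of the original project.

```java
import java.util.Optional;

/**
 * Sketch: pulls the JSON object out of a model reply before it is handed to
 * JSON.parseObject(..., ClassificationDto.class).
 * JsonReplyExtractor is a hypothetical helper, not part of the original project.
 */
public final class JsonReplyExtractor {

    private JsonReplyExtractor() {}

    /** Returns the text between the first '{' and the last '}', if both exist. */
    public static Optional<String> extractJsonObject(String reply) {
        if (reply == null) {
            return Optional.empty();
        }
        int start = reply.indexOf('{');
        int end = reply.lastIndexOf('}');
        if (start < 0 || end < start) {
            // No JSON object, e.g. the model answered with a bare label such as "SPORT"
            return Optional.empty();
        }
        return Optional.of(reply.substring(start, end + 1));
    }

    public static void main(String[] args) {
        String fence = "`".repeat(3);
        String fenced = fence + "json\n{\"classificationType\": \"SPORT\"}\n" + fence;
        System.out.println(extractJsonObject(fenced).orElse("<none>"));
        System.out.println(extractJsonObject("SPORT").orElse("<none>"));
    }
}
```

An empty result signals that the caller should treat the reply as a bare label rather than JSON.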
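Application Configuration
application.yml sets the server port and application name, and configures the DashScope API key:
server:
  port: 10093

spring:
  application:
    name: spring-ai-alibaba-text-classification-example
  ai:
    dashscope:
      # Read the key from the environment; do not commit a real key as a default value
      api-key: ${AI_DASHSCOPE_API_KEY}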
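Key Parameters
DashScopeChatOptions
These options control model inference:
- temperature: controls the randomness of the generated text; the lower the value, the more deterministic the output. For classification it is usually set to 0.0, which the controller does through ChatOptions.builder().temperature(0.0).
- responseFormat: sets the response format, for example json_object, so that the model returns a structured JSON object.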
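Temperature T rescales the model's logits z_i before the softmax, p_i = exp(z_i / T) / Σ_j exp(z_j / T), so a lower T concentrates probability on the top-scoring token. The sketch below uses made-up logits for three candidate labels to show the effect. It assumes T > 0, since T = 0 would divide by zero; hosted services usually treat 0.0 as (near-)greedy decoding, but exactly how is up to each service.

```java
import java.util.Arrays;

/**
 * Sketch: how temperature reshapes a probability distribution over tokens.
 * The logits are made-up numbers, not real model output.
 */
public class TemperatureDemo {

    /** Softmax of logits / temperature; temperature must be > 0. */
    public static double[] softmax(double[] logits, double temperature) {
        double max = Arrays.stream(logits).max().orElse(0.0);
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the max first avoids overflow and does not change the result
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        // Hypothetical scores for e.g. TECHNOLOGY, BUSINESS, OTHER
        double[] logits = {2.0, 1.0, 0.5};
        for (double t : new double[] {1.0, 0.5, 0.1}) {
            System.out.printf("T=%.1f -> %s%n", t, Arrays.toString(softmax(logits, t)));
        }
    }
}
```

At T = 1.0 the top label gets roughly 63% of the probability mass; at T = 0.1 it gets almost all of it, which is why classification setups push temperature toward 0.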
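BeanOutputConverter
Converts the JSON string returned by the LLM into a Java bean, simplifying downstream data handling. It is the converter that ChatClient's .call().entity(...) uses to map a reply onto a class.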
Testing and Result Comparison
Test Method
We test each classification approach by sending HTTP POST requests:
Sample request
curl -X POST http://localhost:10093/classify/class-names \
  -H "Content-Type: application/json" \
  -d '"Apple Vision Pro and the New UEFA Euro App Deliver an Innovative Entertainment Experience."'
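For scripted tests from Java rather than curl, the same request can be built with the JDK's java.net.http client (Java 11+). This sketch assumes the service is running on localhost:10093; ClassifyClient and buildRequest are illustrative names. Like the curl example, it wraps the text in double quotes and does not escape quotes inside the text.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Sketch: the curl call expressed with java.net.http.
 * Assumes the classification service runs on localhost:10093.
 */
public class ClassifyClient {

    public static HttpRequest buildRequest(String endpoint, String text) {
        return HttpRequest.newBuilder(URI.create("http://localhost:10093" + endpoint))
                .header("Content-Type", "application/json")
                // Same body as the curl example: the text wrapped in double quotes (no escaping)
                .POST(HttpRequest.BodyPublishers.ofString("\"" + text + "\""))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("/classify/class-names",
                "Apple Vision Pro and the New UEFA Euro App Deliver an Innovative Entertainment Experience.");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Changing the endpoint argument to /classify/few-shots-prompt or /classify/structured-output exercises the other methods with the same input.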
Sample response
{"type": "TECHNOLOGY"}
Result Comparison
Method | Input text | Output | Accuracy |
---|---|---|---|
classifyClassNames | “Apple Vision Pro…” | TECHNOLOGY | ? |
classifyClassDescriptions | “Wall Street…” | BUSINESS | ? |
classifyFewShotsPrompt | “Culinary Travel…” | OTHER | ? |
classifyStructured | “UEFA Euro 2024…” | SPORT | ? |
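The Accuracy column is still open. One way to fill it is to run each method over a labelled test set and count exact label matches. In the sketch below, AccuracyEvaluator and LabeledCase are hypothetical names, and the classifier is passed in as a function, so it could wrap an HTTP call to any of the /classify endpoints. The texts in main are hypothetical inputs and the stand-in classifier always answers SPORT, just to exercise the code. A real test set should not reuse the few-shot examples, or the scores will be inflated.

```java
import java.util.List;
import java.util.function.Function;

/**
 * Sketch: computes accuracy for one classification method.
 * AccuracyEvaluator and LabeledCase are hypothetical names.
 */
public class AccuracyEvaluator {

    /** One input text with its expected label. */
    public record LabeledCase(String text, String expected) {}

    /** Fraction of cases where the classifier's label equals the expected label (case-insensitive). */
    public static double accuracy(List<LabeledCase> cases, Function<String, String> classifier) {
        if (cases.isEmpty()) {
            throw new IllegalArgumentException("need at least one labelled case");
        }
        long correct = cases.stream()
                .filter(c -> c.expected().equalsIgnoreCase(classifier.apply(c.text()).trim()))
                .count();
        return (double) correct / cases.size();
    }

    public static void main(String[] args) {
        // Hypothetical test inputs, not taken from the few-shot examples
        List<LabeledCase> cases = List.of(
                new LabeledCase("Central Bank Raises Interest Rates Again.", "BUSINESS"),
                new LabeledCase("Marathon World Record Falls in Berlin.", "SPORT"));
        // Stand-in classifier that always answers SPORT
        double acc = accuracy(cases, text -> "SPORT");
        System.out.println("accuracy = " + acc); // accuracy = 0.5
    }
}
```

Exact matching is strict on purpose: a reply like {"type": "SPORT"} counts as wrong unless the classifier function first extracts the label.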
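Summary
This post showed how to classify text with a large language model and built a complete solution on Spring Boot and Spring AI Alibaba. Several strategies (class names, class descriptions, few-shot prompts, and few-shot history) make it possible to adapt to different business needs. We also covered what the key parameters do and how to configure them, and checked the endpoints' outputs with sample requests; accuracy has not yet been measured.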