Spring Boot 深度集成 Ollama 指南：從聊天模型配置到生產級應用開發

前言

在人工智能應用開發中，大語言模型（LLM）的本地化部署需求日益增長。Ollama 作為開源的本地LLM運行平臺，支持Mistral、LLaMA等主流模型，并提供與OpenAI兼容的API接口，而 Spring AI 則為Java開發者提供了便捷的集成工具鏈。本文將結合Spring Boot框架，詳細講解如何通過Spring AI實現Ollama聊天模型的全生命周期管理，涵蓋基礎配置、高級功能開發及生產環境優化，幫助開發者構建高效、可控的本地化智能應用。

一、Ollama 環境搭建與 Spring Boot 集成

1. 安裝與啟動 Ollama

本地安裝：
從 Ollama 官網下載對應系統的二進制文件（如Windows、macOS或Linux），啟動后默認監聽端口 11434。
```
# 啟動命令（示例）
./ollama server
```

模型拉取：
通過命令行預拉取模型，避免運行時延遲：

ollama pull mistral       # 拉取默認聊天模型
ollama pull hf.co/llama3 # 拉取Hugging Face GGUF格式模型

2. Spring Boot 依賴配置

在 pom.xml 中引入Spring AI對Ollama的支持模塊：

<dependency><groupId>org.springframework.ai</groupId><artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
<dependencyManagement><dependencies><dependency><groupId>org.springframework.ai</groupId><artifactId>spring-ai-bom</artifactId><version>3.2.0</version> <!-- 替換為最新版本 --><type>pom</type><scope>import</scope></dependency></dependencies>
</dependencyManagement>

二、核心配置：聊天模型參數與自動管理

1. 基礎連接與模型初始化

在 application.yaml 中配置Ollama服務地址及模型拉取策略：

spring:ai:ollama:base-url: http://localhost:11434  # Ollama API地址init:pull-model-strategy: when_missing  # 自動拉取策略（always/never/when_missing）timeout: 10m                      # 拉取超時時間max-retries: 2                    # 拉取失敗重試次數chat:include: true                   # 是否初始化聊天模型additional-models: ["llama3-13b"] # 額外預拉取的模型列表

2. 聊天模型參數詳解

通過 spring.ai.ollama.chat.options 前綴配置模型行為，關鍵參數如下：

屬性	描述	示例配置
`model`	目標模型名稱（如`mistral`、`llava`）	`model: mistral`
`temperature`	生成隨機性（0.0保守，1.0創意）	`temperature: 0.7`
`num-ctx`	上下文窗口大小（影響歷史對話記憶，單位：Token）	`num-ctx: 4096`
`stop`	終止生成的字符序列（如`["###", "\nEND"]`）	`stop: ["###"]`
`keep-alive`	模型在內存中保持加載的時間（避免頻繁重新加載）	`keep-alive: 10m`
`num-gpu`	GPU加速層數（macOS設為1啟用Metal，-1自動檢測）	`num-gpu: 1`

完整配置示例：

spring:ai:ollama:chat:options:model: mistraltemperature: 0.8num-ctx: 2048stop: ["用戶:", "###"]top-k: 50

三、高級功能開發：從函數調用到多模態支持

1. 函數調用（Tool Calling）

通過Ollama的函數調用能力，實現LLM與外部工具的聯動（需Ollama ≥0.2.8）：

@RestController
public class ToolController {private final OllamaChatModel chatModel;@Autowiredpublic ToolController(OllamaChatModel chatModel) {this.chatModel = chatModel;}@PostMapping("/tool/call")public ChatResponse toolCall(@RequestBody String prompt) {// 注冊可調用的函數列表List<String> functions = Arrays.asList("searchWeather", "calculate");return chatModel.call(new Prompt(prompt, OllamaOptions.builder().functions(functions).build()));}
}

LLM將返回包含函數名和參數的JSON，例如：

{"function_call": {"name": "searchWeather","parameters": { "city": "北京", "date": "2023-10-01" }}
}

2. 多模態支持（文本+圖像）

利用LLaVA等多模態模型處理圖像輸入（需Ollama支持多模態模型）：

@GetMapping("/multimodal")
public ChatResponse multimodalQuery() throws IOException {// 加載圖像資源ClassPathResource imageResource = new ClassPathResource("cat.jpg");Media imageMedia = new Media(MimeTypeUtils.IMAGE_JPEG, imageResource);// 構造包含圖像的用戶消息UserMessage userMessage = new UserMessage("描述圖片中的動物", imageMedia);return chatModel.call(new Prompt(userMessage, OllamaOptions.builder().model("llava").build()));
}

多模態是指模型同時理解和處理來自各種來源的信息（包括文本、圖像、音頻和其他數據格式）的能力。

Ollama 中支持多模態的一些模型是 LLaVA 和 BakLLaVA（請參閱完整列表）。有關更多詳細信息，請參閱 LLaVA：大型語言和視覺助手。

Ollama 消息 API 提供了一個 “images” 參數，用于將 base64 編碼的圖像列表與消息合并。

Spring AI 的 Message 接口通過引入 Media 類型來促進多模態 AI 模型。此類型包含有關消息中媒體附件的數據和詳細信息，使用 Spring 的org.springframework.util.MimeType以及org.springframework.core.io.Resource對于原始媒體數據。

下面是一個摘自 OllamaChatModelMultimodalIT.java 的簡單代碼示例，說明了用戶文本與圖像的融合。

var imageResource = new ClassPathResource("/multimodal.test.png");var userMessage = new UserMessage("Explain what do you see on this picture?",new Media(MimeTypeUtils.IMAGE_PNG, this.imageResource));ChatResponse response = chatModel.call(new Prompt(this.userMessage,OllamaOptions.builder().model(OllamaModel.LLAVA)).build());

該示例顯示了一個模型，將multimodal.test.png圖像：
多模態測試圖像
以及文本消息 “Explain what do you see on this picture？”，并生成如下響應：

The image shows a small metal basket filled with ripe bananas and red apples. The basket is placed on a surface,
which appears to be a table or countertop, as there's a hint of what seems like a kitchen cabinet or drawer in
the background. There's also a gold-colored ring visible behind the basket, which could indicate that this
photo was taken in an area with metallic decorations or fixtures. The overall setting suggests a home environment
where fruits are being displayed, possibly for convenience or aesthetic purposes.

3. 結構化輸出與JSON Schema

強制模型返回符合指定格式的結構化數據，便于后續解析：

// 定義JSON Schema
String schema = """{"type": "object","properties": {"steps": {"type": "array","items": { "type": "string" }},"result": { "type": "number" }},"required": ["steps", "result"]}
""";// 在請求中指定格式
ChatResponse response = chatModel.call(new Prompt("計算1+2+3的步驟", OllamaOptions.builder().format(new ObjectMapper().readValue(schema, Map.class)).build()
));

四、生產環境優化與最佳實踐

1. 模型管理策略

禁用自動拉取：生產環境中通過 pull-model-strategy: never 關閉自動拉取，提前通過 ollama pull 預下載模型。
模型版本控制：固定模型版本（如mistral:latest），避免因模型更新導致的行為變化。

2. 資源性能調優

GPU加速：設置 num-gpu: -1 自動檢測GPU，或根據硬件指定層數（如num-gpu: 8）。
內存優化：啟用 low-vram: true 減少顯存占用，或 use-mlock: true 鎖定模型內存防止交換。

3. 連接與容錯

連接池配置：通過 OllamaApi 配置連接超時（如connectTimeout(5000)）和重試機制。
監控與日志：集成Micrometer監控Ollama請求延遲、錯誤率，或通過SLF4J記錄模型調用日志。

五、OpenAI API 兼容模式：無縫遷移現有應用

Ollama提供與OpenAI API兼容的端點，允許直接復用Spring AI的OpenAI客戶端：

spring:ai:openai:chat:base-url: http://localhost:11434  # 指向Ollama服務options:model: mistral                  # 使用Ollama模型temperature: 0.9

此模式下，原有基于OpenAI的代碼（如函數調用、流式響應）無需修改即可運行，示例：

@Autowired
private OpenAIAsyncChatClient openAiClient;public String openAICompatQuery() {return openAiClient.chat(Collections.singletonList(new ChatMessage(ChatMessageRole.USER, "推薦一本技術書籍"))).block();
}

在這里插入圖片描述

六、手動配置與低級API使用

若需自定義Bean或繞過自動配置，可手動構建OllamaChatModel：

@Configuration
public class OllamaConfig {@Beanpublic OllamaChatModel ollamaChatModel() {OllamaApi ollamaApi = OllamaApi.builder().baseUrl("http://remote-ollama-server:11434").build();return OllamaChatModel.builder().ollamaApi(ollamaApi).defaultOptions(OllamaOptions.builder().model("mistral").temperature(0.6).build()).build();}
}