SpringBoot + Locally Deployed LLM: Implementing a RAG Knowledge Base
- 1. Deploying a Local LLM on Linux
- 1.1 Installing Ollama
- 1.2 Starting Ollama
- 1.3 Downloading the DeepSeek model
- 2. Calling the Local Model from Spring Boot for Basic Q&A
- 3. Integrating a Vector Database
- 4. Feeding Data into the Knowledge Base
- 5. Implementing the Full RAG Flow
1. Deploying a Local LLM on Linux
1.1 Installing Ollama
# wget https://ollama.com/download/ollama-linux-amd64.tgz
# tar -C /usr/local -zxvf ollama-linux-amd64.tgz
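After extraction the binary should be at /usr/local/bin/ollama; a quick way to verify the install:
# ollama -v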
1.2 Starting Ollama
# ollama serve
# Note: if other clients should be able to call this model remotely, start it like this instead:
OLLAMA_HOST=0.0.0.0 OLLAMA_ORIGINS=* ollama serve
or
OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_ORIGINS=* ollama serve > ollama.log 2>&1
When I started it I ran into an error (the root cause was an outdated libstdc++ on the server):
ollama: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.25' not found (required by ollama)
The fix I used is described in this post:
https://blog.csdn.net/u011250186/article/details/147144845
Note that step 3 in that post (configure and build) takes a fairly long time.
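To confirm you are hitting the same problem, you can list the GLIBCXX versions your libstdc++ actually provides (a standard diagnostic, not Ollama-specific):
# strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX
If GLIBCXX_3.4.25 is missing from the output, the library needs to be upgraded as described in the linked post.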
1.3 Downloading the DeepSeek model
ollama run deepseek-r1:1.5b
Once the pull finishes, you can ask the model questions directly on the server:
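ollama run drops you into an interactive prompt; alternatively, you can hit Ollama's HTTP API directly, for example:
# curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:1.5b", "prompt": "hello", "stream": false}'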
2. Calling the Local Model from Spring Boot for Basic Q&A
For the frontend I built a basic knowledge-base Q&A page that calls a backend endpoint; the backend then issues an HTTP request to the local model's generate endpoint and returns the reply.
@GetMapping(value = "/getArtificialIntelligence")
public ResponseEntity<String> getFaultsByTaskId(@RequestParam(name = "message") String message) throws PromptException {
    return ResponseEntity.ok(aiService.getArtificialIntelligence(message));
}
@Value("${ollama.url}")
private String OLLAMA_API_URL;

@Value("${ollama.model}")
private String OLLAMA_MODEL;

@Override
public String getArtificialIntelligence(String message) throws PromptException {
    try {
        // 1. Build the request body (stream = false so /api/generate returns one JSON object;
        // with stream = true Ollama sends NDJSON chunks, which this single-response parsing can't handle)
        OllamaRequest request = new OllamaRequest(OLLAMA_MODEL, message, false);
        // 2. Send the request
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        HttpEntity<OllamaRequest> entity = new HttpEntity<>(request, headers);
        ResponseEntity<OllamaResponse> response = restTemplate.exchange(
                OLLAMA_API_URL, HttpMethod.POST, entity, OllamaResponse.class);
        // 3. Parse the response (generate endpoint)
        if (response.getStatusCode().is2xxSuccessful() && response.hasBody()) {
            return response.getBody().getResponse();
        }
        throw new PromptException("Unexpected API response: " + response.getStatusCode());
    } catch (HttpStatusCodeException e) {
        // Map HTTP status codes to meaningful error messages
        switch (e.getStatusCode().value()) {
            case 400:
                throw new PromptException("Bad request parameters: " + e.getResponseBodyAsString());
            case 404:
                throw new PromptException("Model not found, check the configuration");
            case 500:
                throw new PromptException("Internal error in the model service");
            default:
                throw new PromptException("API request failed: " + e.getStatusCode());
        }
    } catch (Exception e) {
        throw new PromptException("System error: " + e.getMessage());
    }
}
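OllamaRequest and OllamaResponse are simple wrappers that didn't make it into these notes. A minimal sketch of what they need to look like, assuming the /api/generate contract (field names must match the JSON, and the constructor order matches the call above):

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

// Request body for /api/generate: {"model": ..., "prompt": ..., "stream": ...}
public class OllamaRequest {
    private String model;
    private String prompt;
    private boolean stream;

    public OllamaRequest(String model, String prompt, boolean stream) {
        this.model = model;
        this.prompt = prompt;
        this.stream = stream;
    }

    // Getters so Jackson can serialize the fields
    public String getModel() { return model; }
    public String getPrompt() { return prompt; }
    public boolean isStream() { return stream; }
}

// Response body: the generated text sits in the "response" field;
// ignore the other metadata fields Ollama returns alongside it
@JsonIgnoreProperties(ignoreUnknown = true)
public class OllamaResponse {
    private String response;

    public String getResponse() { return response; }
    public void setResponse(String response) { this.response = response; }
}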
3. Integrating a Vector Database
I chose the vector extension for PostgreSQL (pgvector).
Official page:
https://pgxn.org/dist/vector/0.7.4/README.html#Windows
First make sure the C++ support in Visual Studio is installed, then run:
& "D:\SoftWare\visual Studio\VC\Auxiliary\Build\vcvars64.bat"cd D:\SoftWare\pgvector\pgvector-master
set "PGROOT=D:\SoftWare\PgSQL"
nmake /F Makefile.win
nmake /F Makefile.win install# 設置 Visual Studio 環境變量
& "D:\SoftWare\Visual Studio\VC\Auxiliary\Build\vcvars64.bat"# 進入源碼目錄
cd D:\SoftWare\pgvector\pgvector-master# 執行編譯
& "D:\SoftWare\Visual Studio\VC\Tools\MSVC\14.43.34808\bin\Hostx64\x64\nmake.exe" /F Makefile.win PGROOT="D:\SoftWare\PgSQL" PG_CONFIG="D:\SoftWare\PgSQL\bin\pg_config.exe"
After that you can create the database and tables following the hints in the pgvector docs.
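For reference, the core of that setup is enabling the extension and giving the chunk table a vector column. A minimal sketch (the table and column names are illustrative; 768 is the output dimension of the nomic-embed-text embedding model used in the next section):

CREATE EXTENSION vector;

CREATE TABLE document_chunk (
    id        bigserial PRIMARY KEY,
    filename  text,
    content   text,
    embedding vector(768)
);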
4. Feeding Data into the Knowledge Base
I implemented a file-upload feature on the frontend; the backend parses each uploaded file, converts the content to vectors, and stores them in the vector database.
This step uses a second model served through Ollama: the embedding model nomic-embed-text.
It turns the text extracted from the file into vectors, which are then stored in the database.
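If the embedding model is not on the server yet, pull it through Ollama first:

ollama pull nomic-embed-text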
The backend code is as follows:
@PostMapping("/upload")
public ResponseEntity<String> uploadFile(@RequestParam("multipartFiles") MultipartFile file) {
    try {
        documentService.processUploadedFile(file);
        return ResponseEntity.ok("File parsed and stored successfully");
    } catch (Exception e) {
        return ResponseEntity.status(500).body("File processing failed: " + e.getMessage());
    }
}
private final Tika tika = new Tika();

@Resource
private EmbeddingService embeddingService;

@Resource
private DocumentChunkRepository documentChunkRepository;

/**
 * Process an uploaded file: parse it to text, chunk it, embed each chunk, and persist.
 * @param file the uploaded file
 */
@Override
public void processUploadedFile(MultipartFile file) throws IOException, TikaException {
    String fileName = file.getOriginalFilename();
    if (fileName == null || fileName.isEmpty()) throw new IllegalArgumentException("File name is empty");
    // Parse the file to plain text with Tika
    String textContent = tika.parseToString(file.getInputStream());
    // Chunk the text - improved strategy with overlap between chunks
    // List<String> chunks = splitTextIntoChunks(textContent, 512);
    List<String> chunks = splitTextIntoChunks(textContent, 512, 100);
    // Build DocumentChunk entities for batch saving
    List<DocumentChunk> documentChunks = new ArrayList<>();
    for (String chunk : chunks) {
        try {
            float[] embedding = embeddingService.getEmbedding(chunk);
            DocumentChunk chunkEntity = new DocumentChunk();
            chunkEntity.setFilename(fileName);
            chunkEntity.setContent(chunk);
            chunkEntity.setEmbedding(embedding);
            documentChunks.add(chunkEntity);
        } catch (Exception e) {
            logger.error("Error while processing a text chunk: {}", e.getMessage(), e);
        }
    }
    // Batch save to the database
    if (!documentChunks.isEmpty()) {
        documentChunkRepository.saveAll(documentChunks);
    }
}

/**
 * Split text into chunks of bounded size while trying to keep sentences or paragraphs intact.
 *
 * @param text         the text content
 * @param maxChunkSize maximum number of characters per chunk
 * @param overlap      number of trailing characters carried over into the next chunk
 * @return the list of text chunks
 */
private List<String> splitTextIntoChunks(String text, int maxChunkSize, int overlap) {
    List<String> chunks = new ArrayList<>();
    StringBuilder currentChunk = new StringBuilder(maxChunkSize);
    String[] sentences = text.split("。|?|!|\\n"); // split on sentence ends / newlines
    for (String sentence : sentences) {
        if (sentence.trim().isEmpty()) continue;
        if (currentChunk.length() + sentence.length() > maxChunkSize) {
            chunks.add(currentChunk.toString());
            // Carry the last `overlap` characters into the next chunk for context continuity
            if (overlap > 0 && !chunks.isEmpty()) {
                String lastPart = getLastNChars(chunks.get(chunks.size() - 1), overlap);
                currentChunk = new StringBuilder(lastPart).append(sentence);
            } else {
                currentChunk = new StringBuilder(sentence);
            }
        } else {
            currentChunk.append(sentence).append(" ");
        }
    }
    if (currentChunk.length() > 0) {
        chunks.add(currentChunk.toString());
    }
    return chunks;
}

// Helper: take the last n characters of a string
private String getLastNChars(String str, int n) {
    return str.length() > n ? str.substring(str.length() - n) : str;
}
@Service
public class EmbeddingServiceImpl implements EmbeddingService {

    private final RestTemplate restTemplate = new RestTemplate();
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public float[] getEmbedding(String text) {
        String url = "http://192.168.2.45:11434/api/embeddings";
        try {
            Map<String, Object> requestBody = new HashMap<>();
            requestBody.put("model", "nomic-embed-text");
            requestBody.put("prompt", text);
            String requestBodyJson = objectMapper.writeValueAsString(requestBody);
            // Send the POST request as JSON (set the Content-Type explicitly,
            // otherwise a raw String body goes out as text/plain)
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_JSON);
            ResponseEntity<String> responseEntity = restTemplate.postForEntity(
                    url, new HttpEntity<>(requestBodyJson, headers), String.class);
            if (responseEntity.getStatusCode() != HttpStatus.OK) {
                throw new RuntimeException("HTTP error status: " + responseEntity.getStatusCodeValue());
            }
            Map<String, Object> map = objectMapper.readValue(responseEntity.getBody(), Map.class);
            Object embeddingObj = map.get("embedding");
            if (embeddingObj instanceof List<?>) {
                // Jackson deserializes the JSON array as a List of numbers
                @SuppressWarnings("unchecked")
                List<Number> list = (List<Number>) embeddingObj;
                float[] arr = new float[list.size()];
                for (int i = 0; i < arr.length; i++) {
                    arr[i] = list.get(i).floatValue();
                }
                return arr;
            } else {
                throw new RuntimeException("Unexpected type for embedding: "
                        + (embeddingObj != null ? embeddingObj.getClass().getName() : "null"));
            }
        } catch (Exception e) {
            throw new RuntimeException("Failed to get embedding", e);
        }
    }
}
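A quick way to sanity-check the embeddings endpoint this service calls (same request shape as the Java code above); the response is a JSON object with a single "embedding" array of floats:

curl http://192.168.2.45:11434/api/embeddings -d '{"model": "nomic-embed-text", "prompt": "hello"}'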
5. Implementing the Full RAG Flow
Finally, when a user asks a question, we first search the database for similar content fragments. If there are matches, the original text of the matched chunks is attached to the prompt as context, and the model grounds its generated answer in that content. This is retrieval-augmented generation (RAG). Sample code below:
The model's chat endpoint returns data as a stream, so I return a Flux to the frontend; the reply renders as it arrives and the user doesn't have to wait for the full answer, which makes for a better experience.
/**
 * Streaming reply endpoint.
 * @param message the user's question
 * @return a stream of partial responses
 */
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<StreamResponse> streamResponse(@RequestParam String message) {
    return aiService.getStreamResponse(message)
            .timeout(Duration.ofMinutes(10))
            .doFinally((SignalType signal) -> {
                if (signal == SignalType.CANCEL) {
                    logger.info("Client cancelled the streaming connection");
                }
            });
}
public Flux<StreamResponse> getStreamResponse(String message) {
    return Flux.defer(() -> {
        try {
            // 1. Retrieve relevant context from the vector store
            List<DocumentChunk> chunks = retrievalService.retrieveRelevantChunks(message, 3);
            StringBuilder contextBuilder = new StringBuilder();
            for (DocumentChunk chunk : chunks) {
                contextBuilder.append(chunk.getContent()).append("\n\n");
            }
            // 2. Build a prompt that carries the retrieved context
            String prompt = String.format("Please answer the question based on the following context:\n\n%s\n\nQuestion: %s",
                    contextBuilder.toString(), message);
            // 3. Build the request body for the chat endpoint
            Map<String, Object> requestBody = new HashMap<>();
            requestBody.put("model", OLLAMA_MODEL);
            requestBody.put("messages", Collections.singletonList(new Message("user", prompt)));
            requestBody.put("stream", true);
            requestBody.put("options", new Options());
            // 4. Send the streaming request
            return webClient.post()
                    .uri(OLLAMA_API_URL)
                    .contentType(MediaType.APPLICATION_JSON)
                    .bodyValue(requestBody)
                    .retrieve()
                    .bodyToFlux(String.class)
                    .map(this::parseChunk)
                    .doOnSubscribe(sub -> log.debug("Connection established"))
                    .doOnNext(response -> log.trace("Received chunk: {}", response))
                    .doOnError(e -> log.error("Streaming error:", e))
                    .onErrorResume(e -> {
                        log.error("Streaming request failed", e);
                        return Flux.error(new PromptException("The AI service is temporarily unavailable"));
                    });
        } catch (Exception e) {
            return Flux.error(new PromptException("Document retrieval failed: " + e.getMessage()));
        }
    });
}

// Parse one chunk of the streamed NDJSON response
private StreamResponse parseChunk(String chunk) {
    try {
        JsonNode node = new ObjectMapper().readTree(chunk);
        StreamResponse response = new StreamResponse();
        response.setResponse(node.path("message").path("content").asText());
        response.setDone(node.path("done").asBoolean());
        return response;
    } catch (Exception e) {
        return new StreamResponse("Parse error", true);
    }
}
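StreamResponse and Message are plain DTOs that aren't shown in these notes (Options is presumably a holder for generation parameters; an empty class works as a placeholder). A minimal sketch consistent with how they are used above:

// One element of the SSE stream sent back to the frontend
public class StreamResponse {
    private String response;
    private boolean done;

    public StreamResponse() {}

    public StreamResponse(String response, boolean done) {
        this.response = response;
        this.done = done;
    }

    public String getResponse() { return response; }
    public void setResponse(String response) { this.response = response; }
    public boolean isDone() { return done; }
    public void setDone(boolean done) { this.done = done; }
}

// One chat message in the Ollama /api/chat request
public class Message {
    private String role;
    private String content;

    public Message(String role, String content) {
        this.role = role;
        this.content = content;
    }

    public String getRole() { return role; }
    public String getContent() { return content; }
}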
@Service
public class RetrievalServiceImpl implements RetrievalService {

    @Resource
    private EmbeddingService embeddingService;

    @Resource
    private DocumentChunkRepository documentChunkRepository;

    /**
     * Retrieve the most relevant document chunks.
     * @param query the query text
     * @param topK  how many of the most similar chunks to return
     * @return the matching chunks
     */
    @Override
    public List<DocumentChunk> retrieveRelevantChunks(String query, int topK) {
        // 1. Embed the query
        float[] queryVector = embeddingService.getEmbedding(query);
        // 2. Convert float[] into the string format PostgreSQL's vector type accepts: "[v1,v2,v3]"
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < queryVector.length; i++) {
            sb.append(queryVector[i]);
            if (i < queryVector.length - 1) {
                sb.append(",");
            }
        }
        sb.append("]");
        String vectorAsString = sb.toString();
        // 3. Pass the vector as a string to keep Hibernate from binding it as bytea
        return documentChunkRepository.findSimilarChunks(vectorAsString, topK);
    }
}
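findSimilarChunks is the piece that actually talks to pgvector, and the repository also didn't make it into these notes. A sketch of what it can look like as a Spring Data native query, assuming the document_chunk table from section 3, a Long id, and pgvector's <=> cosine-distance operator:

import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface DocumentChunkRepository extends JpaRepository<DocumentChunk, Long> {

    // Order chunks by cosine distance to the query vector; the string parameter
    // is cast to the vector type inside SQL, sidestepping Hibernate's bytea mapping
    @Query(value = "SELECT * FROM document_chunk " +
                   "ORDER BY embedding <=> CAST(:queryVector AS vector) " +
                   "LIMIT :topK", nativeQuery = true)
    List<DocumentChunk> findSimilarChunks(@Param("queryVector") String queryVector,
                                          @Param("topK") int topK);
}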
I wrote this demo quite a while ago, and some details had slipped my mind by the time I wrote this post. I'll keep optimizing and filling in the gaps!