Using Redis as a Vector Database

I. What Is a Vector Database?

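Vector: In machine learning and AI, a vector is an ordered sequence of numbers that numerically describes the features or semantics of a piece of data. Unstructured data such as text, images, and audio can be converted by a model into fixed-length vectors.
Vector database: A database system built to store, index, and retrieve vectors. It performs efficient nearest neighbor search based on a distance metric between vectors (such as cosine similarity or Euclidean distance), which enables "semantic search" or "similarity search".
How it differs from traditional search:
- Traditional search relies on exact keyword matching and cannot recognize synonyms, context, or abstract meaning.
- Vector search maps data into a high-dimensional vector space in which semantically similar content lies closer together, so the results better match the user's intent.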
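
To make the distance metrics mentioned above concrete, here is a minimal sketch in plain Python. The three toy vectors are made-up values chosen for illustration, not the output of any embedding model, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors (0.0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-dimensional "embeddings" (illustrative values only).
kids_bike = [0.9, 0.1, 0.0]
child_bicycle = [0.8, 0.2, 0.1]
road_racer = [0.1, 0.9, 0.3]

sim_close = cosine_similarity(kids_bike, child_bicycle)
sim_far = cosine_similarity(kids_bike, road_racer)
print(sim_close > sim_far)  # the two "kids" vectors point in a more similar direction
```

Real embeddings have hundreds of dimensions, but the comparison works the same way.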

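II. Prerequisites

The examples in this article use the official Python client (redis-py) and its redis.commands.search API, together with common components from the Python ecosystem:

# Installing inside a virtual environment is recommended
pip install redis requests pandas sentence-transformers tabulate redisvl

Notes

redis: the official Python client.
requests: downloads the sample dataset.
pandas: displays the results.
sentence-transformers: generates the text vectors.
tabulate: renders Markdown tables (needed by pandas' to_markdown).
redisvl: a Redis library dedicated to vector search (optional; this article uses the native redis.commands.search API).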

III. Connecting to Redis

If you are using a local Redis instance:

import redis

client = redis.Redis(host="localhost", port=6379, decode_responses=True)

If you are using Redis Cloud, replace host, port, and password with the values of your cloud instance:

client = redis.Redis(
    host="redis-16379.c283.us-east-1-4.ec2.cloud.redislabs.com",
    port=16379,
    password="your_password_here",
    decode_responses=True,
)

IV. Preparing the Sample Dataset

This article uses the open-source bikes dataset. Each record contains the following fields:

{
  "model": "Jigger",
  "brand": "Velorim",
  "price": 270,
  "type": "Kids bikes",
  "specs": {
    "material": "aluminium",
    "weight": "10"
  },
  "description": "Small and powerful, the Jigger is the best ride for the smallest of tikes! ..."
}

1. Fetch the data

import requests

URL = (
    "https://raw.githubusercontent.com/"
    "bsbodden/redis_vss_getting_started"
    "/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()

2. Store the data in Redis (JSON documents)

pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
    # Zero-padded keys: bikes:001, bikes:002, ...
    key = f"bikes:{i:03}"
    pipeline.json().set(key, "$", bike)
pipeline.execute()
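
The key format above zero-pads the record number, which matters later when the keys are sorted as strings. The sketch below shows the effect; make_key is a hypothetical helper, not part of redis-py.

```python
def make_key(index, prefix="bikes", width=3):
    """Build a key such as 'bikes:010' with a zero-padded numeric suffix."""
    return f"{prefix}:{index:0{width}}"

padded = [make_key(i) for i in (2, 10, 1)]
unpadded = [f"bikes:{i}" for i in (2, 10, 1)]

print(sorted(padded))    # ['bikes:001', 'bikes:002', 'bikes:010']
print(sorted(unpadded))  # ['bikes:1', 'bikes:10', 'bikes:2']
```

With a width of 3, string order matches numeric order only up to 999 records; a larger dataset would need a wider padding.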

You can read back a single field like this:

client.json().get("bikes:010", "$.model")
# => ['Summit']

V. Generating and Storing Vector Embeddings

1. Choose a text embedding model

from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("msmarco-distilbert-base-v4")

2. Fetch the descriptions in bulk and generate vectors

import numpy as np

# Get all keys
keys = sorted(client.keys("bikes:*"))
# Read every description in a single round trip
descs = client.json().mget(keys, "$.description")
# Flatten the nested list (one inner list per key)
descriptions = [item for sublist in descs for item in sublist]
# Generate the embeddings and convert them to lists of float32 values
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIM = len(embeddings[0])  # 768
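
The flattening step exists because a JSONPath query such as $.description returns a list of matches for each key, so the bulk read yields a list of lists. The sketch below reproduces that shape with made-up stand-in data and also skips keys that returned nothing, which is an added precaution and not something the code above does. flatten_matches is a hypothetical helper.

```python
def flatten_matches(reply):
    """Flatten per-key match lists into one list, skipping keys with no match."""
    return [item for matches in reply if matches for item in matches]

# Stand-in for a bulk reply: one inner list per key, None for a missing key.
reply = [["first description"], None, ["second description"], []]

descriptions = flatten_matches(reply)
print(descriptions)  # ['first description', 'second description']
```

If any key is skipped, the flattened list no longer lines up one-to-one with the key list, so a real script would want to filter the keys the same way before zipping them with the embeddings.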

3. Insert the vector field

pipeline = client.pipeline()
for key, vec in zip(keys, embeddings):
    pipeline.json().set(key, "$.description_embeddings", vec)
pipeline.execute()

Each record now has an additional $.description_embeddings array field.

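VI. Creating the Search Index

To support both field-based and vector-based search, create a Redis Search index. The AS aliases let queries refer to each field by a short name, and the 6 after FLAT is the number of attribute tokens that follow it (three name/value pairs).

# Run in the Redis CLI (shown on several lines for readability; enter it as a single line)
FT.CREATE idx:bikes_vss ON JSON
  PREFIX 1 bikes:
  SCHEMA
    $.model AS model TEXT WEIGHT 1.0 NOSTEM
    $.brand AS brand TEXT WEIGHT 1.0 NOSTEM
    $.price AS price NUMERIC
    $.type AS type TAG SEPARATOR ","
    $.description AS description TEXT WEIGHT 1.0
    $.description_embeddings AS vector VECTOR FLAT 6
      TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE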

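FLAT: a flat (brute-force) index; HNSW (a graph-based index) can be used instead for better speed and scalability on large datasets.
TYPE FLOAT32: 32-bit floating point values.
DIM 768: the vector dimension; it must match the output size of the embedding model.
DISTANCE_METRIC COSINE: cosine distance, that is, 1 minus the cosine similarity.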
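
Conceptually, a FLAT index answers a query by comparing it against every stored vector. The sketch below illustrates that idea in plain Python with made-up 2-dimensional vectors; it is not how Redis implements the index internally, and the function names are hypothetical.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity, so 0.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def flat_knn(query, vectors, k):
    """Score every stored vector and return the k closest (key, distance) pairs."""
    scored = [(key, cosine_distance(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]

# Toy store with made-up vectors.
store = {
    "bikes:001": [1.0, 0.0],
    "bikes:002": [0.0, 1.0],
    "bikes:003": [0.7, 0.7],
}

result = flat_knn([1.0, 0.1], store, k=2)
print([key for key, _ in result])  # ['bikes:001', 'bikes:003']
```

The cost grows linearly with the number of vectors, which is why an approximate index such as HNSW becomes attractive for large collections.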

Once the index has been created, run FT.INFO idx:bikes_vss to inspect its status and confirm that all documents have been indexed.
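
If you prefer to create the index from the Python script instead of the CLI, one option is to build the command arguments as a list and send them as a raw command. The sketch below assumes the same schema as the command above, with AS aliases on every field so that queries can use short names; build_ft_create_args and its parameters are hypothetical names introduced here.

```python
def build_ft_create_args(index_name="idx:bikes_vss", prefix="bikes:",
                         dim=768, algorithm="FLAT"):
    """Return the FT.CREATE command as a flat list of string arguments."""
    vector_attrs = ["TYPE", "FLOAT32", "DIM", str(dim), "DISTANCE_METRIC", "COSINE"]
    return [
        "FT.CREATE", index_name, "ON", "JSON",
        "PREFIX", "1", prefix,
        "SCHEMA",
        "$.model", "AS", "model", "TEXT", "WEIGHT", "1.0", "NOSTEM",
        "$.brand", "AS", "brand", "TEXT", "WEIGHT", "1.0", "NOSTEM",
        "$.price", "AS", "price", "NUMERIC",
        "$.type", "AS", "type", "TAG", "SEPARATOR", ",",
        "$.description", "AS", "description", "TEXT", "WEIGHT", "1.0",
        # The vector algorithm is followed by the count of attribute tokens.
        "$.description_embeddings", "AS", "vector", "VECTOR", algorithm,
        str(len(vector_attrs)), *vector_attrs,
    ]

args = build_ft_create_args()
print(" ".join(args))
```

With the client from the connection step, the list could then be sent with client.execute_command(*args). Passing dim=VECTOR_DIM keeps the index dimension in sync with the embedding model.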

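VII. Running Vector Searches

1. Embed the query text

queries = [
    "Bike for small kids",
    "Best Mountain bikes for kids",
    "Cheap Mountain bike for kids",
    # ... 11 queries in total
]
encoded_queries = embedder.encode(queries)

Note: queries must be embedded with the same model and parameters as the documents; otherwise the similarity scores lose much of their meaning.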

2. Build the KNN query template

from redis.commands.search.query import Query

knn_query = (
    Query("(*)=>[KNN 3 @vector $qvector AS score]")
    .sort_by("score")
    .return_fields("score", "id", "brand", "model", "description")
    .dialect(2)
)

(*): no filter; search the whole dataset.
KNN 3: return the 3 nearest vectors.
@vector $qvector: the vector field name and the parameter placeholder.
dialect(2): required to enable the vector query syntax.
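
The $qvector placeholder is filled with the query vector as raw bytes, and for a FLOAT32 field the blob must be exactly DIM × 4 bytes long. The sketch below shows that layout using only the standard library; it assumes little-endian byte order, which is what NumPy's tobytes() produces on common x86 and ARM machines. Both helper names are hypothetical.

```python
import struct

def to_float32_bytes(vector):
    """Pack a sequence of numbers as little-endian 32-bit floats."""
    return struct.pack(f"<{len(vector)}f", *vector)

def from_float32_bytes(blob):
    """Unpack a little-endian float32 blob back into a list of floats."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))

blob = to_float32_bytes([0.5, -1.0, 2.25])
print(len(blob))                 # 12 bytes: 3 values x 4 bytes each
print(from_float32_bytes(blob))  # [0.5, -1.0, 2.25]
```

A blob whose length does not match the index dimension is a common cause of query errors, so checking len(blob) == 4 * dim before searching can save some debugging time.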

3. Run the queries and display the results

import pandas as pd

def run_search(queries, encoded_qs):
    rows = []
    for q, vec in zip(queries, encoded_qs):
        # Pass the query vector as raw float32 bytes
        docs = (
            client.ft("idx:bikes_vss")
            .search(knn_query, {"qvector": np.array(vec, dtype=np.float32).tobytes()})
            .docs
        )
        for doc in docs:
            rows.append({
                "query": q,
                # COSINE yields a distance, so 1 - distance is the similarity
                "score": round(1 - float(doc.score), 2),
                "id": doc.id,
                "brand": doc.brand,
                "model": doc.model,
                "desc": doc.description[:100] + "...",
            })
    df = pd.DataFrame(rows)
    # Group by query, most similar result first
    return df.sort_values(["query", "score"], ascending=[True, False])

table = run_search(queries, encoded_queries)
print(table.to_markdown(index=False))

| query | score | id | brand | model | desc |
|---|---|---|---|---|---|
| Best Mountain bikes for kids | 0.54 | bikes:003 | Nord | Chook air 5 | The Chook Air 5 gives kids aged six years and … |
| … | … | … | … | … | … |
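
The score column is a similarity, while the value Redis returns for a COSINE index is a distance, where 0 means the vectors point in the same direction. The sketch below isolates that conversion. The input strings are made-up values, and similarity_from_distance is a hypothetical helper.

```python
def similarity_from_distance(distance, ndigits=2):
    """Convert a cosine distance (number or numeric string) into a rounded similarity."""
    return round(1 - float(distance), ndigits)

# Made-up raw distances, given as strings because the code above converts them with float().
raw_distances = ["0.46", "0.1", "0.62"]

# Higher similarity first, mirroring the descending sort on "score".
ranked = sorted((similarity_from_distance(d) for d in raw_distances), reverse=True)
print(ranked)  # [0.9, 0.54, 0.38]
```

Sorting by distance ascending and sorting by similarity descending give the same order, so the conversion only changes how the number reads.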

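VIII. Summary and Next Steps

Redis's modular ecosystem (such as RedisJSON and RediSearch) makes it a lightweight, easy-to-adopt vector database option. To go deeper:

Vector index parameters: FLAT vs. HNSW, distance metrics, parallel index construction, and more.
Multimodal data: combine with RedisAI to run model inference directly inside Redis.
Other language clients: C#, JavaScript, Java, Go, and others, covering a range of development scenarios.

Visit Redis University and the Redis AI resources for more learning material.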
