Getting Started with RedisVL: Building Efficient AI Vector Search Applications

1. Prerequisites

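Before you begin, make sure that:

redisvl is installed in your Python environment (for example, pip install redisvl).
A Redis Stack or Redis Cloud instance is running. The search and query features used below require a Redis deployment with the Search module.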

2. Define the Index Schema (IndexSchema)

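An IndexSchema defines a Redis index's configuration and its fields. It can be created from a Python dictionary or from a YAML file. The examples below use a user dataset with the fields user, job, age, credit_score, and a three-dimensional user_embedding vector.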

2.1. Example schema

Suppose we want an index named user_simple whose documents are stored under the key prefix user_simple_docs.

2.2. YAML format

version: '0.1.0'

index:
  name: user_simple
  prefix: user_simple_docs

fields:
  - name: user
    type: tag
  - name: credit_score
    type: tag
  - name: job
    type: text
  - name: age
    type: numeric
  - name: user_embedding
    type: vector
    attrs:
      algorithm: flat
      dims: 3
      distance_metric: cosine
      datatype: float32

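Save the content above as schema.yaml. The file can then be loaded with SearchIndex.from_yaml("schema.yaml") in place of the from_dict calls shown in section 4.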

2.3. Python dictionary format

schema = {
    "index": {
        "name": "user_simple",
        "prefix": "user_simple_docs",
    },
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {
            "name": "user_embedding",
            "type": "vector",
            "attrs": {
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32",
            },
        },
    ],
}
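To see how the dictionary is organized, the small helper below groups field names by their declared type. fields_by_type is a hypothetical helper written for this illustration, not part of RedisVL; it assumes only the "index" plus "fields" layout shown above. Its result mirrors the Type column that rvl index info reports later.

```python
from collections import defaultdict

# Same layout as the schema dictionary above
schema = {
    "index": {"name": "user_simple", "prefix": "user_simple_docs"},
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {
            "name": "user_embedding",
            "type": "vector",
            "attrs": {"dims": 3, "distance_metric": "cosine",
                      "algorithm": "flat", "datatype": "float32"},
        },
    ],
}


def fields_by_type(schema: dict) -> dict:
    """Hypothetical helper: map each field type to the field names declared with it."""
    grouped = defaultdict(list)
    for field in schema["fields"]:
        grouped[field["type"]].append(field["name"])
    return dict(grouped)


print(fields_by_type(schema))
# {'tag': ['user', 'credit_score'], 'text': ['job'], 'numeric': ['age'], 'vector': ['user_embedding']}
```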

3. Prepare a Sample Dataset

We create a sample dataset with the fields user, job, age, credit_score, and user_embedding. The three-dimensional user_embedding vectors are for demonstration only.

import numpy as np

data = [
    {
        'user': 'john',
        'age': 1,
        'job': 'engineer',
        'credit_score': 'high',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    },
    {
        'user': 'mary',
        'age': 2,
        'job': 'doctor',
        'credit_score': 'low',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes(),
    },
    {
        'user': 'joe',
        'age': 3,
        'job': 'dentist',
        'credit_score': 'medium',
        'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes(),
    },
]

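Note that each user_embedding vector is converted to raw bytes with NumPy's tobytes(). With the default Hash storage type, Redis expects vector fields as byte strings whose float32 layout matches the datatype and dims declared in the schema (3 × 4 = 12 bytes here).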
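The same byte layout can be produced without NumPy. The sketch below uses the standard-library array module with typecode "f" (a C float, which is 32 bits on mainstream platforms; the code checks this) as a stand-in for np.array(..., dtype=np.float32).tobytes(). Both write values in the machine's native byte order. The helper names to_float32_bytes and from_float32_bytes are hypothetical.

```python
from array import array


def to_float32_bytes(values):
    """Pack floats as contiguous 32-bit floats in native byte order."""
    buf = array("f", values)
    if buf.itemsize != 4:
        raise ValueError("typecode 'f' is not 4 bytes on this platform")
    return buf.tobytes()


def from_float32_bytes(raw):
    """Inverse of to_float32_bytes: unpack raw bytes into Python floats."""
    buf = array("f")
    buf.frombytes(raw)
    return buf.tolist()


raw = to_float32_bytes([0.1, 0.1, 0.5])
print(len(raw))                  # 12: three values x 4 bytes each
print(from_float32_bytes(raw))   # values come back rounded to float32 precision
```

Decoding the bytes is a convenient way to check what was actually stored: 0.1 comes back as the nearest float32 value rather than exactly 0.1.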

4. Create the Search Index (SearchIndex)

With the schema and the dataset ready, we can create a SearchIndex object.

4.1. Using a custom Redis connection

If you need custom Redis connection settings or want to share a connection pool, pass your own client:

from redis import Redis
from redisvl.index import SearchIndex

# Create (or reuse) a Redis client and hand it to the index
client = Redis.from_url("redis://localhost:6379")
index = SearchIndex.from_dict(schema, redis_client=client, validate_on_load=True)

4.2. Letting the index manage the connection

For simple cases, let the index create and manage the Redis connection itself:

index = SearchIndex.from_dict(schema, redis_url="redis://localhost:6379", validate_on_load=True)

4.3. Creating the index

Run the following to create the index in Redis:

# overwrite=True replaces any existing index with the same name
index.create(overwrite=True)

At this point the index exists but contains no data.

5. Inspect the Index with the rvl CLI

Use the rvl command-line tool to list the available indexes:

rvl index listall

Output:

19:17:09 [RedisVL] INFO   Indices:
19:17:09 [RedisVL] INFO   1. user_simple

To view the details of a specific index:

rvl index info -i user_simple

Output:

Index Information:
╭──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────╮
│ Index Name           │ Storage Type         │ Prefixes             │ Index Options        │ Indexing             │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
| user_simple          | HASH                 | ['user_simple_docs'] | []                   | 0                    |
╰──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────╯
Index Fields:
╭─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────╮
│ Name            │ Attribute       │ Type            │ Field Option    │ Option Value    │ Field Option    │ Option Value    │ Field Option    │ Option Value    │ Field Option    │ Option Value    │
├─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ user            │ user            │ TAG             │ SEPARATOR       │ ,               │                 │                 │                 │                 │                 │                 │
│ credit_score    │ credit_score    │ TAG             │ SEPARATOR       │ ,               │                 │                 │                 │                 │                 │                 │
│ job             │ job             │ TEXT            │ WEIGHT          │ 1               │                 │                 │                 │                 │                 │                 │
│ age             │ age             │ NUMERIC         │                 │                 │                 │                 │                 │                 │                 │                 │
│ user_embedding  │ user_embedding  │ VECTOR          │ algorithm       │ FLAT            │ data_type       │ FLOAT32         │ dim             │ 3               │ distance_metric │ COSINE          │
╰─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────╯

6. Load Data into the Index

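Use the load method to write the sample records to Redis. By default, each record is stored under a key made of the index prefix and a generated ULID, and load returns the list of keys: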

keys = index.load(data)
print(keys)

Output:

['user_simple_docs:01JT4PPPNJZMSK2395RKD208T9', 'user_simple_docs:01JT4PPPNM63J55ZESZ4TV1VR8', 'user_simple_docs:01JT4PPPNM59RCKS2YQ58B1HQW']
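Each returned key is the index prefix, a ":" separator, and a generated ID (a ULID in the output above). The sketch below illustrates that naming scheme. make_key is hypothetical, and it substitutes uuid.uuid4().hex for a ULID only to stay dependency-free. It also shows the alternative of deriving the ID from a record field, which is what passing id_field to load is meant for.

```python
import uuid
from typing import Optional


def make_key(prefix: str, record: dict, id_field: Optional[str] = None, sep: str = ":") -> str:
    """Hypothetical sketch of '<prefix><sep><id>' key naming."""
    if id_field is not None:
        doc_id = str(record[id_field])  # stable ID taken from the record itself
    else:
        doc_id = uuid.uuid4().hex  # random stand-in for the ULID RedisVL generates
    return f"{prefix}{sep}{doc_id}"


print(make_key("user_simple_docs", {"user": "john"}, id_field="user"))
# user_simple_docs:john
print(make_key("user_simple_docs", {"user": "john"}))
# user_simple_docs: followed by a random 32-character hex string
```

A field-derived ID makes keys predictable, so reloading the same record targets the same key; a random ID always creates a new key.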

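Because the index was created with validate_on_load=True, RedisVL validates each record against the schema (using Pydantic) before writing it. If a record is invalid, for example if user_embedding is not a bytes value, load raises a SchemaValidationError.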
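To make the validation rule concrete, here is a deliberately simplified, hypothetical pre-check, check_record, for this particular schema. Tag and text fields must be strings, age must be numeric, and user_embedding must be bytes. The exact-length check (dims × 4 bytes for float32) is this sketch's own addition. It is not RedisVL's validator, which is built on Pydantic models and covers more cases.

```python
def check_record(record: dict, dims: int = 3) -> list:
    """Hypothetical pre-check; returns a list of problems (empty means OK)."""
    problems = []
    # Tag and text fields in this schema hold strings
    for name in ("user", "credit_score", "job"):
        if name in record and not isinstance(record[name], str):
            problems.append(f"{name}: expected str")
    # Numeric field; bool is rejected even though it subclasses int
    age = record.get("age")
    if age is not None and (isinstance(age, bool) or not isinstance(age, (int, float))):
        problems.append("age: expected a number")
    # Vector field: raw float32 bytes, 4 bytes per dimension
    emb = record.get("user_embedding")
    if emb is not None:
        if not isinstance(emb, bytes):
            problems.append("user_embedding: expected bytes")
        elif len(emb) != dims * 4:
            problems.append(f"user_embedding: expected {dims * 4} bytes, got {len(emb)}")
    return problems


print(check_record({"user_embedding": True}))
# ['user_embedding: expected bytes']
```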

7. Update Data in the Index

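Calling load again inserts or updates (upserts) data. Records are written by key: a new key adds a document, and writing to an existing key overwrites the fields supplied. The record below has no explicit key or id_field, so a new ULID key is generated and the record is added: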

new_data = [{
    'user': 'tyler',
    'age': 9,
    'job': 'engineer',
    'credit_score': 'high',
    'user_embedding': np.array([0.1, 0.3, 0.5], dtype=np.float32).tobytes(),
}]
keys = index.load(new_data)
print(keys)

Output:

['user_simple_docs:01JT4PPX63CH5YRN2BGEYB5TS2']
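Upsert behavior comes down to keys. The in-memory sketch below models it with a plain dict standing in for Redis; no Redis is involved. load_into is a hypothetical helper, and it uses user as the ID field so that reloading the same user updates one document instead of adding a second. The update step mimics Redis HSET: supplied fields are overwritten and other existing fields are left in place.

```python
def load_into(store: dict, records: list, prefix: str, id_field: str) -> list:
    """Hypothetical model of an upsert: HSET-like update keyed by prefix:id."""
    keys = []
    for record in records:
        key = f"{prefix}:{record[id_field]}"
        # setdefault inserts an empty document for a new key;
        # update overwrites only the fields present in this record
        store.setdefault(key, {}).update(record)
        keys.append(key)
    return keys


store = {}
load_into(store, [{"user": "tyler", "age": 9, "job": "engineer"}], "user_simple_docs", "user")
load_into(store, [{"user": "tyler", "age": 10}], "user_simple_docs", "user")
print(len(store), store["user_simple_docs:tyler"])
# 1 {'user': 'tyler', 'age': 10, 'job': 'engineer'}
```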

8. Create and Run a Vector Query

Use VectorQuery to build a vector query object:

from redisvl.query import VectorQuery

query = VectorQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "age", "job", "credit_score", "vector_distance"],
    num_results=3,
)

Run the query:

results = index.query(query)

Output (the returned list of dictionaries, shown as a table):

vector_distance  user  age  job       credit_score
0               john  1    engineer  high
0               mary  2    doctor    low
0.0566299557686 tyler 9    engineer  high
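Because the field uses the flat algorithm, the search is an exact brute-force comparison, and Redis's cosine distance is 1 minus the cosine similarity. The sketch below recomputes the result in plain Python over the four loaded vectors. It uses float64 while Redis works in float32, so the last digits can differ. The sketch reproduces the ranking above and shows why joe is cut off by num_results=3.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the definition behind the COSINE metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


docs = {
    "john": [0.1, 0.1, 0.5],
    "mary": [0.1, 0.1, 0.5],
    "joe": [0.9, 0.9, 0.1],
    "tyler": [0.1, 0.3, 0.5],
}
query_vec = [0.1, 0.1, 0.5]

# Exhaustive (flat) k-NN: score every document, sort by distance, keep the top k
ranked = sorted(docs, key=lambda name: cosine_distance(query_vec, docs[name]))
for name in ranked[:3]:
    print(f"{name:6s} {cosine_distance(query_vec, docs[name]):.6f}")
```

joe's distance is roughly 0.65, far larger than tyler's roughly 0.0566, so it ranks fourth and is excluded by num_results=3. An HNSW index would return approximate neighbors instead of scanning every vector.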

9. Using the Asynchronous Redis Client

For production workloads where performance matters, RedisVL also supports an async client through AsyncSearchIndex:

from redis.asyncio import Redis
from redisvl.index import AsyncSearchIndex

client = Redis.from_url("redis://localhost:6379")
index = AsyncSearchIndex.from_dict(schema, redis_client=client)

# await must run inside an async context (e.g. a Jupyter cell or an async def)
results = await index.query(query)

The output is the same as for the synchronous query.
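Top-level await works in environments such as Jupyter notebooks, but not in a plain Python script. In a script, the calls go inside a coroutine started with asyncio.run, and independent queries can be awaited concurrently with asyncio.gather. To keep the sketch runnable without Redis, FakeAsyncIndex is a hypothetical stand-in whose query coroutine only echoes its input. In real code it would be the AsyncSearchIndex created above.

```python
import asyncio


class FakeAsyncIndex:
    """Hypothetical stand-in for AsyncSearchIndex so this sketch runs offline."""

    async def query(self, q):
        await asyncio.sleep(0.01)  # simulate a network round trip
        return [{"query": q}]


async def main():
    index = FakeAsyncIndex()
    # Issue several independent queries concurrently instead of one after another
    results = await asyncio.gather(*(index.query(q) for q in ["a", "b", "c"]))
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

asyncio.gather returns results in the order the coroutines were passed, so result i belongs to query i.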

10. Update the Index Schema

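To change the schema, for example to turn job from a text field into a tag field, or to switch user_embedding from a flat (brute-force) vector index to hnsw, modify the schema in place and recreate the index. Here index is the AsyncSearchIndex from the previous section, so create is awaited: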

# Replace the job and user_embedding field definitions
index.schema.remove_field("job")
index.schema.remove_field("user_embedding")
index.schema.add_fields([
    {"name": "job", "type": "tag"},
    {
        "name": "user_embedding",
        "type": "vector",
        "attrs": {
            "dims": 3,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32",
        },
    },
])

# Rebuild the index definition but keep the underlying data (drop=False)
await index.create(overwrite=True, drop=False)

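Because drop=False, the stored documents are kept. Only the index definition is replaced, and Redis re-indexes the existing hashes under the new schema.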
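If the schema lives in a plain dictionary, as in section 2, the same change can be made there before the index is rebuilt. The hypothetical helper updated_schema below returns a modified copy in which job becomes a tag field and user_embedding switches to hnsw. It mirrors the remove_field and add_fields calls above on plain data and does not touch Redis.

```python
import copy

schema = {
    "index": {"name": "user_simple", "prefix": "user_simple_docs"},
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {"name": "user_embedding", "type": "vector",
         "attrs": {"dims": 3, "distance_metric": "cosine",
                   "algorithm": "flat", "datatype": "float32"}},
    ],
}


def updated_schema(schema: dict) -> dict:
    """Hypothetical helper: return a copy with job as a tag and an HNSW vector index."""
    new = copy.deepcopy(schema)  # leave the original schema untouched
    for field in new["fields"]:
        if field["name"] == "job":
            field["type"] = "tag"
        elif field["name"] == "user_embedding":
            field["attrs"]["algorithm"] = "hnsw"
    return new


new_schema = updated_schema(schema)
print([(f["name"], f["type"]) for f in new_schema["fields"]])
```

The resulting dictionary could then be passed to from_dict and created with overwrite=True, drop=False, in the same way as above.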

11. Check Index Statistics

Use the rvl CLI to view the index statistics:

rvl stats -i user_simple

Output:

Statistics:
╭─────────────────────────────┬────────────╮
│ Stat Key                    │ Value      │
├─────────────────────────────┼────────────┤
│ num_docs                    │ 4          │
│ num_terms                   │ 0          │
│ max_doc_id                  │ 4          │
│ num_records                 │ 20         │
│ percent_indexed             │ 1          │
│ hash_indexing_failures      │ 0          │
│ number_of_uses              │ 2          │
│ bytes_per_record_avg        │ 48.2000007 │
│ doc_table_size_mb           │ 4.23431396 │
│ inverted_sz_mb              │ 9.19342041 │
│ key_table_size_mb           │ 1.93595886 │
│ offset_bits_per_record_avg  │ nan        │
│ offset_vectors_sz_mb        │ 0          │
│ offsets_per_term_avg        │ 0          │
│ records_per_doc_avg         │ 5          │
│ sortable_values_size_mb     │ 0          │
│ total_indexing_time         │ 0.74400001 │
│ total_inverted_index_blocks │ 11         │
│ vector_index_sz_mb          │ 0.23560333 │
╰─────────────────────────────┴────────────╯

12. Clean Up

Remove the data, or the index itself:

# Remove all documents associated with the index but keep the index definition
await index.clear()

# Delete the index entirely, together with its data
await index.delete()

13. Summary

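RedisVL provides a simple yet powerful interface for vector search in Redis. By defining an index schema, loading data, running vector queries, and updating the index, you can quickly build efficient AI applications. Combined with the async client and the rvl CLI, RedisVL fits a range of scenarios from development to production.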