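1. Prerequisites
Before you begin, make sure that:
- redisvl is installed and the corresponding Python environment is activated.
- A Redis instance is running with RediSearch version > 2.4.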
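2. Initialization and Data Loading
We will use a dataset of user records with the fields user, age, job, credit_score, office_location, user_embedding, and last_updated. The following code initializes the index and loads the data: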
import pickle
from redisvl.index import SearchIndex

# Load the example data
with open("hybrid_example_data.pkl", "rb") as f:
    data = pickle.load(f)

# Define the index schema
schema = {
    "index": {"name": "user_queries", "prefix": "user_queries_docs", "storage_type": "hash"},
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {"name": "last_updated", "type": "numeric"},
        {"name": "office_location", "type": "geo"},
        {
            "name": "user_embedding",
            "type": "vector",
            "attrs": {"dims": 3, "distance_metric": "cosine", "algorithm": "flat", "datatype": "float32"},
        },
    ],
}

# Create the search index
index = SearchIndex.from_dict(schema, redis_url="redis://localhost:6379")
index.create(overwrite=True)

# Load the data
keys = index.load(data)
print(index.info()["num_docs"])  # Output: 7
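With hash storage, Redis expects a vector field as a raw byte string rather than a Python list. The sample pickle is assumed to already hold user_embedding in that form. As a minimal sketch, here is how a 3-dimensional float32 embedding could be packed and unpacked with the standard library alone. The helper names and the record values are hypothetical and are not taken from the dataset.

```python
import struct


def pack_float32(vector):
    """Pack a list of floats into little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_float32(blob):
    """Inverse of pack_float32: bytes back to a list of floats."""
    count = len(blob) // 4  # each float32 occupies 4 bytes
    return list(struct.unpack(f"<{count}f", blob))


# Hypothetical record shaped like the schema above (values are made up).
record = {
    "user": "alice",
    "age": 30,
    "job": "engineer",
    "credit_score": "high",
    "office_location": "-122.4194,37.7749",  # "lon,lat" string for the geo field
    "last_updated": 1742232589,  # Unix timestamp stored in a numeric field
    "user_embedding": pack_float32([0.1, 0.1, 0.5]),
}

print(len(record["user_embedding"]))  # dims * 4 bytes
print(unpack_float32(record["user_embedding"]))
```

Little-endian byte order is chosen here because it matches what NumPy's tobytes() produces on common x86 and ARM machines; treat that as an assumption about your environment.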
Use the rvl CLI to inspect the index:
rvl index listall
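3. Hybrid Queries
Hybrid queries combine a vector search with one or more filters, for example on age, job, or geographic location. The sections below show each filter type and how to apply it.
3.1. Tag Filters
Tag filters perform exact matching on categorical fields such as credit_score.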
from redisvl.query import VectorQuery
from redisvl.query.filter import Tag

# Select users whose credit score is "high"
t = Tag("credit_score") == "high"
v = VectorQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "credit_score", "age", "job", "office_location", "last_updated"],
    filter_expression=t,
)
results = index.query(v)
Output:
vector_distance  user   credit_score  age  job            office_location    last_updated
0                john   high          18   engineer       -122.4194,37.7749  1741627789
0.109129190445   tyler  high          100  engineer       -122.0839,37.3861  1742232589
0.158808946609   tim    high          12   dermatologist  -122.0839,37.3861  1739644189
0.266666650772   nancy  high          94   doctor         -122.4194,37.7749  1710696589
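The vector_distance column follows from the cosine metric declared in the schema: Redis reports cosine distance, that is, 1 minus the cosine similarity, so 0 means the stored embedding points in the same direction as the query vector. A minimal sketch of that calculation, using hypothetical embeddings rather than the dataset's actual vectors:

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [0.1, 0.1, 0.5]
same_direction = [0.2, 0.2, 1.0]  # hypothetical embedding, parallel to the query
elsewhere = [0.9, 0.1, 0.1]       # hypothetical embedding pointing another way

print(cosine_distance(query, same_direction))  # effectively zero
print(cosine_distance(query, elsewhere))       # clearly larger
```

Because only direction matters for this metric, scaling an embedding does not change its distance to the query.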
Negation and multi-value matching are also supported:
# Negation: credit score is not "high"
t = Tag("credit_score") != "high"
v.set_filter(t)
results = index.query(v)

# Multi-value match: credit score is "high" or "medium"
t = Tag("credit_score") == ["high", "medium"]
v.set_filter(t)
results = index.query(v)
An empty tag list falls back gracefully to a wildcard query:
t = Tag("credit_score") == []
v.set_filter(t)
results = index.query(v)
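To make the three tag cases concrete, here is a small sketch of the query-string fragments they correspond to in Redis query syntax. The tag_clause helper is hypothetical and is not RedisVL's implementation; it also skips the escaping that tag values containing spaces or punctuation would need.

```python
def tag_clause(field, values, negate=False):
    """Render a tag filter as a Redis query fragment.

    An empty value list becomes the wildcard "*", mirroring the fallback
    described above.
    """
    if isinstance(values, str):
        values = [values]
    if not values:
        return "*"
    clause = "@%s:{%s}" % (field, "|".join(values))
    return f"(-{clause})" if negate else clause


print(tag_clause("credit_score", "high"))
print(tag_clause("credit_score", "high", negate=True))
print(tag_clause("credit_score", ["high", "medium"]))
print(tag_clause("credit_score", []))
```

Calling str() on a RedisVL filter object is the authoritative way to see the fragment the library actually sends.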
3.2. Numeric Filters
Numeric filters select records by a range or an exact value of a numeric field.
from redisvl.query.filter import Num

# Select users aged 15 to 35
numeric_filter = Num("age").between(15, 35)
v.set_filter(numeric_filter)
results = index.query(v)
Output:
vector_distance  user    credit_score  age  job       office_location    last_updated
0                john    high          18   engineer  -122.4194,37.7749  1741627789
0.217882037163   taimur  low           15   CEO       -122.0839,37.3861  1742232589
0.653301358223   joe     medium        35   dentist   -122.0839,37.3861  1742232589
Exact matching and negation are also supported:
# Exact match: age is 14
numeric_filter = Num("age") == 14
v.set_filter(numeric_filter)

# Negation: age is not 14
numeric_filter = Num("age") != 14
v.set_filter(numeric_filter)
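Numeric filters map onto range clauses in Redis query syntax, where a leading parenthesis marks an exclusive bound and -inf / +inf leave a side open. The sketch below renders such clauses; numeric_clause is a hypothetical helper written for illustration, not part of RedisVL.

```python
import math


def numeric_clause(field, low=-math.inf, high=math.inf,
                   include_low=True, include_high=True):
    """Render a numeric range as a Redis query fragment."""

    def bound(value, inclusive):
        if value == math.inf:
            text = "+inf"
        elif value == -math.inf:
            text = "-inf"
        else:
            text = str(value)
        # A "(" prefix makes the bound exclusive.
        return text if inclusive else "(" + text

    return f"@{field}:[{bound(low, include_low)} {bound(high, include_high)}]"


print(numeric_clause("age", 15, 35))                       # inclusive range
print(numeric_clause("age", 14, 14))                       # exact value
print(numeric_clause("age", low=18, include_low=False))    # strictly greater than 18
```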
3.3. Timestamp Filters
Timestamp filters accept Python datetime objects for time-based filtering.
from redisvl.query.filter import Timestamp
from datetime import datetime

dt = datetime(2025, 3, 16, 13, 45, 39, 132589)
timestamp_filter = Timestamp("last_updated") > dt
v.set_filter(timestamp_filter)
results = index.query(v)
Output:
vector_distance  user    credit_score  age  job       office_location    last_updated
0.109129190445   tyler   high          100  engineer  -122.0839,37.3861  1742232589
0.217882037163   taimur  low           15   CEO       -122.0839,37.3861  1742232589
0.653301358223   joe     medium        35   dentist   -122.0839,37.3861  1742232589
Range queries are also supported:
dt_1 = datetime(2025, 1, 14, 13, 45, 39, 132589)
dt_2 = datetime(2025, 3, 16, 13, 45, 39, 132589)
timestamp_filter = Timestamp("last_updated").between(dt_1, dt_2)
v.set_filter(timestamp_filter)
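Since last_updated is a numeric field holding Unix timestamps, a timestamp filter comes down to comparing epoch seconds. The sketch below reproduces that comparison in plain Python for a few last_updated values taken from the outputs above. It assumes the datetime values are interpreted as UTC, which is made explicit with tzinfo; a naive datetime would otherwise depend on the local time zone.

```python
from datetime import datetime, timezone

# Same instants as dt_1 and dt_2 above, pinned to UTC (an assumption).
start = datetime(2025, 1, 14, 13, 45, 39, 132589, tzinfo=timezone.utc).timestamp()
cutoff = datetime(2025, 3, 16, 13, 45, 39, 132589, tzinfo=timezone.utc).timestamp()

# last_updated values copied from the output tables above.
last_updated = {
    "john": 1741627789,
    "tyler": 1742232589,
    "tim": 1739644189,
    "nancy": 1710696589,
}

newer_than_cutoff = sorted(u for u, ts in last_updated.items() if ts > cutoff)
within_range = sorted(u for u, ts in last_updated.items() if start <= ts <= cutoff)

print(newer_than_cutoff)
print(within_range)
```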
3.4. Text Filters
Text filters match text fields exactly, by fuzzy match, or by wildcard.
from redisvl.query.filter import Text

# Exact match: job is "doctor"
text_filter = Text("job") == "doctor"
v.set_filter(text_filter)
Output:
vector_distance  user     credit_score  age  job     office_location    last_updated
0                derrick  low           14   doctor  -122.4194,37.7749  1741627789
0.266666650772   nancy    high          94   doctor  -122.4194,37.7749  1710696589
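Wildcard and fuzzy matching are also supported:
# Wildcard: job starts with "doct"
wildcard_filter = Text("job") % "doct*"
v.set_filter(wildcard_filter)

# Fuzzy match: job terms within Levenshtein distance 2 of "engine", e.g. "engineer"
fuzzy_match = Text("job") % "%%engine%%"
v.set_filter(fuzzy_match)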
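In Redis query syntax, the number of % pairs around a term sets the maximum Levenshtein (edit) distance allowed, so %%engine%% accepts terms up to two edits away. A minimal, standard-library sketch of that distance shows why "engineer" qualifies; it illustrates the metric only and is not how Redis evaluates the query internally.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]


for job in ["engineer", "doctor", "dentist"]:
    distance = levenshtein("engine", job)
    print(job, distance, "match" if distance <= 2 else "no match")
```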
Conditional (OR) matching is also supported:
# Conditional match: job is "engineer" or "doctor"
conditional = Text("job") % "engineer|doctor"
v.set_filter(conditional)
3.5. Geo Filters
Geo filters select records that lie within a given radius of a location.
from redisvl.query.filter import Geo, GeoRadius

# Select users within 10 km of the San Francisco office
geo_filter = Geo("office_location") == GeoRadius(-122.4194, 37.7749, 10, "km")
v.set_filter(geo_filter)
Output:
score           vector_distance  user     credit_score  age  job       office_location
0.454545444693  0                john     high          18   engineer  -122.4194,37.7749
0.454545444693  0                derrick  low           14   doctor    -122.4194,37.7749
0.454545444693  0.266666650772   nancy    high          94   doctor    -122.4194,37.7749
Negated queries are also supported:
# Users outside the 10 km radius
geo_filter = Geo("office_location") != GeoRadius(-122.4194, 37.7749, 10, "km")
v.set_filter(geo_filter)
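To see why the 10 km radius separates the two office locations that appear in the outputs, the great-circle distance between them can be estimated with the haversine formula. This is a rough sketch assuming a spherical Earth with a 6371 km radius; Redis's own geo distance calculation may differ slightly.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    radius_km = 6371.0  # mean Earth radius (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


san_francisco = (-122.4194, 37.7749)  # filter centre, as (lon, lat)
other_office = (-122.0839, 37.3861)   # second location seen in the outputs

distance = haversine_km(*san_francisco, *other_office)
print(round(distance, 1), "km")
print("inside radius" if distance <= 10 else "outside radius")
```

Note the argument order: longitude first, then latitude, matching GeoRadius and the "lon,lat" strings stored in office_location.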
3.6. Combining Filters
Use the & (intersection) and | (union) operators to combine multiple filters.
t = Tag("credit_score") == "high"
low = Num("age") >= 18
high = Num("age") <= 100
ts = Timestamp("last_updated") > datetime(2025, 3, 16, 13, 45, 39, 132589)
combined = t & low & high & ts
v = VectorQuery(
    [0.1, 0.1, 0.5],
    "user_embedding",
    return_fields=["user", "credit_score", "age", "job", "office_location"],
    filter_expression=combined,
)
results = index.query(v)
Output:
vector_distance  user   credit_score  age  job       office_location
0.109129190445   tyler  high          100  engineer  -122.0839,37.3861
Union query:
low = Num("age") < 18
high = Num("age") > 93
combined = low | high
v.set_filter(combined)
Building a filter dynamically:
def make_filter(age=None, credit=None, job=None):
    flexible_filter = (
        (Num("age") > age)
        & (Tag("credit_score") == credit)
        & (Text("job") % job)
    )
    return flexible_filter

# Example: age > 18, credit score "high", job "engineer"
combined = make_filter(age=18, credit="high", job="engineer")
v.set_filter(combined)
results = index.query(v)
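The set semantics of & and | can be checked without a Redis server by applying the same predicates to the seven rows that appear in the outputs above. The following is an in-memory sketch only: all_of and any_of are hypothetical helpers standing in for the two operators, and the cutoff is assumed to be UTC.

```python
from datetime import datetime, timezone

# Rows reconstructed from the output tables in this article.
ROWS = [
    {"user": "john", "credit_score": "high", "age": 18, "last_updated": 1741627789},
    {"user": "derrick", "credit_score": "low", "age": 14, "last_updated": 1741627789},
    {"user": "nancy", "credit_score": "high", "age": 94, "last_updated": 1710696589},
    {"user": "tyler", "credit_score": "high", "age": 100, "last_updated": 1742232589},
    {"user": "tim", "credit_score": "high", "age": 12, "last_updated": 1739644189},
    {"user": "taimur", "credit_score": "low", "age": 15, "last_updated": 1742232589},
    {"user": "joe", "credit_score": "medium", "age": 35, "last_updated": 1742232589},
]


def all_of(*predicates):
    """Intersection: every predicate must hold (the role of &)."""
    return lambda row: all(p(row) for p in predicates)


def any_of(*predicates):
    """Union: at least one predicate must hold (the role of |)."""
    return lambda row: any(p(row) for p in predicates)


cutoff = datetime(2025, 3, 16, 13, 45, 39, tzinfo=timezone.utc).timestamp()

intersection = all_of(
    lambda r: r["credit_score"] == "high",
    lambda r: r["age"] >= 18,
    lambda r: r["age"] <= 100,
    lambda r: r["last_updated"] > cutoff,
)
union = any_of(lambda r: r["age"] < 18, lambda r: r["age"] > 93)

matched_all = [r["user"] for r in ROWS if intersection(r)]
matched_any = sorted(r["user"] for r in ROWS if union(r))
print(matched_all)
print(matched_any)
```

The intersection narrows the result to the single row shown in the combined-filter output, while the union keeps every row that satisfies either age condition.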
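4. Non-Vector Queries
Use FilterQuery to run SQL-like queries that involve no vector search:
from redisvl.query import FilterQuery

has_low_credit = Tag("credit_score") == "low"
filter_query = FilterQuery(
    # "location" is not a schema field (the geo field is office_location),
    # so no such column appears in the output
    return_fields=["user", "credit_score", "age", "job", "location"],
    filter_expression=has_low_credit,
)
results = index.query(filter_query)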
Output:
user     credit_score  age  job
derrick  low           14   doctor
taimur   low           15   CEO
5. Count Queries
Use CountQuery to count the records that match a filter:
from redisvl.query import CountQuery

has_low_credit = Tag("credit_score") == "low"
filter_query = CountQuery(filter_expression=has_low_credit)
count = index.query(filter_query)
print(f"{count} records match the filter expression {str(has_low_credit)}")
Output:
2 records match the filter expression @credit_score:{low}
6. Range Queries
RangeQuery selects records whose vector distance falls within a given threshold:
from redisvl.query import RangeQuery

range_query = RangeQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    # "location" is not a schema field, so it does not appear in the output
    return_fields=["user", "credit_score", "age", "job", "location"],
    distance_threshold=0.2,
)
results = index.query(range_query)
Output:
vector_distance  user     credit_score  age  job
0                john     high          18   engineer
0                derrick  low           14   doctor
0.109129190445   tyler    high          100  engineer
0.158808946609   tim      high          12   dermatologist
Adjusting the distance threshold:
range_query.set_distance_threshold(0.1)
results = index.query(range_query)
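Unlike a KNN query, which returns a fixed number of nearest neighbours, a range query keeps every record whose distance is within the threshold. The sketch below applies that rule in memory to the vector_distance values reported in the outputs above, to show how tightening the threshold from 0.2 to 0.1 shrinks the result set. The within helper is hypothetical.

```python
# vector_distance values copied from the outputs in this article.
DISTANCES = {
    "john": 0.0,
    "derrick": 0.0,
    "tyler": 0.109129190445,
    "tim": 0.158808946609,
    "taimur": 0.217882037163,
    "nancy": 0.266666650772,
    "joe": 0.653301358223,
}


def within(threshold):
    """Users whose distance is at most the threshold, nearest first."""
    hits = [(d, user) for user, d in DISTANCES.items() if d <= threshold]
    return [user for d, user in sorted(hits)]


print(within(0.2))
print(within(0.1))
```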
Combining with a filter:
is_engineer = Text("job") == "engineer"
range_query.set_filter(is_engineer)
results = index.query(range_query)
7. Advanced Query Modifiers
Advanced features such as sorting and dialect selection are supported:
v = VectorQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "credit_score", "age", "job", "office_location"],
    num_results=5,
    filter_expression=is_engineer,
).sort_by("age", asc=False).dialect(3)
results = index.query(v)
Output:
vector_distance  age  user   credit_score  job       office_location
0.109129190445   100  tyler  high          engineer  -122.0839,37.3861
0                18   john   high          engineer  -122.4194,37.7749
8. Raw Redis Query Strings
Convert a query to its raw Redis query string:
str(v)
Output:
@job:("engineer")=>[KNN 5 @user_embedding $vector AS vector_distance] RETURN 6 user credit_score age job office_location vector_distance SORTBY age DESC DIALECT 3 LIMIT 0 5
A raw query string can also be used directly:
results = index.search("@credit_score:{high}")
for r in results.docs:
    print(r.__dict__)
9. Cleanup
Delete the index:
index.delete()
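10. Summary
RedisVL provides a flexible query interface that supports tag, numeric, timestamp, text, and geo filters, along with vector, non-vector, count, and range queries. By combining filters and parameterizing them dynamically, developers can build efficient search applications for scenarios ranging from simple to complex. For more query modifiers and API details, see the official RedisVL documentation.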