Environment:
RAGFlow v0.17.2
Problem description:
RAGFlow reports the error: ESConnection.sql got exception
_ming_cheng_tks, '浙江', 'operator=OR;minimum_should_match=30%')
2025-04-25 15:55:06,862 INFO 244867 POST http://localhost:1200/_sql?format=json [status:400 duration:0.002s]
2025-04-25 15:55:06,862 ERROR 244867 ESConnection.sql got exception
Traceback (most recent call last):
  File "/home/www/ragflow/ragflow/rag/utils/es_conn.py", line 553, in sql
    res = self.es.sql.query(body={"query": sql, "fetch_size": fetch_size}, format=format,
  File "/home/www/ragflow/ragflow/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/utils.py", line 446, in wrapped
    return api(*args, **kwargs)
  File "/home/www/ragflow/ragflow/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/sql.py", line 330, in query
    return self.perform_request(  # type: ignore[return-value]
  File "/home/www/ragflow/ragflow/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/_base.py", line 389, in perform_request
    return self._client.perform_request(
  File "/home/www/ragflow/ragflow/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/_base.py", line 320, in perform_request
    raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
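Solution:
1. Summary of the main changes:
- Stronger regular expression
  pattern = r"([a-z_]+_l?tks)\s+(like|=)\s*'([^']+)'"
  - Supports mixed-case field names (the pattern is applied with re.IGNORECASE)
  - Accepts both the equals sign (=) and the LIKE operator
  - Captures the value part more strictly
- Improved tokenization
  tokenized = rag_tokenizer.fine_grained_tokenize(rag_tokenizer.tokenize(val))
  - Ensures Chinese values are tokenized correctly
  - Handles escaping of special characters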
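- Standardized parameters
  'operator=OR, minimum_should_match=30%'
  - Uses a comma instead of a semicolon to separate the options
  - Intended to match the Elasticsearch SQL option format; the MATCH examples in the Elasticsearch SQL documentation separate options with semicolons, so confirm which separator your Elasticsearch version accepts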
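- Case preservation
  re.sub(re.escape(old), new, sql, flags=re.IGNORECASE)
  - Keeps the casing of the original SQL
  - Avoids accidentally modifying other parts of the query
- Longer timeout
  params={"request_timeout": 30}
  - Extended from 2 seconds to 30 seconds
  - Accommodates complex queries
- Better error logging
  logger.error(f"ES SQL Error: {str(e)} \nQuery: {sql}")
  - Records the full error message
  - Keeps the query that caused the problem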
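To see what the regular expression from the list above actually captures, it can be run on its own. This is a minimal sketch: the helper name `find_tks_conditions` and the sample query are illustrative assumptions, not part of RAGFlow.

```python
import re

# Pattern quoted in the list above: a field name ending in _tks or _ltks,
# whitespace, then LIKE or =, then a single-quoted value.
PATTERN = r"([a-z_]+_l?tks)\s+(like|=)\s*'([^']+)'"


def find_tks_conditions(sql: str) -> list[tuple[str, str, str]]:
    """Return (field, operator, value) for each condition the pattern matches.

    Hypothetical helper for illustration only; it is not a RAGFlow function.
    """
    return [
        (m.group(1), m.group(2), m.group(3))
        for m in re.finditer(PATTERN, sql, re.IGNORECASE)
    ]


sample = "SELECT _id FROM idx WHERE Name_TKS = '杭州' OR addr_ltks like '西湖區'"
for field, op, value in find_tks_conditions(sample):
    print(field, op, value)
```

Because the pattern requires at least one whitespace character before the operator (`\s+`), a condition written as `name_tks='x'` is not matched and would be left untouched by the rewrite.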
Typical conversion example:
Original query:
SELECT * WHERE _ming_cheng_tks LIKE '浙江'
After conversion:
SELECT * WHERE MATCH(_ming_cheng_tks, '浙 江', 'operator=OR, minimum_should_match=30%')
Verification:
# Test case
test_sql = "SELECT _id FROM index WHERE name_tks = '杭州' OR addr_ltks LIKE '西湖區'"
expected = "SELECT _id FROM index WHERE MATCH(name_tks, '杭 州', 'operator=OR, minimum_should_match=30%') OR MATCH(addr_ltks, '西 湖 區', 'operator=OR, minimum_should_match=30%')"
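The test case above can be checked without a running RAGFlow instance. The sketch below makes two assumptions: `char_tokenize` is a hypothetical stand-in for `rag_tokenizer` that simply emits one token per character (the real tokenizer may segment text differently), and the rewrite is done with a single `re.sub` callback rather than the two-pass replace list used in es_conn.py. It also skips the whitespace and `%` cleanup that the real method performs first.

```python
import re

PATTERN = r"([a-z_]+_l?tks)\s+(like|=)\s*'([^']+)'"
# Option string copied verbatim from the article.
MATCH_OPTIONS = "operator=OR, minimum_should_match=30%"


def char_tokenize(text: str) -> str:
    """Hypothetical stand-in for rag_tokenizer: one token per character."""
    return " ".join(text.replace(" ", ""))


def rewrite_tks_conditions(sql: str, tokenize=char_tokenize) -> str:
    """Replace `<field>_tks LIKE|= '<value>'` with a MATCH(...) expression."""

    def to_match(m: re.Match) -> str:
        field, value = m.group(1), m.group(3)
        return f"MATCH({field}, '{tokenize(value)}', '{MATCH_OPTIONS}')"

    # The callback's return value is inserted literally, so text outside the
    # matched conditions keeps its original casing and spacing.
    return re.sub(PATTERN, to_match, sql, flags=re.IGNORECASE)


test_sql = "SELECT _id FROM index WHERE name_tks = '杭州' OR addr_ltks LIKE '西湖區'"
expected = (
    "SELECT _id FROM index WHERE "
    "MATCH(name_tks, '杭 州', 'operator=OR, minimum_should_match=30%') OR "
    "MATCH(addr_ltks, '西 湖 區', 'operator=OR, minimum_should_match=30%')"
)
assert rewrite_tks_conditions(test_sql) == expected
```

Passing a different `tokenize` callable makes it possible to try the same rewrite against the real tokenizer once it is importable.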
2. Path and details of the modified source file es_conn.py
Around line 531
ragflow-main\rag\utils\es_conn.py
def sql(self, sql: str, fetch_size: int, format: str):
    logger.debug(f"ESConnection.sql get sql: {sql}")
    sql = re.sub(r"[ `]+", " ", sql)
    sql = sql.replace("%", "")
    replaces = []
    # Change 1: stronger regex pattern
    pattern = r"([a-z_]+_l?tks)\s+(like|=)\s*'([^']+)'"
    for r in re.finditer(pattern, sql, re.IGNORECASE):
        fld, val = r.group(1), r.group(3)
        # Change 2: tokenize the value
        tokenized = rag_tokenizer.fine_grained_tokenize(rag_tokenizer.tokenize(val))
        # Change 3: standardized parameter format
        match_expr = f"MATCH({fld}, '{tokenized}', 'operator=OR, minimum_should_match=30%')"
        # Use the exact matched text as the search key so that spacing
        # variants such as "name_tks ='x'" are replaced as well
        replaces.append((r.group(0), match_expr))
    # Change 4: keep the casing of the original SQL
    for old, new in replaces:
        sql = re.sub(re.escape(old), new, sql, flags=re.IGNORECASE)
    logger.debug(f"ESConnection.sql transformed: {sql}")
    # Change 5: longer timeout
    for i in range(ATTEMPT_TIME):
        try:
            res = self.es.sql.query(
                body={"query": sql, "fetch_size": fetch_size},
                format=format,
                params={"request_timeout": 30},  # raised from 2 seconds to 30 seconds
            )
            return res
        except ConnectionTimeout:
            logger.exception(f"ESConnection.sql timeout on: {sql}")
            continue
        except Exception as e:
            # Change 6: log the full error message together with the query
            logger.error(f"ES SQL Error: {str(e)} \nQuery: {sql}")
            return None
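The retry and logging behaviour of changes 5 and 6 can be exercised in isolation. The following is only a sketch under stated assumptions: `run_sql_with_retries` and the `run_query` callable are hypothetical names, the built-in `TimeoutError` stands in for the Elasticsearch client's `ConnectionTimeout`, and the value of `ATTEMPT_TIME` is assumed rather than read from RAGFlow.

```python
import logging

logger = logging.getLogger("es_sql_sketch")
ATTEMPT_TIME = 2  # assumed value; the real constant is defined in RAGFlow


def run_sql_with_retries(run_query, sql: str, attempts: int = ATTEMPT_TIME):
    """Call run_query(sql); retry on timeouts, log and stop on other errors.

    run_query is any callable that executes the SQL and returns a result.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run_query(sql)
        except TimeoutError:
            # Stand-in for ConnectionTimeout: try again on the next iteration.
            logger.warning("timeout on attempt %d/%d: %s", attempt, attempts, sql)
        except Exception as exc:
            # Mirrors change 6: keep both the error and the offending query.
            logger.error("ES SQL Error: %s\nQuery: %s", exc, sql)
            return None
    logger.error("gave up after %d timeouts: %s", attempts, sql)
    return None


# Usage with a fake query function that times out once, then succeeds.
calls = []


def flaky_query(sql: str) -> dict:
    calls.append(sql)
    if len(calls) < 2:
        raise TimeoutError("simulated timeout")
    return {"rows": []}


result = run_sql_with_retries(flaky_query, "SELECT 1", attempts=3)
```

Unlike the method shown above, this sketch logs an explicit message when every attempt times out, so a silent `None` can be told apart from a query error.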
3. After restarting the service, verification confirmed that everything works normally.