為了執行混合搜索,我們結合了 BM25 和密集檢索的結果。每種方法的分數均經過標準化和加權以獲得最佳總體結果
1 代碼
先編寫 BM25搜索的代碼,再編寫密集檢索的代碼,最后進行混合。
from rank_bm25 import BM25Okapi
from nltk.tokenize import word_tokenize
import jieba
import json# Sampledocuments
# documents = ["The cat sat on the mat.", "The dog barked at the moon.", "The sun is shining bright."]with open('train_zh.json', 'r', encoding='utf-8') as f:data = [json.loads(line) for line in f]# print(data[0:100])
# Extract instructions and outputs
instructions = [entry[