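To provide full-text search over the posts in a community app, this post evaluates implementing the feature with ElasticSearch.
Installing ES is not covered here.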
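- Integrate a Chinese analyzer into ES (pick the plugin version that matches your ES version)
Download the source: https://github.com/medcl/elasticsearch-analysis-ik
Build it with Maven to get: elasticsearch-analysis-ik-1.9.5.zip
Create an ik directory under the plugins directory and unzip elasticsearch-analysis-ik-1.9.5.zip into it, then restart ES so that the plugin is loaded.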
- Create the index (settings, mapping)
Configuration (saved as post.json)
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "post": {
      "dynamic": "strict",
      "properties": {
        "id":      {"type": "integer", "store": "yes"},
        "title":   {"type": "string", "store": "yes", "index": "analyzed", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"},
        "content": {"type": "string", "store": "yes", "index": "analyzed", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"},
        "author":  {"type": "string", "store": "yes", "index": "no"},
        "time":    {"type": "date", "store": "yes", "index": "no"}
      }
    }
  }
}
Run the following command to create the index
curl -XPOST 'spark2:9200/community' -d @post.json
- Insert data
Jar dependencies required by the project code
pom.xml
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.3.3</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.7</version>
</dependency>
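ES client utility class
import java.net.InetAddress;
import java.net.UnknownHostException;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class EsClient {
    // One shared client for the whole application.
    private static TransportClient transportClient;

    static {
        // cluster.name must match the cluster name configured on the ES nodes.
        Settings settings = Settings.builder().put("cluster.name", "es_cluster").build();
        try {
            // 9300 is the transport port, not the 9200 HTTP port used by curl.
            transportClient = new TransportClient.Builder().settings(settings).build()
                    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark2"), 9300))
                    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark3"), 9300));
        } catch (UnknownHostException e) {
            throw new RuntimeException(e);
        }
    }

    public static TransportClient getInstance() {
        return transportClient;
    }
}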
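Insert the data
import java.util.Date;

import com.alibaba.fastjson.JSON;
import org.elasticsearch.client.transport.TransportClient;

TransportClient client = EsClient.getInstance();
// Index 10,000 test posts with identical text, using the loop counter as the document id.
for (int i = 0; i < 10000; i++) {
    Post post = new Post(i + "", "hll", "百度百科", "ES即etamsports ,全名上海英模特制衣有限公司,是法國Etam集團在中國的分支企業,創立于1994年底。ES的服裝適合出游、朋友聚會、晚間娛樂、校園生活等各種輕松", new Date());
    // Serialize the post to JSON with fastjson and index it synchronously.
    client.prepareIndex("community", "post", post.getId())
            .setSource(JSON.toJSONString(post))
            .execute().actionGet();
}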
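The Post class used in the loop is not shown in the post. A minimal sketch of what it might look like follows; the field names are inferred from the mapping (id, title, content, author, time) and the constructor argument order from the call above, so treat the whole class as an assumption. Because the mapping is declared with "dynamic": "strict", the bean should expose exactly those five properties; an extra getter would be serialized as an unknown field and the document would be rejected. The class is left package-private here for brevity; in a real project it would normally be a public class in its own Post.java.

```java
import java.util.Date;

// Hypothetical reconstruction of the Post bean; not taken from the original project.
class Post {
    private String id;
    private String author;
    private String title;
    private String content;
    private Date time;

    // No-arg constructor so that JSON libraries can also deserialize into this bean.
    public Post() {
    }

    // Argument order mirrors the call: new Post(id, author, title, content, time).
    public Post(String id, String author, String title, String content, Date time) {
        this.id = id;
        this.author = author;
        this.title = title;
        this.content = content;
        this.time = time;
    }

    // Getters become the JSON property names: id, author, title, content, time.
    public String getId() { return id; }
    public String getAuthor() { return author; }
    public String getTitle() { return title; }
    public String getContent() { return content; }
    public Date getTime() { return time; }

    public void setId(String id) { this.id = id; }
    public void setAuthor(String author) { this.author = author; }
    public void setTitle(String title) { this.title = title; }
    public void setContent(String content) { this.content = content; }
    public void setTime(Date time) { this.time = time; }
}
```

With fastjson's default settings a Date is written as epoch milliseconds, which matches the time value visible in the query result.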
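- Query with highlighting
import java.util.Map;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.highlight.HighlightField;

TransportClient client = EsClient.getInstance();
// Search title and content for "上海" and wrap matches in content with <red> tags.
SearchResponse response = client.prepareSearch("community")
        .setTypes("post")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.multiMatchQuery("上海", "title", "content"))
        .setFrom(0).setSize(10)
        .addHighlightedField("content")
        .setHighlighterPreTags("<red>")
        .setHighlighterPostTags("</red>")
        .execute().actionGet();
SearchHits hits = response.getHits();
for (SearchHit hit : hits) {
    System.out.println(hit.getHighlightFields());
    Map<String, Object> source = hit.getSource();
    // A hit that matched only on title has no content highlight, so guard against null.
    HighlightField highlight = hit.highlightFields().get("content");
    if (highlight != null) {
        StringBuilder s = new StringBuilder();
        for (Text text : highlight.getFragments()) {
            s.append(text.string());
        }
        // Replace the stored content with its highlighted version.
        source.put("content", s.toString());
    }
    System.out.println(source);
}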
Query result
{author=hll, id=782, time=1490165237878, title=百度百科, content=ES即etamsports ,全名<red>上海</red>英模特制衣有限公司,是法國Etam集團在中國的分支企業,創立于1994年底。ES的服裝適合出游、朋友聚會、晚間娛樂、校園生活等各種輕松}
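The query searches both title and content but highlights only content, so a hit that matches on title alone comes back without any content fragments. The fragment-joining step can be isolated into a small helper that falls back to the stored text in that case. The sketch below works on plain strings so it can be tried without a cluster; HighlightText and merge are hypothetical names, not part of the Elasticsearch API, and the caller is assumed to have converted the fragments to strings first.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Elasticsearch.
final class HighlightText {
    private HighlightText() {
    }

    /**
     * Joins highlight fragments into one string. When the hit carries no
     * fragments for the field, the stored field value is returned unchanged.
     */
    static String merge(List<String> fragments, String fallback) {
        if (fragments == null || fragments.isEmpty()) {
            return fallback;
        }
        StringBuilder sb = new StringBuilder();
        for (String fragment : fragments) {
            sb.append(fragment);
        }
        return sb.toString();
    }
}
```

In the search loop this would be called with the content fragments of each hit and the original content value from the source map as the fallback.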