Installing the IK analyzer
If wget is not available, install it first:
    yum install wget -y
(1) Install the IK analyzer
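For every language, text is analyzed with the "Standard Analyzer" by default, but that analyzer handles Chinese poorly: it splits Chinese text into single characters. A dedicated Chinese analyzer therefore has to be installed.
Note: do not use the default `elasticsearch-plugin install xxx.zip` for automatic installation.
Download the release that matches your es version from https://github.com/medcl/elasticsearch-analysis-ik/releases/download
When elasticsearch was installed earlier, the container directory "/usr/share/elasticsearch/plugins" was mapped to "/mydata/elasticsearch/plugins" on the host. The most convenient approach is therefore to download "elasticsearch-analysis-ik-7.6.2.zip" and unzip it into that folder, in its own subdirectory (the later steps refer to it as plugins/ik). After installation, restart the elasticsearch container.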
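The host-side installation described above could be scripted roughly as follows. This is only a sketch: the helper names `ik_release_url` and `install_ik_on_host` are made up for illustration, the container name `elasticsearch` is an assumption, and `wget`, `unzip` and `docker` are assumed to be available on the host. The URL pattern is the one used by the 7.6.2 download link in this post.

```shell
#!/bin/sh
# Hypothetical helper: build the release URL for a given elasticsearch version.
ik_release_url() {
    version="$1"
    printf 'https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%s/elasticsearch-analysis-ik-%s.zip\n' "$version" "$version"
}

# Hypothetical helper: download IK and unpack it into the host-mapped plugins directory.
# Assumptions: container name "elasticsearch"; wget, unzip and docker exist on the host.
install_ik_on_host() {
    version="$1"
    plugins_dir="${2:-/mydata/elasticsearch/plugins}"
    zip_file="/tmp/elasticsearch-analysis-ik-${version}.zip"
    mkdir -p "$plugins_dir/ik" &&
    wget -O "$zip_file" "$(ik_release_url "$version")" &&
    unzip -o "$zip_file" -d "$plugins_dir/ik" &&
    rm -f "$zip_file" &&
    docker restart elasticsearch
}

# Only print the URL here; call install_ik_on_host 7.6.2 to actually install.
ik_release_url 7.6.2
```

Nothing is downloaded until `install_ik_on_host` is called explicitly, so the URL can be checked first.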
If you don't mind the extra work, you can also proceed as follows.
(1) Check the elasticsearch version:
    [root@hadoop-104 ~]# curl http://localhost:9200
    {
      "name" : "0adeb7852e00",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "9gglpP0HTfyOTRAaSe2rIg",
      "version" : {
        "number" : "7.6.2",    # the version is 7.6.2
        "build_flavor" : "default",
        "build_type" : "docker",
        "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
        "build_date" : "2020-03-26T06:34:37.794943Z",
        "build_snapshot" : false,
        "lucene_version" : "8.4.0",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }
    [root@hadoop-104 ~]#
(2) Enter the plugins directory inside the es container
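- docker exec -it <container id> /bin/bash
- wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
- unzip <downloaded file> -d ik   (each plugin needs its own subdirectory; later steps refer to plugins/ik)
- rm -rf *.zip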
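Instead of reading the version from the response by eye, it can be extracted with a small filter so that the matching release is picked in the steps above. The sketch below uses a hypothetical helper `es_version_from_json`. It relies on plain `sed` rather than a JSON parser, so it assumes the response contains exactly one `"number"` field, as in the output shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: read the root-endpoint JSON on stdin and print version.number.
# Assumption: the response has a single "number" field (true for the output above).
es_version_from_json() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Shortened copy of the response shown in this post.
sample='{"name" : "0adeb7852e00","version" : {"number" : "7.6.2","build_flavor" : "default"},"tagline" : "You Know, for Search"}'
printf '%s\n' "$sample" | es_version_from_json

# Against a running node (not executed here):
#   curl -s http://localhost:9200 | es_version_from_json
```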
Confirm that the analyzer has been installed
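One way to confirm the installation from the host is to check that the unpacked plugin contains its `plugin-descriptor.properties` file. The helper name `ik_plugin_present` is hypothetical, and the default path assumes the host mapping and the `ik` directory name used elsewhere in this post. Inside the container, `elasticsearch-plugin list` should report the plugin as well.

```shell
#!/bin/sh
# Hypothetical helper: succeed when <plugins_dir>/ik holds a plugin descriptor.
# Assumption: the plugin was unpacked into a subdirectory named "ik".
ik_plugin_present() {
    plugins_dir="${1:-/mydata/elasticsearch/plugins}"
    [ -f "$plugins_dir/ik/plugin-descriptor.properties" ]
}

if ik_plugin_present; then
    echo "ik plugin files found"
else
    echo "ik plugin files not found"
fi
```

A successful check only shows that the files are in place; the plugin is loaded after the container has been restarted.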
(2) Test the analyzer
Using the default analyzer:
    GET my_index/_analyze
    {
      "text": "我是中國人"
    }
Observe the result:
    {
      "tokens" : [
        {"token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0},
        {"token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1},
        {"token" : "中", "start_offset" : 2, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 2},
        {"token" : "國", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 3},
        {"token" : "人", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 4}
      ]
    }
Using ik_smart:
    GET my_index/_analyze
    {
      "analyzer": "ik_smart",
      "text": "我是中國人"
    }
Output:
    {
      "tokens" : [
        {"token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0},
        {"token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1},
        {"token" : "中國人", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 2}
      ]
    }
Using ik_max_word:
    GET my_index/_analyze
    {
      "analyzer": "ik_max_word",
      "text": "我是中國人"
    }
Output:
    {
      "tokens" : [
        {"token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0},
        {"token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1},
        {"token" : "中國人", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 2},
        {"token" : "中國", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 3},
        {"token" : "國人", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 4}
      ]
    }
(3) Custom dictionary
- Edit IKAnalyzer.cfg.xml in the following directory:
    /usr/share/elasticsearch/plugins/ik/config
The original XML:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- configure your own extension dictionary here -->
        <entry key="ext_dict"></entry>
        <!-- configure your own extension stopword dictionary here -->
        <entry key="ext_stopwords"></entry>
        <!-- configure a remote extension dictionary here -->
        <!-- <entry key="remote_ext_dict">words_location</entry> -->
        <!-- configure a remote extension stopword dictionary here -->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
    </properties>
After the modification, the elasticsearch container must be restarted; otherwise the change does not take effect.
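The stock file leaves `remote_ext_dict` commented out. Assuming the nginx URL given in this section (http://192.168.137.14/es/fenci.txt) is meant to serve as the remote dictionary, the edit could be scripted as sketched below. `set_remote_ext_dict` is a hypothetical helper, and it assumes the URL contains neither `|` nor `&`, because both are special in the `sed` expression.

```shell
#!/bin/sh
# Hypothetical helper: uncomment remote_ext_dict and point it at a dictionary URL.
# Assumption: dict_url contains no "|" or "&" characters.
set_remote_ext_dict() {
    cfg_file="$1"
    dict_url="$2"
    tmp_file="${cfg_file}.tmp"
    sed "s|<!-- *<entry key=\"remote_ext_dict\">[^<]*</entry> *-->|<entry key=\"remote_ext_dict\">${dict_url}</entry>|" "$cfg_file" > "$tmp_file" &&
    mv "$tmp_file" "$cfg_file"
}

# Demonstration on a scratch copy of the relevant line, not on the real config.
cfg=$(mktemp)
printf '%s\n' '<!-- <entry key="remote_ext_dict">words_location</entry> -->' > "$cfg"
set_remote_ext_dict "$cfg" "http://192.168.137.14/es/fenci.txt"
cat "$cfg"
rm -f "$cfg"
```

On a real installation the first argument would be the IKAnalyzer.cfg.xml path named above, followed by a container restart.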
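After the update, es applies the new analysis only to newly indexed data; existing data is not re-analyzed. To have existing data re-analyzed, run:
    POST my_index/_update_by_query?conflicts=proceed
http://192.168.137.14/es/fenci.txt is the path under which the resource is served by nginx.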
Before running the example below, nginx must be installed (see the nginx installation notes). Then create the file "fenci.txt" with the following content:
    echo "櫻桃薩其馬,帶你甜蜜入夏" > /mydata/nginx/html/fenci.txt
Test the result:
    GET my_index/_analyze
    {
      "analyzer": "ik_max_word",
      "text": "櫻桃薩其馬,帶你甜蜜入夏"
    }
Output:
    {
      "tokens" : [
        {"token" : "櫻桃", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0},
        {"token" : "薩其馬", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 1},
        {"token" : "帶你", "start_offset" : 6, "end_offset" : 8, "type" : "CN_WORD", "position" : 2},
        {"token" : "甜蜜", "start_offset" : 8, "end_offset" : 10, "type" : "CN_WORD", "position" : 3},
        {"token" : "入夏", "start_offset" : 10, "end_offset" : 12, "type" : "CN_WORD", "position" : 4}
      ]
    }
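When comparing analyzers from the command line, the full `_analyze` response is noisy. A small filter that prints only the tokens makes the difference easier to see. `analyze_tokens` is a hypothetical helper. It uses `grep -o`, which GNU and BSD grep support, and assumes token text contains no double quotes.

```shell
#!/bin/sh
# Hypothetical helper: print one token per line from an _analyze response on stdin.
# Assumption: token values contain no double-quote characters.
analyze_tokens() {
    grep -o '"token"[[:space:]]*:[[:space:]]*"[^"]*"' |
        sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

# Shortened copy of the response shown in this post.
sample='{"tokens" : [{"token" : "櫻桃","start_offset" : 0,"end_offset" : 2},{"token" : "薩其馬","start_offset" : 2,"end_offset" : 5}] }'
printf '%s\n' "$sample" | analyze_tokens

# Against a running node (not executed here; my_index must exist):
#   curl -s -H 'Content-Type: application/json' \
#        -X POST 'http://localhost:9200/my_index/_analyze' \
#        -d '{"analyzer":"ik_max_word","text":"櫻桃薩其馬,帶你甜蜜入夏"}' | analyze_tokens
```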