Installing the IK Analyzer


If wget is not installed, install it first:

yum install wget -y
(1) Install the IK analyzer

By default, text in every language is tokenized with the "Standard Analyzer", but it is not well suited to Chinese: as the test below shows, it splits Chinese text into individual characters rather than words. We therefore need to install a Chinese analyzer.

Note: you cannot use the default elasticsearch-plugin install xxx.zip automatic installation.
Download the release matching your es version from https://github.com/medcl/elasticsearch-analysis-ik/releases/download

When we installed elasticsearch earlier, we mapped the container's "/usr/share/elasticsearch/plugins" directory to "/mydata/elasticsearch/plugins" on the host, so the most convenient approach is to download "elasticsearch-analysis-ik-7.6.2.zip" and unzip it into that host directory. After installing, restart the elasticsearch container.
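A minimal sketch of that host-side route, assuming the mapped plugins directory from the earlier setup and the container name "elasticsearch" used in the steps below:

cd /mydata/elasticsearch/plugins
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
unzip elasticsearch-analysis-ik-7.6.2.zip -d ik
rm -f elasticsearch-analysis-ik-7.6.2.zip
docker restart elasticsearch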

If you do not mind a few extra steps, you can instead install it from inside the container, as follows.

(1) Check the elasticsearch version:

[root@hadoop-104 ~]# curl http://localhost:9200
{
  "name" : "0adeb7852e00",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "9gglpP0HTfyOTRAaSe2rIg",
  "version" : {
    "number" : "7.6.2",          # the version is 7.6.2
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
[root@hadoop-104 ~]#
(2) Enter the plugins directory inside the es container
  • docker exec -it <container id> /bin/bash

[root@hadoop-104 ~]# docker exec -it elasticsearch /bin/bash
[root@0adeb7852e00 elasticsearch]#
  • wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
[root@0adeb7852e00 elasticsearch]# pwd
/usr/share/elasticsearch
# download ik 7.6.2
[root@0adeb7852e00 elasticsearch]# wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
  • unzip the downloaded file
[root@0adeb7852e00 elasticsearch]# unzip elasticsearch-analysis-ik-7.6.2.zip -d ik
Archive:  elasticsearch-analysis-ik-7.6.2.zip
   creating: ik/config/
  inflating: ik/config/main.dic
  inflating: ik/config/quantifier.dic
  inflating: ik/config/extra_single_word_full.dic
  inflating: ik/config/IKAnalyzer.cfg.xml
  inflating: ik/config/surname.dic
  inflating: ik/config/suffix.dic
  inflating: ik/config/stopword.dic
  inflating: ik/config/extra_main.dic
  inflating: ik/config/extra_stopword.dic
  inflating: ik/config/preposition.dic
  inflating: ik/config/extra_single_word_low_freq.dic
  inflating: ik/config/extra_single_word.dic
  inflating: ik/elasticsearch-analysis-ik-7.6.2.jar
  inflating: ik/httpclient-4.5.2.jar
  inflating: ik/httpcore-4.4.4.jar
  inflating: ik/commons-logging-1.2.jar
  inflating: ik/commons-codec-1.9.jar
  inflating: ik/plugin-descriptor.properties
  inflating: ik/plugin-security.policy
[root@0adeb7852e00 elasticsearch]#
# move ik into the plugins directory
[root@0adeb7852e00 elasticsearch]# mv ik plugins/
  • rm -rf *.zip
[root@0adeb7852e00 elasticsearch]# rm -rf elasticsearch-analysis-ik-7.6.2.zip

Confirm that the analyzer is installed correctly.
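A quick way to check, as a sketch: restart the container so elasticsearch picks up the plugin, then run elasticsearch-plugin list (part of the standard es distribution; the container name is the one used above):

[root@hadoop-104 ~]# docker restart elasticsearch
[root@hadoop-104 ~]# docker exec -it elasticsearch bin/elasticsearch-plugin list
# the ik plugin should appear in the output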

(2) Test the analyzer

Using the default analyzer:

GET my_index/_analyze
{
  "text": "我是中國人"
}

Observe the result:

{"tokens" : [{"token" : "我","start_offset" : 0,"end_offset" : 1,"type" : "<IDEOGRAPHIC>","position" : 0},{"token" : "是","start_offset" : 1,"end_offset" : 2,"type" : "<IDEOGRAPHIC>","position" : 1},{"token" : "中","start_offset" : 2,"end_offset" : 3,"type" : "<IDEOGRAPHIC>","position" : 2},{"token" : "國","start_offset" : 3,"end_offset" : 4,"type" : "<IDEOGRAPHIC>","position" : 3},{"token" : "人","start_offset" : 4,"end_offset" : 5,"type" : "<IDEOGRAPHIC>","position" : 4}] } GET my_index/_analyze {"analyzer": "ik_smart", "text":"我是中國人" }

Output:

{"tokens" : [{"token" : "我","start_offset" : 0,"end_offset" : 1,"type" : "CN_CHAR","position" : 0},{"token" : "是","start_offset" : 1,"end_offset" : 2,"type" : "CN_CHAR","position" : 1},{"token" : "中國人","start_offset" : 2,"end_offset" : 5,"type" : "CN_WORD","position" : 2}] } GET my_index/_analyze {"analyzer": "ik_max_word", "text":"我是中國人" }

Output:

{"tokens" : [{"token" : "我","start_offset" : 0,"end_offset" : 1,"type" : "CN_CHAR","position" : 0},{"token" : "是","start_offset" : 1,"end_offset" : 2,"type" : "CN_CHAR","position" : 1},{"token" : "中國人","start_offset" : 2,"end_offset" : 5,"type" : "CN_WORD","position" : 2},{"token" : "中國","start_offset" : 2,"end_offset" : 4,"type" : "CN_WORD","position" : 3},{"token" : "國人","start_offset" : 3,"end_offset" : 5,"type" : "CN_WORD","position" : 4}] }
(3) Custom dictionary
  • Modify IKAnalyzer.cfg.xml in /usr/share/elasticsearch/plugins/ik/config:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- users can configure their own extension dictionary here -->
    <entry key="ext_dict"></entry>
    <!-- users can configure their own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- users can configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">http://192.168.137.14/es/fenci.txt</entry>
    <!-- users can configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

For comparison, the original xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- users can configure their own extension dictionary here -->
    <entry key="ext_dict"></entry>
    <!-- users can configure their own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- users can configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- users can configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

After modifying the file, restart the elasticsearch container; otherwise the change will not take effect.
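For example (the container name "elasticsearch" comes from the docker exec commands above):

docker restart elasticsearch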

After the dictionary update, es applies the new segmentation only to newly indexed data; historical data is not re-tokenized. To re-tokenize existing documents, run:

POST my_index/_update_by_query?conflicts=proceed
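The conflicts=proceed parameter tells elasticsearch to continue past version conflicts instead of aborting the run. The same call from the shell, as a sketch against the local node used earlier:

curl -X POST "http://localhost:9200/my_index/_update_by_query?conflicts=proceed&pretty"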

http://192.168.137.14/es/fenci.txt is the path of the resource on nginx.

Before running the example below, install nginx (see the nginx installation guide), then create the "fenci.txt" file with the following content:

echo "櫻桃薩其馬,帶你甜蜜入夏" > /mydata/nginx/html/fenci.txt

Test the effect:

GET my_index/_analyze
{
  "analyzer": "ik_max_word",
  "text": "櫻桃薩其馬,帶你甜蜜入夏"
}

Output:

{"tokens" : [{"token" : "櫻桃","start_offset" : 0,"end_offset" : 2,"type" : "CN_WORD","position" : 0},{"token" : "薩其馬","start_offset" : 2,"end_offset" : 5,"type" : "CN_WORD","position" : 1},{"token" : "帶你","start_offset" : 6,"end_offset" : 8,"type" : "CN_WORD","position" : 2},{"token" : "甜蜜","start_offset" : 8,"end_offset" : 10,"type" : "CN_WORD","position" : 3},{"token" : "入夏","start_offset" : 10,"end_offset" : 12,"type" : "CN_WORD","position" : 4}] }
