Installing the IK Analyzer


If wget is not installed, install it first:

yum install wget -y
(1) Install the IK analyzer

By default, text in every language is tokenized with the "Standard Analyzer", but it is not well suited to Chinese: as the test below shows, it splits Chinese text into individual characters rather than words. We therefore need to install a Chinese analyzer.

Note: you cannot use the default elasticsearch-plugin install xxx.zip automatic installation.
Download the release matching your es version from https://github.com/medcl/elasticsearch-analysis-ik/releases/download

When we installed elasticsearch earlier, we mapped the container's "/usr/share/elasticsearch/plugins" directory to "/mydata/elasticsearch/plugins" on the host, so the most convenient approach is to download "elasticsearch-analysis-ik-7.6.2.zip" and unzip it into that host directory. After installing, restart the elasticsearch container.
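A minimal sketch of that host-side route, assuming the mapped plugins directory from the earlier setup and the container name "elasticsearch" used in the steps below:

cd /mydata/elasticsearch/plugins
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
unzip elasticsearch-analysis-ik-7.6.2.zip -d ik
rm -f elasticsearch-analysis-ik-7.6.2.zip
docker restart elasticsearch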

If you do not mind a few extra steps, you can instead install it from inside the container, as follows.

(1) Check the elasticsearch version:

[root@hadoop-104 ~]# curl http://localhost:9200
{
  "name" : "0adeb7852e00",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "9gglpP0HTfyOTRAaSe2rIg",
  "version" : {
    "number" : "7.6.2",          # the version is 7.6.2
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
[root@hadoop-104 ~]#
(2) Enter the plugins directory inside the es container
  • docker exec -it <container id> /bin/bash

[root@hadoop-104 ~]# docker exec -it elasticsearch /bin/bash
[root@0adeb7852e00 elasticsearch]#
  • wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
[root@0adeb7852e00 elasticsearch]# pwd
/usr/share/elasticsearch
# download ik 7.6.2
[root@0adeb7852e00 elasticsearch]# wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
  • unzip the downloaded file
[root@0adeb7852e00 elasticsearch]# unzip elasticsearch-analysis-ik-7.6.2.zip -d ik
Archive:  elasticsearch-analysis-ik-7.6.2.zip
   creating: ik/config/
  inflating: ik/config/main.dic
  inflating: ik/config/quantifier.dic
  inflating: ik/config/extra_single_word_full.dic
  inflating: ik/config/IKAnalyzer.cfg.xml
  inflating: ik/config/surname.dic
  inflating: ik/config/suffix.dic
  inflating: ik/config/stopword.dic
  inflating: ik/config/extra_main.dic
  inflating: ik/config/extra_stopword.dic
  inflating: ik/config/preposition.dic
  inflating: ik/config/extra_single_word_low_freq.dic
  inflating: ik/config/extra_single_word.dic
  inflating: ik/elasticsearch-analysis-ik-7.6.2.jar
  inflating: ik/httpclient-4.5.2.jar
  inflating: ik/httpcore-4.4.4.jar
  inflating: ik/commons-logging-1.2.jar
  inflating: ik/commons-codec-1.9.jar
  inflating: ik/plugin-descriptor.properties
  inflating: ik/plugin-security.policy
[root@0adeb7852e00 elasticsearch]#
# move ik into the plugins directory
[root@0adeb7852e00 elasticsearch]# mv ik plugins/
  • rm -rf *.zip
[root@0adeb7852e00 elasticsearch]# rm -rf elasticsearch-analysis-ik-7.6.2.zip

Confirm that the analyzer is installed correctly.
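A quick way to check, as a sketch: restart the container so elasticsearch picks up the plugin, then run elasticsearch-plugin list (part of the standard es distribution; the container name is the one used above):

[root@hadoop-104 ~]# docker restart elasticsearch
[root@hadoop-104 ~]# docker exec -it elasticsearch bin/elasticsearch-plugin list
# the ik plugin should appear in the output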

(2) Test the analyzer

Using the default analyzer:

GET my_index/_analyze
{
  "text": "我是中國人"
}

Observe the result:

{"tokens" : [{"token" : "我","start_offset" : 0,"end_offset" : 1,"type" : "<IDEOGRAPHIC>","position" : 0},{"token" : "是","start_offset" : 1,"end_offset" : 2,"type" : "<IDEOGRAPHIC>","position" : 1},{"token" : "中","start_offset" : 2,"end_offset" : 3,"type" : "<IDEOGRAPHIC>","position" : 2},{"token" : "國","start_offset" : 3,"end_offset" : 4,"type" : "<IDEOGRAPHIC>","position" : 3},{"token" : "人","start_offset" : 4,"end_offset" : 5,"type" : "<IDEOGRAPHIC>","position" : 4}] } GET my_index/_analyze {"analyzer": "ik_smart", "text":"我是中國人" }

Output:

{"tokens" : [{"token" : "我","start_offset" : 0,"end_offset" : 1,"type" : "CN_CHAR","position" : 0},{"token" : "是","start_offset" : 1,"end_offset" : 2,"type" : "CN_CHAR","position" : 1},{"token" : "中國人","start_offset" : 2,"end_offset" : 5,"type" : "CN_WORD","position" : 2}] } GET my_index/_analyze {"analyzer": "ik_max_word", "text":"我是中國人" }

Output:

{"tokens" : [{"token" : "我","start_offset" : 0,"end_offset" : 1,"type" : "CN_CHAR","position" : 0},{"token" : "是","start_offset" : 1,"end_offset" : 2,"type" : "CN_CHAR","position" : 1},{"token" : "中國人","start_offset" : 2,"end_offset" : 5,"type" : "CN_WORD","position" : 2},{"token" : "中國","start_offset" : 2,"end_offset" : 4,"type" : "CN_WORD","position" : 3},{"token" : "國人","start_offset" : 3,"end_offset" : 5,"type" : "CN_WORD","position" : 4}] }
(3) Custom dictionary
  • Modify IKAnalyzer.cfg.xml in /usr/share/elasticsearch/plugins/ik/config:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- users can configure their own extension dictionary here -->
    <entry key="ext_dict"></entry>
    <!-- users can configure their own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- users can configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">http://192.168.137.14/es/fenci.txt</entry>
    <!-- users can configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

For comparison, the original xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- users can configure their own extension dictionary here -->
    <entry key="ext_dict"></entry>
    <!-- users can configure their own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- users can configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- users can configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

After modifying the file, restart the elasticsearch container; otherwise the change will not take effect.
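For example (the container name "elasticsearch" comes from the docker exec commands above):

docker restart elasticsearch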

After the dictionary update, es applies the new segmentation only to newly indexed data; historical data is not re-tokenized. To re-tokenize existing documents, run:

POST my_index/_update_by_query?conflicts=proceed
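The conflicts=proceed parameter tells elasticsearch to continue past version conflicts instead of aborting the run. The same call from the shell, as a sketch against the local node used earlier:

curl -X POST "http://localhost:9200/my_index/_update_by_query?conflicts=proceed&pretty"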

http://192.168.137.14/es/fenci.txt is the path of the resource on nginx.

Before running the example below, install nginx (see the nginx installation guide), then create the "fenci.txt" file with the following content:

echo "櫻桃薩其馬,帶你甜蜜入夏" > /mydata/nginx/html/fenci.txt

Test the effect:

GET my_index/_analyze
{
  "analyzer": "ik_max_word",
  "text": "櫻桃薩其馬,帶你甜蜜入夏"
}

Output:

{"tokens" : [{"token" : "櫻桃","start_offset" : 0,"end_offset" : 2,"type" : "CN_WORD","position" : 0},{"token" : "薩其馬","start_offset" : 2,"end_offset" : 5,"type" : "CN_WORD","position" : 1},{"token" : "帶你","start_offset" : 6,"end_offset" : 8,"type" : "CN_WORD","position" : 2},{"token" : "甜蜜","start_offset" : 8,"end_offset" : 10,"type" : "CN_WORD","position" : 3},{"token" : "入夏","start_offset" : 10,"end_offset" : 12,"type" : "CN_WORD","position" : 4}] }
