
Getting Started with Elasticsearch Documents

Published: 2025/3/20


6.1. Built-in metadata fields

{
  "_index" : "book",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 10,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Bootstrap開發教程1",
    "description" : "Bootstrap是由Twitter推出的一個前臺頁面開發css框架，是一個非常流行的開發框架，此框架集成了多種頁面效果。此開發框架包含了大量的CSS、JS程序代碼，可以幫助開發者（尤其是不擅長css頁面開發的程序人員）輕松的實現一個css，不受瀏覽器限制的精美界面css效果。",
    "studymodel" : "201002",
    "price" : 38.6,
    "timestamp" : "2019-08-25 19:11:35",
    "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
    "tags" : ["bootstrap", "開發"]
  }
}

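6.1.1 _index

Meaning: the index this document belongs to.
Principle: keep similar data in the same index, much as you would decide which rows belong in which database table. For example, book data goes in the book index and employee data in the employee index. Indices are stored and searched independently of one another.
Naming rule: lowercase letters only; avoid special characters where possible.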

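6.1.2 _type

Meaning: the category of a document within an index, for example book, java or node.
Note: mapping types are deprecated in Elasticsearch 7.x and removed in 8.0, so each release has been weakening this field. You do not need to pay attention to it; in 7.x every _type you see is _doc.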

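6.1.3 _id

Meaning: the unique identifier of the document, like a primary key in a table. Together with the index it identifies exactly one document.

Generation: manual (PUT /index/_doc/id) or automatic (POST /index/_doc).

6.1.4 When creating indices, put different kinds of data in different indices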

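6.2. Generating document IDs

6.2.1 Manually assigned IDs

Scenario: data imported from another system that already has a unique primary key, such as book or employee records in a database.

Usage: PUT /index/_doc/id

PUT /test_index/_doc/1
{"test_field": "test"}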

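6.2.2 Automatically generated IDs

Usage: POST /index/_doc

POST /test_index/_doc
{"test_field": "test1"}

{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "x29LOm0BPsY0gSJFYZAl",
  "_version" : 1,
  "result" : "created",
  "_shards" : {"total" : 2, "successful" : 1, "failed" : 0},
  "_seq_no" : 0,
  "_primary_term" : 1
}

Properties of auto-generated IDs:

20 characters long, URL-safe, Base64-encoded GUIDs. Nodes generate them independently, without coordination, and collisions are not expected in practice.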
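
To make the "20 characters, URL-safe, Base64" property concrete, here is a minimal sketch. It is an illustration only and not the algorithm ES uses: as far as I know ES builds its IDs from a timestamp, a sequence number and the node's MAC address, whereas this sketch uses 15 random bytes. The function name make_url_safe_id is hypothetical.

```python
import base64
import os


def make_url_safe_id() -> str:
    """Return a 20-character URL-safe Base64 string built from 15 random bytes.

    Illustration only: this is NOT how Elasticsearch generates its IDs.
    """
    raw = os.urandom(15)  # 15 bytes = 120 bits
    # 120 bits / 6 bits per Base64 character = 20 characters, so no "=" padding
    return base64.urlsafe_b64encode(raw).decode("ascii")


doc_id = make_url_safe_id()
print(doc_id, len(doc_id))
```

The URL-safe alphabet replaces "+" and "/" with "-" and "_", which is why such an ID can appear in a request path without escaping.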

6.3. The _source field

6.3.1 _source

Meaning: all the fields and values supplied when the document was indexed. A GET returns them unchanged in the _source field.

GET /book/_doc/1

6.3.2 Selecting which fields are returned

Just as in SQL you write select name, price from book rather than select *, you can ask for only the fields you need.

GET /book/_doc/1?_source_includes=name,price

{
  "_index" : "book",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 10,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "price" : 38.6,
    "name" : "Bootstrap開發教程1"
  }
}
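
The effect of source filtering can be sketched in a few lines of Python. This is a simplified model that handles top-level field names only (no wildcards or nested paths), and filter_source is a hypothetical helper, not part of any ES client.

```python
def filter_source(source: dict, includes: list[str]) -> dict:
    """Keep only the top-level fields named in `includes`."""
    return {key: value for key, value in source.items() if key in includes}


book = {"name": "Bootstrap開發教程1", "studymodel": "201002", "price": 38.6}
print(filter_source(book, ["name", "price"]))
```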

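6.4. Replacing and deleting documents

6.4.1 Full replacement

Run the request below twice and the version number (_version) in the response goes up each time. This is a full replacement.

PUT /test_index/_doc/1
{"test_field": "test"}

Under the hood: the old document is not removed immediately, only marked as deleted. The cluster physically removes such documents later, at a suitable moment (during segment merges).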

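6.4.2 Forced creation

To avoid overwriting existing data, an insert can be sent as a create-only request, which never replaces an existing document.

Syntax: PUT /index/_doc/id/_create (the typeless form in 7.x and later is PUT /index/_create/id)

PUT /test_index/_doc/1/_create
{"test_field": "test"}

If the document already exists, the request fails with a 409 version conflict: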

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[2]: version conflict, document already exists (current version [1])",
        "index_uuid": "lqzVqxZLQuCnd6LYtZsMkg",
        "shard": "0",
        "index": "test_index"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[2]: version conflict, document already exists (current version [1])",
    "index_uuid": "lqzVqxZLQuCnd6LYtZsMkg",
    "shard": "0",
    "index": "test_index"
  },
  "status": 409
}
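
The difference between an ordinary index request and a create-only request can be modelled with a small in-memory store. This is a sketch under simplifying assumptions (a single dict, no shards); TinyIndex and ConflictError are hypothetical names that stand in for the index and for the HTTP 409 error.

```python
class ConflictError(Exception):
    """Stands in for the HTTP 409 version_conflict_engine_exception."""


class TinyIndex:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        """Create the document or fully replace it, bumping the version."""
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, source)
        return version

    def create(self, doc_id, source):
        """Create-only write: refuse to overwrite an existing document."""
        if doc_id in self.docs:
            raise ConflictError(f"[{doc_id}]: document already exists")
        return self.index(doc_id, source)


idx = TinyIndex()
idx.create("1", {"test_field": "test"})
try:
    idx.create("1", {"test_field": "again"})
except ConflictError as err:
    print("409:", err)
```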

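6.4.3 Deletion

Syntax: DELETE /index/_doc/id

DELETE /test_index/_doc/1

Under the hood: the document is not removed immediately, only marked as deleted, and the cluster physically removes it later at a suitable moment. This is known as lazy deletion.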

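6.5. Partial update

PUT /index/_doc/id replaces the whole document, so every field has to be sent.

A partial update sends only the fields that changed.

Usage (the typeless form in 7.x and later is POST /index/_update/id):

POST /index/_doc/id/_update
{"doc": {"field": "value"}}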

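How it works internally

Internally it is the same as a full replacement: the old document is marked as deleted and a new one is created.

Advantages:

Fewer network round trips and less data on the wire, which improves performance: the client does not have to fetch the document first and sends only the changed fields.
A lower chance of concurrency conflicts, because the read-modify-write cycle happens inside the shard and the window between read and write is short.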

Demonstration

Insert a document:

PUT /test_index/_doc/5
{"test_field1": "hello", "test_field2": "yfy"}

Update test_field2:

POST /test_index/_doc/5/_update
{"doc": {"test_field2": "yfy 2"}}
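
What the doc part of a partial update does to the stored document can be sketched as a merge. The sketch assumes that nested objects are merged recursively and every other value is replaced; apply_partial_update is a hypothetical helper, and the real work happens inside ES.

```python
def apply_partial_update(source: dict, partial: dict) -> dict:
    """Return a new document with `partial` merged into `source`.

    Nested objects are merged recursively; any other value is replaced.
    """
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = value
    return merged


old = {"test_field1": "hello", "test_field2": "yfy"}
new = apply_partial_update(old, {"test_field2": "yfy 2"})
print(new)
```

The function returns a new dict instead of mutating the old one, mirroring the point above that a new document is written and the old one is only marked as deleted.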

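6.6. Updating with scripts

ES can run scripts to perform more complex operations, for example Painless scripts.

Note: Groovy scripts are no longer supported from ES 6 onward. They were memory-hungry and unsafe, having been the source of remote code injection vulnerabilities.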

6.6.1 Inline scripts

Requirement 1: add 1 to the num field of document 6.

Insert the data:

PUT /test_index/_doc/6
{"num": 0, "tags": []}

Run the script:

POST /test_index/_doc/6/_update
{"script": "ctx._source.num += 1"}

Query the data:

GET /test_index/_doc/6

{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "6",
  "_version" : 2,
  "_seq_no" : 23,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "num" : 1,
    "tags" : [ ]
  }
}
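
When the increment varies from request to request, the script source can stay constant and take the amount as a parameter. The sketch below only builds the request body with the standard json module; increment_num_body is a hypothetical helper, and sending the body to the _update endpoint is left to whichever HTTP client you use.

```python
import json


def increment_num_body(amount: int) -> str:
    """Build an _update request body that adds `amount` to the num field."""
    body = {
        "script": {
            "lang": "painless",
            # The amount travels in params, so the script text never changes.
            "source": "ctx._source.num += params.amount",
            "params": {"amount": amount},
        }
    }
    return json.dumps(body)


print(increment_num_body(1))
```

Keeping user-supplied values in params rather than concatenating them into the script text also avoids one route to script injection.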

Requirement 2: search all documents and output the num field multiplied by 2.

Insert the data:

PUT /test_index/_doc/7
{"num": 5}

Query:

GET /test_index/_search
{
  "script_fields": {
    "my_doubled_field": {
      "script": {
        "lang": "expression",
        "source": "doc['num'] * multiplier",
        "params": {"multiplier": 2}
      }
    }
  }
}

The hit for document 7 in the response:

{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "7",
  "_score" : 1.0,
  "fields" : {
    "my_doubled_field" : [10.0]
  }
}

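6.6.2 External scripts

Painless is supported out of the box and enabled by default. Scripts can be supplied to ES in several ways: inline in a request, stored through the REST API, or, in versions before 6.0, as files under the config/scripts directory.

Note: scripts are slow and prone to injection, so this tutorial does not cover them further.

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-using.html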

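6.7. Concurrency problems in ES

As in a flash sale, when many threads modify the same data at once, ES can run into concurrent update conflicts too. For instance, two clients both read a stock of 100, each subtracts 1 and writes back 99, and one of the decrements is lost.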

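6.8. Pessimistic and optimistic locking

Concurrency is usually controlled with locks, of which there are two kinds: pessimistic and optimistic.

Pessimistic locking: assumes the worst and takes a lock for every operation, so only one thread can work on the data at a time. Examples are row locks, table locks, read locks and write locks in a database.

Characteristics: convenient, since you simply take the lock and the rest of the program is unaffected; the drawback is low throughput.

Optimistic locking: assumes conflicts are rare and does not lock the data itself. When a change is submitted, some mechanism checks whether a conflict occurred; ES does this with version numbers.

Characteristics: high concurrency; the drawback is more work for the client, which may have to retry a submission several times.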
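
The optimistic approach can be sketched with a versioned value and a retry loop. This is a toy model, not ES code: VersionedCounter and write_if_version are hypothetical names, and the short internal lock only models the store's atomic check-and-write, while the callers never hold a lock across their read-modify-write cycle.

```python
import threading


class VersionedCounter:
    """A value guarded by a version number instead of a long-held lock."""

    def __init__(self):
        self.value = 0
        self.version = 1
        self._guard = threading.Lock()  # models the store's atomic check-and-write

    def read(self):
        with self._guard:
            return self.value, self.version

    def write_if_version(self, new_value, expected_version):
        """Apply the write only if nobody has written since `expected_version`."""
        with self._guard:
            if self.version != expected_version:
                return False
            self.value = new_value
            self.version += 1
            return True


def increment(counter, times):
    for _ in range(times):
        while True:  # retry until our write wins
            value, version = counter.read()
            if counter.write_if_version(value + 1, version):
                break


counter = VersionedCounter()
threads = [threading.Thread(target=increment, args=(counter, 50)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value, counter.version)
```

No increment is lost, because a write based on a stale read is rejected and repeated. The price is the retry loop, which is the "more work for the client" mentioned above.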

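6.9. Optimistic concurrency control in ES based on _version

Experiment: version control with _version

Every create, update and delete of a document in ES is tracked with a version number.

1. Index the same document several times:

PUT /test_index/_doc/3
{"test_field": "test"}

The returned version number increases each time.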

2. Delete the document:

DELETE /test_index/_doc/3

{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "3",
  "_version" : 6,
  "result" : "deleted",
  "_shards" : {"total" : 2, "successful" : 1, "failed" : 0},
  "_seq_no" : 7,
  "_primary_term" : 1
}

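3. Index it again:

PUT /test_index/_doc/3
{"test_field": "test"}

The version number keeps increasing rather than restarting at 1, which confirms the lazy deletion strategy. (ES remembers the version of a deleted document only for a limited time, controlled by index.gc_deletes, 60 seconds by default.)

If a delete had to remove the data immediately, every primary and replica shard would have to do so right away, which would put too much pressure on the cluster.

When ES replicates changes from primary to replica shards, the requests are sent asynchronously from multiple threads and can arrive out of order, so replication also relies on this optimistic mechanism.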
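
The delete-then-recreate experiment in this section can be mimicked with a store that keeps a tombstone instead of forgetting a deleted document. This is a simplified sketch: TombstoneStore is a hypothetical name, and it never expires its tombstones, whereas ES eventually discards them.

```python
class TombstoneStore:
    """Keeps the last version of a deleted document instead of forgetting it."""

    def __init__(self):
        self.entries = {}  # doc_id -> {"version", "deleted", "source"}

    def put(self, doc_id, source):
        version = self.entries.get(doc_id, {"version": 0})["version"] + 1
        self.entries[doc_id] = {"version": version, "deleted": False, "source": source}
        return version

    def delete(self, doc_id):
        entry = self.entries[doc_id]
        # Mark as deleted and bump the version; the entry itself stays.
        entry.update(version=entry["version"] + 1, deleted=True, source=None)
        return entry["version"]


store = TombstoneStore()
store.put("3", {"test_field": "test"})  # version 1
store.put("3", {"test_field": "test"})  # version 2
store.delete("3")                       # version 3, marked deleted
version_after_recreate = store.put("3", {"test_field": "test"})
print(version_after_recreate)
```

Because the tombstone still carries version 3, the re-created document continues at 4 instead of starting over at 1.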

6.10. Client-side concurrent updates based on _version

This is the mechanism Java and Python clients use when updating.

Create the document:

PUT /test_index/_doc/5
{"test_field": "itcast"}

{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "5",
  "_version" : 1,
  "result" : "created",
  "_shards" : {"total" : 2, "successful" : 1, "failed" : 0},
  "_seq_no" : 8,
  "_primary_term" : 1
}

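Client 1 updates, passing version 1.

First fetch the current version of the document:

GET /test_index/_doc/5

Then update it. Before ES 7 the version can be passed directly:

PUT /test_index/_doc/5?version=1
{"test_field": "itcast1"}

ES 7 no longer accepts an internal version for this purpose; pass the _seq_no and _primary_term returned by the GET instead (21 and 1 in this run):

PUT /test_index/_doc/5?if_seq_no=21&if_primary_term=1
{"test_field": "itcast1"}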

Client 2 updates concurrently, also passing version 1:

PUT /test_index/_doc/5?version=1
{"test_field": "itcast2"}

or, on ES 7:

PUT /test_index/_doc/5?if_seq_no=21&if_primary_term=1
{"test_field": "itcast2"}

The request fails with a version conflict.

Client 2 queries again and finds that the latest version is 2, with _seq_no 22:

GET /test_index/_doc/5

Client 2 retries, passing version 2:

PUT /test_index/_doc/5?version=2
{"test_field": "itcast2"}

or, on ES 7:

PUT /test_index/_doc/5?if_seq_no=22&if_primary_term=1
{"test_field": "itcast2"}

The update succeeds.
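
The two-client flow above can be replayed against a small in-memory model of the if_seq_no / if_primary_term check. This is a sketch under simplifying assumptions: one global sequence counter (in ES it is kept per shard) and a fixed primary term. SeqNoStore and SeqNoConflict are hypothetical names.

```python
class SeqNoConflict(Exception):
    """Stands in for the 409 version_conflict_engine_exception."""


class SeqNoStore:
    def __init__(self):
        self.docs = {}
        self.next_seq_no = 0   # one counter for every write (per shard in ES)
        self.primary_term = 1

    def put(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            # Conditional write: both values must match the stored document.
            if (current is None or current["_seq_no"] != if_seq_no
                    or current["_primary_term"] != if_primary_term):
                raise SeqNoConflict(doc_id)
        self.docs[doc_id] = {"_seq_no": self.next_seq_no,
                             "_primary_term": self.primary_term,
                             "_source": source}
        self.next_seq_no += 1
        return self.docs[doc_id]


store = SeqNoStore()
seen = store.put("5", {"test_field": "itcast"})  # the state both clients read

# Client 1 writes first, using the values it read.
store.put("5", {"test_field": "itcast1"}, seen["_seq_no"], seen["_primary_term"])

# Client 2 uses the same stale values, is rejected, re-reads and retries.
try:
    store.put("5", {"test_field": "itcast2"}, seen["_seq_no"], seen["_primary_term"])
except SeqNoConflict:
    latest = store.docs["5"]
    store.put("5", {"test_field": "itcast2"}, latest["_seq_no"], latest["_primary_term"])

print(store.docs["5"])
```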

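6.11. Controlling the version yourself: external version

Background: when the data already lives in another store that maintains its own version numbers, for example a database such as HBase, ES can use that number through external versioning.

Requirement: the external version supplied with a write must be greater than the document's current _version.

Contrast: with the internal _version, the version supplied must equal the document's current version.

Usage: append ?version=1&version_type=external to the request.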

Create the document:

PUT /test_index/_doc/4
{"test_field": "itcast"}

Update the document.

Client 1 updates it:

PUT /test_index/_doc/4?version=2&version_type=external
{"test_field": "itcast1"}

Client 2 updates it at the same time:

PUT /test_index/_doc/4?version=2&version_type=external
{"test_field": "itcast2"}

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[4]: version conflict, current version [2] is higher or equal to the one provided [2]",
        "index_uuid": "-rqYZ2EcSPqL6pu8Gi35jw",
        "shard": "1",
        "index": "test_index"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[4]: version conflict, current version [2] is higher or equal to the one provided [2]",
    "index_uuid": "-rqYZ2EcSPqL6pu8Gi35jw",
    "shard": "1",
    "index": "test_index"
  },
  "status": 409
}

Client 2 queries the document again:

GET /test_index/_doc/4

Client 2 retries with a higher version:

PUT /test_index/_doc/4?version=3&version_type=external
{"test_field": "itcast2"}
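
The external-version rule, "accept the write only if the supplied version is strictly greater than the stored one", fits in a few lines. This is a toy model on a plain dict; put_external and ExternalVersionConflict are hypothetical names.

```python
class ExternalVersionConflict(Exception):
    """Stands in for the 409 returned on an external version conflict."""


def put_external(store: dict, doc_id: str, source: dict, version: int) -> None:
    """Accept the write only if `version` is strictly greater than the stored one."""
    current = store.get(doc_id)
    if current is not None and current["version"] >= version:
        raise ExternalVersionConflict(
            f"current version [{current['version']}] is higher or equal "
            f"to the one provided [{version}]")
    store[doc_id] = {"version": version, "source": source}


store = {}
put_external(store, "4", {"test_field": "itcast"}, 1)
put_external(store, "4", {"test_field": "itcast1"}, 2)      # client 1 wins
try:
    put_external(store, "4", {"test_field": "itcast2"}, 2)  # client 2: 2 is not > 2
except ExternalVersionConflict as err:
    print("409:", err)
put_external(store, "4", {"test_field": "itcast2"}, 3)      # retry with a higher version
print(store["4"])
```

Unlike the internal check, the stored version becomes whatever the client supplied, so the external system remains the source of truth for version numbers.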

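6.12. The retry_on_conflict parameter for updates

retry_on_conflict sets how many times a partial update is retried when it hits a version conflict. On each retry ES re-reads the latest document and reapplies the change.

POST /test_index/_doc/5/_update?retry_on_conflict=3
{"doc": {"test_field": "itcast1"}}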
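
The retry behaviour can be sketched as a loop around read, modify and conditional write. ES runs this loop on the server side; the sketch only models the idea. update_with_retry, read, write and the collisions_left counter (which forces the first two writes to collide with an imaginary second client) are all hypothetical.

```python
class Conflict(Exception):
    """Raised when the document changed between read and write."""


def update_with_retry(read, write, change, retry_on_conflict=3):
    """Read-modify-write; on a conflict start over, up to `retry_on_conflict` more times."""
    for attempt in range(retry_on_conflict + 1):
        source, version = read()
        try:
            return write(change(source), version)
        except Conflict:
            if attempt == retry_on_conflict:
                raise


# A toy document whose first two writes collide with "another client".
state = {"source": {"test_field": "itcast"}, "version": 1, "collisions_left": 2}


def read():
    return dict(state["source"]), state["version"]


def write(new_source, expected_version):
    if state["collisions_left"] > 0:
        state["collisions_left"] -= 1
        state["version"] += 1  # someone else got in first
    if state["version"] != expected_version:
        raise Conflict()
    state["source"], state["version"] = new_source, state["version"] + 1
    return state["version"]


final_version = update_with_retry(read, write, lambda s: {**s, "test_field": "itcast1"})
print(final_version, state["source"])
```

With retry_on_conflict=3 the update survives the two forced collisions and succeeds on the third attempt; with retry_on_conflict=1 the same scenario would end in a Conflict.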

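Combining it with a version

retry_on_conflict is not meant to be combined with an explicit version. The update API does not support version_type=external, and it does not accept retry_on_conflict together with a specific version, so a request such as the following is rejected:

POST /test_index/_doc/5/_update?retry_on_conflict=3&version=22&version_type=external
{"doc": {"test_field": "itcast1"}}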

6.13. Batch retrieval with mget

Fetching documents one at a time with GET /test_index/_doc/1 costs one network round trip per ID, which is too expensive when many documents are needed.

Batch retrieval with mget:

GET /_mget
{
  "docs" : [
    {"_index" : "test_index", "_type" : "_doc", "_id" : "2"},
    {"_index" : "test_index", "_type" : "_doc", "_id" : "3"}
  ]
}

{
  "docs" : [
    {
      "_index" : "test_index",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 6,
      "_seq_no" : 12,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {"test_field" : "test12333123321321"}
    },
    {
      "_index" : "test_index",
      "_type" : "_doc",
      "_id" : "3",
      "_version" : 6,
      "_seq_no" : 18,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {"test_field" : "test3213"}
    }
  ]
}

ES 7 responds with a deprecation warning about _type, so leave it out:

GET /_mget
{
  "docs" : [
    {"_index" : "test_index", "_id" : "2"},
    {"_index" : "test_index", "_id" : "3"}
  ]
}

Batch retrieval within a single index:

GET /test_index/_mget
{"docs" : [{"_id" : "2"}, {"_id" : "3"}]}

A third way, written as a search:

POST /test_index/_search
{"query": {"ids" : {"values" : ["1", "7"]}}}
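
When the IDs come from application code, the mget body is easy to build programmatically. The helper below is a hypothetical sketch that uses only the standard json module and produces the same docs array shown in the requests above; sending it is left to your HTTP client.

```python
import json


def mget_body(ids, index=None):
    """Build an _mget body; pass `index` only for the index-less /_mget endpoint."""
    docs = []
    for doc_id in ids:
        entry = {"_id": str(doc_id)}  # IDs are strings in ES
        if index is not None:
            entry["_index"] = index
        docs.append(entry)
    return json.dumps({"docs": docs})


print(mget_body([2, 3]))                      # for GET /test_index/_mget
print(mget_body([2, 3], index="test_index"))  # for GET /_mget
```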

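6.14. Batch create, update and delete with bulk

A bulk request performs a series of create, index, update and delete operations in a single request, cutting the number of network round trips.

Syntax:

POST /_bulk
{"action": {"metadata"}}
{"data"}

The request below deletes document 5, creates document 14 and updates document 2:

POST /_bulk
{"delete": {"_index": "test_index", "_id": "5"}}
{"create": {"_index": "test_index", "_id": "14"}}
{"test_field": "test14"}
{"update": {"_index": "test_index", "_id": "2"}}
{"doc": {"test_field": "bulk test"}}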

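Summary:

1. Actions:

delete: deletes a document; it needs only one JSON line.
create: equivalent to a forced creation, PUT /index/_doc/id/_create.
index: an ordinary PUT, which either creates the document or fully replaces it.
update: performs a partial update.

2. Format: a single JSON object must not contain line breaks, adjacent JSON objects must be separated by a newline, and the last line must end with a newline too.

3. Isolation: the operations do not affect one another. A failed operation reports its error in the corresponding item of the response.

4. In practice: do not make a single bulk request too large, because the whole request is held in memory and performance drops. A few thousand operations and a few megabytes per request is a reasonable starting point.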
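
The format rules and the batching advice above can be sketched together. The helpers below are hypothetical and use only the standard json module: bulk_body serializes operations to newline-delimited JSON, and chunks splits a long list of operations into smaller requests. The batch size you pass is an assumption to tune against your own cluster.

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, optional payload) tuples to NDJSON."""
    lines = []
    for action, metadata, payload in operations:
        lines.append(json.dumps({action: metadata}))  # json.dumps emits a single line
        if payload is not None:                       # delete has no payload line
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"                    # the body must end with a newline


def chunks(items, size):
    """Split a list of operations into batches of at most `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


ops = [
    ("delete", {"_index": "test_index", "_id": "5"}, None),
    ("create", {"_index": "test_index", "_id": "14"}, {"test_field": "test14"}),
    ("update", {"_index": "test_index", "_id": "2"}, {"doc": {"test_field": "bulk test"}}),
]
print(bulk_body(ops), end="")
```

For a large import you would call bulk_body once per batch, for example for batch in chunks(all_ops, 1000), and send each body as a separate POST /_bulk request.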

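6.15. Summary of document concepts

Chapter review

1. Document create, read, update and delete

2. Document metadata fields

3. Internal locking mechanism

4. Batch retrieval and modification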

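What ES is

A distributed document store. ES can be regarded as a distributed NoSQL database, alongside Redis, MongoDB and HBase.

Document data: ES stores and operates on JSON documents, which are its core data structure.
Storage system: ES can store, query, create, update and delete JSON documents.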

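Use cases

Big data. Because ES is distributed, it scales out horizontally to hold large data sets.
Flexible data structures. Fields can change at any time; a relational database would need many association tables, which adds complexity to the system.
Simple data operations. Mostly queries, with no transactions involved.

Examples

E-commerce pages, traditional forum pages and the like. The objects they present are fairly complex, but as front ends they need no complicated functionality such as transactions, only simple create, read, update and delete (CRUD).

In such cases a NoSQL-style store like ES is a better fit than a traditional relational database with its heavyweight transaction support, and both performance and throughput are likely to be better.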
