Document API

Published: 2025/3/17

Index API

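The index API adds a JSON document to an index. It supports the following forms:

# Index with an explicit id. If the id already exists, the document is replaced and _version is incremented by 1.
PUT {index}/_doc/{id}

# Force creation. If the id already exists, the request fails with a 409 error (the two forms below are equivalent).
PUT {index}/_doc/{id}?op_type=create
PUT {index}/_create/{id}

# Create with POST and let Elasticsearch generate the id.
POST {index}/_doc/

Use PUT when you supply the id and POST when the id is generated automatically.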

Example:

# Example 1: insert a document with id 1 into "twitter"
PUT twitter/_doc/1
{
  "user" : "Jack",
  "post_date" : "2019-05-15T14:12:12",
  "message" : "trying out Elasticsearch"
}

Result:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {"total" : 2, "successful" : 1, "failed" : 0},
  "_seq_no" : 0,
  "_primary_term" : 1
}

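The _shards field describes how the index operation was replicated:

total: the number of shard copies (primary and replicas) the operation should be executed on
successful: the number of shard copies on which the operation succeeded
failed: the number of shard copies on which the operation failed

The index operation is successful as long as successful is at least 1.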
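The success rule above can be checked in a few lines of client code. This is a minimal sketch that works on an already-parsed response dictionary; the helper names are hypothetical and no Elasticsearch client library is involved.

```python
def index_succeeded(response: dict) -> bool:
    """Return True when at least one shard copy acknowledged the write."""
    shards = response.get("_shards", {})
    return shards.get("successful", 0) >= 1


def unacknowledged_copies(response: dict) -> int:
    """Copies that neither succeeded nor failed, for example replicas that
    are not assigned to any node (typical on a single-node cluster)."""
    shards = response["_shards"]
    return shards["total"] - shards["successful"] - shards["failed"]


# The response from Example 1, reduced to the fields used here.
response = {
    "_index": "twitter", "_id": "1", "_version": 1, "result": "created",
    "_shards": {"total": 2, "successful": 1, "failed": 0},
}
print(index_succeeded(response))        # True
print(unacknowledged_copies(response))  # 1
```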

If the id already exists, the request replaces the stored document and reports an update:

PUT twitter/_doc/1
{
  "user" : "Jack2",
  "post_date" : "2019-05-15T14:12:12",
  "message" : "trying out Elasticsearch"
}

Response:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {"total" : 2, "successful" : 1, "failed" : 0},
  "_seq_no" : 2,
  "_primary_term" : 1
}

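Automatic index creation

When a document is indexed into an index that does not exist, such as twitter above, the index is created automatically. This behavior can be changed through a cluster setting:

# "false" disables automatic index creation, "true" enables it
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false"
  }
}

action.auto_create_index also accepts more elaborate values, for example:

"action.auto_create_index": "twitter, facebook,-tieba,+topic*"

A + prefix allows, a - prefix forbids, and * is a wildcard.

This value allows automatic creation of twitter, facebook, and any index whose name matches topic*, and forbids it for tieba.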
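To make the pattern rules concrete, here is a small sketch that evaluates such a value for a given index name. It is not Elasticsearch's implementation. It assumes that patterns are checked left to right, that the first match decides, that whitespace around a pattern is ignored, and that a name matching nothing is not auto-created. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def may_auto_create(index: str, setting: str) -> bool:
    """Evaluate an action.auto_create_index-style value for one index name.

    Assumptions of this sketch: first matching pattern wins, surrounding
    whitespace is ignored, and no match means "do not create".
    fnmatchcase also understands ? and [..], which goes beyond the plain
    * wildcard described in the text.
    """
    value = setting.strip()
    if value in ("true", "false"):
        return value == "true"
    for raw in value.split(","):
        pattern = raw.strip()
        allow = not pattern.startswith("-")
        if pattern.startswith(("+", "-")):
            pattern = pattern[1:]
        if fnmatchcase(index, pattern):
            return allow
    return False


setting = "twitter, facebook,-tieba,+topic*"
for name in ("twitter", "facebook", "tieba", "topic-2019", "other"):
    print(name, may_auto_create(name, setting))
```

Under these assumptions twitter, facebook and topic-2019 are allowed, while tieba and other are not.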

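Operation type

The index operation accepts an op_type parameter that forces a create operation, which gives put-if-absent behavior. With op_type=create, the operation fails if a document with that id already exists in the index.

Example:

PUT twitter/_doc/1?op_type=create
{
  "user" : "kimchy",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch"
}

Response:

{
  "error": {.......},
  "status": 409
}

op_type=create can also be written as follows; the two forms have the same effect:

PUT twitter/_doc/1?op_type=create <=> PUT twitter/_create/1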

Automatic ID generation

If no id is given, one is generated automatically and op_type defaults to create. Note that the request method is POST:

POST twitter/_doc/
{
  "user" : "kimchy",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch"
}

Response:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "A6y0umsBAkV3IICsYCLL",
  "_version" : 1,
  "result" : "created",
  "_shards" : {"total" : 2, "successful" : 1, "failed" : 0},
  "_seq_no" : 1,
  "_primary_term" : 1
}
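The three behaviors described for the index API (replace on an existing id, conflict on create, generated ids) can be modeled with a toy in-memory index. This is only an illustration of the semantics. ToyIndex and VersionConflict are hypothetical names, and the uuid-based ids are an assumption of the sketch, not the id format Elasticsearch generates.

```python
import uuid


class VersionConflict(Exception):
    """Stands in for the HTTP 409 response."""


class ToyIndex:
    """In-memory stand-in for one index; not an Elasticsearch client."""

    def __init__(self):
        self.docs = {}  # id -> {"_version": int, "_source": dict}

    def index(self, source, doc_id=None, op_type="index"):
        if doc_id is None:
            # No id given: generate one, which implies op_type=create.
            doc_id, op_type = uuid.uuid4().hex, "create"
        existing = self.docs.get(doc_id)
        if existing is not None and op_type == "create":
            raise VersionConflict(f"document {doc_id} already exists")
        version = existing["_version"] + 1 if existing else 1
        self.docs[doc_id] = {"_version": version, "_source": dict(source)}
        result = "updated" if existing else "created"
        return {"_id": doc_id, "_version": version, "result": result}


twitter = ToyIndex()
first = twitter.index({"user": "Jack"}, doc_id="1")
second = twitter.index({"user": "Jack2"}, doc_id="1")
print(first["result"], first["_version"])    # created 1
print(second["result"], second["_version"])  # updated 2

conflict = False
try:
    twitter.index({"user": "kimchy"}, doc_id="1", op_type="create")
except VersionConflict:
    conflict = True  # the put-if-absent check rejected the write
print(conflict)  # True

auto = twitter.index({"user": "kimchy"})  # id generated, always a create
print(auto["result"])  # created
```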

Get API

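The get API retrieves a JSON document from an index by id. It supports the following forms:

# Get the document together with its metadata
GET {index}/_doc/{id}

# Get only the document fields, without metadata
GET {index}/_source/{id}

Both forms support field filtering.

Example:

GET twitter/_doc/1

Response:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "Jack2",
    "post_date" : "2019-05-15T14:12:12",
    "message" : "trying out Elasticsearch"
  }
}

If the document is not found, the response is:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "found" : false
}

A HEAD request checks whether a document exists:

HEAD /twitter/_doc/0

200 - OK: the document exists
404 - Not Found: the document does not exist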

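Field filtering

By default, a get operation returns the full contents of the _source field together with the document metadata, unless you use the stored_fields parameter or turn off _source retrieval:

GET twitter/_doc/1?_source=false

Response:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true
}

If you need only some of the fields in _source, use the following two parameters:

_source_includes: the fields to include, separated by commas
_source_excludes: the fields to drop, separated by commas
The two parameters can be combined with &

GET twitter/_doc/2?_source_includes=user,message&_source_excludes=date

If you only need to list the fields to include, this can be shortened to _source:

GET twitter/_doc/1?_source=user,message

Response:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "message" : "trying out Elasticsearch",
    "user" : "Jack2"
  }
}

Field filtering reduces the amount of data sent over the network.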
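The effect of the include and exclude lists can be sketched as a plain function over a _source dictionary. This is a simplified illustration that only looks at top-level field names; nested paths such as a.b are not handled, and filter_source is a hypothetical helper, not part of any client library.

```python
from fnmatch import fnmatchcase


def filter_source(source, includes=(), excludes=()):
    """Keep top-level fields matching includes (all fields if includes is
    empty), then drop any field matching excludes."""
    kept = {}
    for field, value in source.items():
        if includes and not any(fnmatchcase(field, p) for p in includes):
            continue
        if any(fnmatchcase(field, p) for p in excludes):
            continue
        kept[field] = value
    return kept


source = {
    "user": "Jack2",
    "post_date": "2019-05-15T14:12:12",
    "message": "trying out Elasticsearch",
}
print(filter_source(source, includes=["user", "message"]))
# {'user': 'Jack2', 'message': 'trying out Elasticsearch'}
print(filter_source(source, excludes=["post_*"]))
# {'user': 'Jack2', 'message': 'trying out Elasticsearch'}
```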

Stored Fields

Create the index facebook and define its mappings:

PUT facebook
{
  "mappings": {
    "properties": {
      "counter": {"type": "integer", "store": false},
      "tags": {"type": "keyword", "store": true}
    }
  }
}

Insert a document:

PUT facebook/_doc/2
{
  "counter": 1,
  "tags": ["red"]
}

Retrieve the document just inserted, naming the fields with the stored_fields parameter:

GET facebook/_doc/2?stored_fields=tags,counter

Response:

{
  "_index" : "facebook",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "fields" : {
    "tags" : ["red"]
  }
}

Because counter is mapped with store: false, it is ignored when stored fields are retrieved.

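Getting only the _source

Use {index}/_source/{id} to get only the _source of a document:

GET twitter/_source/1

Response:

{
  "user" : "Jack2",
  "post_date" : "2019-05-15T14:12:12",
  "message" : "trying out Elasticsearch"
}

A HEAD request can also test whether a document's _source exists. Note that if _source is disabled in the mapping, an existing document has no _source either.

HEAD twitter/_source/1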

Multi Get API

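The multi get API fetches several documents in one request. It has the following forms:

# Query across several indices; each entry must give _index and _id
GET /_mget

# With the index in the URL, a list of ids is enough
GET /{index}/_mget

Example 1:

GET /_mget
{
  "docs": [
    {"_index": "twitter", "_id": "1"},
    {"_index": "facebook", "_id": "2"}
  ]
}

Notes:

The lookup entries go in the list under the docs field
Documents are looked up by index and id. Looking up by type ("_type" : "_doc") is deprecated in 7.x, so _type does not need to be sent
GET /_mget and GET _mget both work (the same holds for other requests; the leading / makes no difference)

The results are returned in the list under the docs field:

{
  "docs" : [
    {
      "_index" : "twitter",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 2,
      "_seq_no" : 2,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "user" : "Jack2",
        "post_date" : "2019-05-15T14:12:12",
        "message" : "trying out Elasticsearch"
      }
    },
    {
      "_index" : "facebook",
      "_type" : "_doc",
      "_id" : "2",
      "_version" : 1,
      "_seq_no" : 1,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "counter" : 1,
        "tags" : ["red"]
      }
    }
  ]
}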

Example 2:

GET twitter/_mget
{
  "docs": [
    {"_id": "1"},
    {"_id": "2"}
  ]
}

When documents are selected by id only, as above, the request can be shortened with the ids field:

GET twitter/_mget
{
  "ids": ["1", "2"]
}
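The relationship between the docs form and the ids shorthand can be shown by expanding either body into explicit (index, id) pairs. The function below is a hypothetical helper written for this illustration; it only manipulates request bodies and does not talk to a cluster.

```python
def normalize_mget(body, url_index=None):
    """Expand an mget body into explicit (index, id) pairs.

    url_index is the index named in the URL, if any. An entry without its
    own _index falls back to it.
    """
    if "ids" in body:
        entries = [{"_id": doc_id} for doc_id in body["ids"]]
    else:
        entries = body["docs"]
    pairs = []
    for entry in entries:
        index = entry.get("_index", url_index)
        if index is None:
            raise ValueError(f"no index given for id {entry['_id']}")
        pairs.append((index, entry["_id"]))
    return pairs


short_form = normalize_mget({"ids": ["1", "2"]}, url_index="twitter")
long_form = normalize_mget({"docs": [{"_id": "1"}, {"_id": "2"}]}, url_index="twitter")
print(short_form)               # [('twitter', '1'), ('twitter', '2')]
print(short_form == long_form)  # True
```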

Field filtering

The _source parameter selects the fields to return for each document:

GET _mget
{
  "docs": [
    {"_index": "twitter", "_id": "1", "_source": false},
    {"_index": "twitter", "_id": "2", "_source": ["user", "message"]},
    {"_index": "facebook", "_id": "2", "_source": {"include": ["counter"], "exclude": []}}
  ]
}

stored_fields can be used for the same purpose.

Update API

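The update API changes a document with a script or with a partial document. Each change increments _version by 1. The basic forms are:

# Update with a script
POST {index}/_update/{id}
{
  "script": {...}
}

# Update with a partial document; the fields in doc are merged into the target document
POST {index}/_update/{id}
{
  "doc": {...}
}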

Scripted updates

First insert a document:

PUT facebook/_doc/3
{
  "counter": 1,
  "tags": ["red"]
}

Then run the update scripts below. Each script is written in Painless, as stated by "lang": "painless".

script 1: increment counter

POST facebook/_update/3
{
  "script": {
    "source": "ctx._source.counter += params.count",
    "lang": "painless",
    "params": {"count": 4}
  }
}

script 2: add an element to tags

POST facebook/_update/3
{
  "script": {
    "source": "ctx._source.tags.add(params.tag)",
    "lang": "painless",
    "params": {"tag": "blue"}
  }
}

script 3: remove an element from tags

POST facebook/_update/3
{
  "script": {
    "source": "if(ctx._source.tags.contains(params.tag)){ctx._source.tags.remove(ctx._source.tags.indexOf(params.tag))}",
    "lang": "painless",
    "params": {"tag": "red"}
  }
}

The final result:

{
  "_index" : "facebook",
  "_type" : "_doc",
  "_id" : "3",
  "_version" : 4,
  "_seq_no" : 5,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "counter" : 5,
    "tags" : ["blue"]
  }
}
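To check the final result by hand, the three Painless scripts can be mirrored as plain Python functions applied in the same order. This is only a restatement of the script logic for illustration; the function names are hypothetical and nothing here runs Painless.

```python
def increment_counter(source, params):
    # Mirrors: ctx._source.counter += params.count
    source["counter"] += params["count"]


def add_tag(source, params):
    # Mirrors: ctx._source.tags.add(params.tag)
    source["tags"].append(params["tag"])


def remove_tag(source, params):
    # Mirrors: remove the tag by its index, but only if it is present
    if params["tag"] in source["tags"]:
        source["tags"].pop(source["tags"].index(params["tag"]))


# Document 3 as inserted, at version 1.
doc = {"_version": 1, "_source": {"counter": 1, "tags": ["red"]}}
updates = [
    (increment_counter, {"count": 4}),
    (add_tag, {"tag": "blue"}),
    (remove_tag, {"tag": "red"}),
]
for script, params in updates:
    script(doc["_source"], params)
    doc["_version"] += 1  # every applied update bumps the version

print(doc)  # {'_version': 4, '_source': {'counter': 5, 'tags': ['blue']}}
```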

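Updates with a partial document

When the request body passes a document in the doc field, that document is merged into the target document:

POST facebook/_update/3
{
  "doc": {"name": "wahaha"}
}

The target document has no name field, so the merge adds it.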

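Noop updates

If an update changes nothing, the result is noop and _version stays the same. For example, if the partial-document update above is run twice, the second run returns:

{
  "_index" : "facebook",
  "_type" : "_doc",
  "_id" : "3",
  "_version" : 5,
  "result" : "noop",
  "_shards" : {"total" : 0, "successful" : 0, "failed" : 0}
}

Noop detection can be turned off by setting detect_noop to false:

POST facebook/_update/3
{
  "doc": {"name": "wahaha"},
  "detect_noop": false
}

Running the same update again now increments _version by 1.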
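The merge and noop behavior can be sketched with a toy update function. It assumes that nested objects are merged recursively and that any other value (including arrays) replaces the old one. update_with_doc and merge are hypothetical names for this illustration.

```python
import copy


def merge(target, patch):
    """Merge patch into target: objects merge recursively, other values replace."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value


def update_with_doc(stored, doc, detect_noop=True):
    """Apply a partial document and report "updated" or "noop"."""
    merged = copy.deepcopy(stored["_source"])
    merge(merged, doc)
    if detect_noop and merged == stored["_source"]:
        return "noop"  # nothing changed, so the version is left alone
    stored["_source"] = merged
    stored["_version"] += 1
    return "updated"


stored = {"_version": 4, "_source": {"counter": 5, "tags": ["blue"]}}
print(update_with_doc(stored, {"name": "wahaha"}), stored["_version"])  # updated 5
print(update_with_doc(stored, {"name": "wahaha"}), stored["_version"])  # noop 5
print(update_with_doc(stored, {"name": "wahaha"}, detect_noop=False), stored["_version"])  # updated 6
```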

Upserts

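Update the document if it exists, otherwise create it. The script runs when the target document exists; if it does not exist, a new document is created from the contents of upsert:

POST facebook/_update/5
{
  "script": {
    "source": "ctx._source.counter += params.count",
    "lang": "painless",
    "params": {"count": 4}
  },
  "upsert": {"counter": 1}
}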
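One detail that is easy to miss is that the script does not run on the request that creates the document: the upsert contents are stored as they are. A toy model, using a plain dictionary as the index and hypothetical function names, shows the two cases.

```python
def scripted_update(index, doc_id, script, params, upsert=None):
    """Run script on an existing document, or store upsert as-is when the
    document is missing (the script is not run in that case)."""
    if doc_id in index:
        script(index[doc_id], params)
        return "updated"
    if upsert is None:
        raise KeyError(f"document {doc_id} is missing")
    index[doc_id] = dict(upsert)
    return "created"


def add_count(source, params):
    # Mirrors: ctx._source.counter += params.count
    source["counter"] += params["count"]


facebook = {}  # toy index: id -> _source
print(scripted_update(facebook, "5", add_count, {"count": 4}, upsert={"counter": 1}), facebook["5"])
# created {'counter': 1}
print(scripted_update(facebook, "5", add_count, {"count": 4}, upsert={"counter": 1}), facebook["5"])
# updated {'counter': 5}
```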

scripted_upsert

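To run the script whether or not the target document exists, set scripted_upsert to true; the script then initializes the document in place of the upsert field. The example from the official documentation did not run in my local environment, so it is not covered here.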

doc_as_upsert

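The upsert variant for partial-document updates: when the document does not exist, the contents of doc are used to create it.

POST facebook/_update/6
{
  "doc": {"name": "wahaha"},
  "doc_as_upsert": true
}

Request parameters

The update operation also accepts query string parameters. See the official documentation for details: <https://www.elastic.co/guide/en/elasticsearch/reference/7.2/docs-update.html#_parameters_2>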

Update By Query API

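_update_by_query updates every document in an index. Run without a script, it re-indexes each document as it is, which is useful for picking up a newly added field or another mapping change. A query can also be supplied so that only matching documents are updated.

POST {index}/_update_by_query

https://www.elastic.co/guide/en/elasticsearch/reference/7.2/docs-update-by-query.html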

Delete API

Removes a document from an index:

DELETE {index}/_doc/{id}

For example, to delete the document with id 1 from twitter:

DELETE twitter/_doc/1

Response:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "result" : "deleted",
  "_shards" : {"total" : 2, "successful" : 1, "failed" : 0},
  "_seq_no" : 6,
  "_primary_term" : 1
}

Delete By Query API

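This API deletes every document that matches a query. With a query that matches everything, it deletes all documents in an index:

POST {index}/_delete_by_query

A narrower query deletes only part of the index.
https://www.elastic.co/guide/en/elasticsearch/reference/7.2/docs-delete-by-query.html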

Bulk API

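The bulk API performs several index, create, update and delete operations in a single call. In the request below, the line after each index, create or update action is its source or doc; the delete action has no such line.

POST /_bulk
{ "index" : { "_index" : "facebook", "_id" : "2" } }
{ "counter" : 2 }
{ "delete" : { "_index" : "twitter", "_id" : "2" } }
{ "create" : { "_index" : "twitter", "_id" : "7" } }
{ "name" : "seven" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

Notes:

index and create must be followed on the next line by a source, which is used to add or replace the document
delete takes no source
update must be followed on the next line by doc, upsert, script or similar details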

Response. The four items correspond to the four actions in order:

index: the id already existed, so the document was replaced with the given source
delete: the target document was not found
create: the document was created from the given source
update: the target document does not exist in test, so the item fails with 404 (document_missing_exception)

{
  "took" : 217,
  "errors" : true,
  "items" : [
    {
      "index" : {
        "_index" : "facebook",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 4,
        "result" : "updated",
        "_shards" : {"total" : 2, "successful" : 1, "failed" : 0},
        "_seq_no" : 14,
        "_primary_term" : 1,
        "status" : 200
      }
    },
    {
      "delete" : {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {"total" : 2, "successful" : 1, "failed" : 0},
        "_seq_no" : 8,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "7",
        "_version" : 1,
        "result" : "created",
        "_shards" : {"total" : 2, "successful" : 1, "failed" : 0},
        "_seq_no" : 9,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "status" : 404,
        "error" : {
          "type" : "document_missing_exception",
          "reason" : "[_doc][1]: document missing",
          "index_uuid" : "k7zYEGp8ROuGWkfOprCKvQ",
          "shard" : "0",
          "index" : "test"
        }
      }
    }
  ]
}
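Because the bulk body is newline-delimited JSON and each item can fail on its own, client code usually needs two small pieces: a serializer for the request and a scan of the response items. The sketch below uses only the standard json module. bulk_body and failed_items are hypothetical helpers, and an item is treated as failed when it carries an error object, which is an assumption of this sketch.

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, payload) triples as newline-delimited JSON."""
    lines = []
    for action, meta, payload in operations:
        lines.append(json.dumps({action: meta}))
        if action != "delete":  # delete has no source line
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def failed_items(response):
    """Return (action, id, status) for every item that carries an error."""
    failures = []
    for item in response["items"]:
        (action, result), = item.items()  # each item has exactly one key
        if "error" in result:
            failures.append((action, result["_id"], result["status"]))
    return failures


operations = [
    ("index", {"_index": "facebook", "_id": "2"}, {"counter": 2}),
    ("delete", {"_index": "twitter", "_id": "2"}, None),
    ("create", {"_index": "twitter", "_id": "7"}, {"name": "seven"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
]
body = bulk_body(operations)
print(body.count("\n"))  # 7

# A response reduced to the fields this sketch reads.
response = {
    "errors": True,
    "items": [
        {"index": {"_id": "2", "status": 200}},
        {"delete": {"_id": "2", "status": 404, "result": "not_found"}},
        {"create": {"_id": "7", "status": 201}},
        {"update": {"_id": "1", "status": 404,
                    "error": {"type": "document_missing_exception"}}},
    ],
}
print(failed_items(response))  # [('update', '1', 404)]
```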

Reindex API

The most basic use of reindex copies the documents of one index into another index:

POST _reindex
{
  "source": {"index": "{index1}"},
  "dest": {"index": "{index2}"}
}

https://www.elastic.co/guide/en/elasticsearch/reference/7.2/docs-reindex.html

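Concurrency control

Elasticsearch is distributed. When a document is created, changed or deleted, the new version must be replicated to other nodes in the cluster. Elasticsearch is also asynchronous and concurrent, which means that these replication requests are sent in parallel and may arrive out of order. Elasticsearch therefore needs a way to make sure that an older version of a document never overwrites a newer one.

Every operation on a document is assigned a sequence number by the primary shard, and the sequence number increases with each operation. Elasticsearch uses it to tell newer from older and to keep an older version from overwriting a newer one.

The sequence number is visible when a document is fetched with GET. In the response below, _seq_no is the sequence number and _primary_term is the primary term:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "_seq_no" : 5,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "user" : "Jack2",
    "post_date" : "2019-05-15T14:12:12",
    "message" : "trying out Elasticsearch"
  }
}

By recording the returned _seq_no and _primary_term, you can make sure that you change the document only if nothing else has changed it since you read it. For example:

DELETE twitter/_doc/2?if_seq_no=5&if_primary_term=1

If the document was changed before the delete arrives, the delete fails with a 409 response that reports a version conflict.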
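The conditional delete above is a compare-and-set on the pair (_seq_no, _primary_term). A toy shard makes the race visible. ToyShard and VersionConflict are hypothetical names, and the model is a single in-memory object, not a description of how Elasticsearch stores sequence numbers.

```python
class VersionConflict(Exception):
    """Stands in for the 409 version-conflict response."""


class ToyShard:
    """In-memory stand-in for a primary shard; not an Elasticsearch client."""

    def __init__(self, primary_term=1):
        self.primary_term = primary_term
        self.next_seq_no = 0
        self.docs = {}  # id -> {"_seq_no", "_primary_term", "_source"}

    def _check(self, doc_id, if_seq_no, if_primary_term):
        if if_seq_no is None and if_primary_term is None:
            return  # unconditional write
        doc = self.docs.get(doc_id)
        current = (doc["_seq_no"], doc["_primary_term"]) if doc else None
        if current != (if_seq_no, if_primary_term):
            raise VersionConflict(f"{doc_id}: found {current}")

    def _next(self):
        seq_no, self.next_seq_no = self.next_seq_no, self.next_seq_no + 1
        return seq_no

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        self._check(doc_id, if_seq_no, if_primary_term)
        self.docs[doc_id] = {"_seq_no": self._next(),
                             "_primary_term": self.primary_term,
                             "_source": source}
        return self.docs[doc_id]["_seq_no"], self.primary_term

    def delete(self, doc_id, if_seq_no=None, if_primary_term=None):
        self._check(doc_id, if_seq_no, if_primary_term)
        self._next()  # a delete consumes a sequence number as well
        return self.docs.pop(doc_id, None) is not None


shard = ToyShard()
seq_no, term = shard.index("2", {"user": "Jack"})  # what we read: (0, 1)
shard.index("2", {"user": "Jack2"})                # another writer gets in first

conflict = False
try:
    shard.delete("2", if_seq_no=seq_no, if_primary_term=term)
except VersionConflict:
    conflict = True  # our view of the document was stale
print(conflict)  # True

fresh = shard.docs["2"]
deleted = shard.delete("2", if_seq_no=fresh["_seq_no"],
                       if_primary_term=fresh["_primary_term"])
print(deleted)  # True
```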

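Reading and writing documents

Every index in Elasticsearch is divided into shards, and each shard can have several copies. When documents are added or removed, the copies must be kept in sync. The process of keeping shard copies in sync and serving reads from them is called the data replication model.
The Elasticsearch data replication model is based on the primary-backup model. In this model, one copy acts as the primary shard and the other copies act as replica shards. The primary shard is the main entry point for all indexing operations. It validates them, makes sure they are correct, and replicates them to the other copies.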

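Basic write model

Every indexing operation in Elasticsearch is first resolved to a replication group through routing, usually based on the document ID. Once the replication group is known, the operation is forwarded internally to the group's current primary shard. The primary shard validates the operation and forwards it to the other copies. Because replicas can be offline, the primary does not have to replicate to all of them. Instead, Elasticsearch keeps a list of the shard copies that should receive the operation, called the in-sync copies, which is maintained by the master node. The primary must replicate every operation to each copy in this set. The primary shard follows this basic flow:

Validate the operation, for example check that the fields are well formed

Execute the operation locally, that is, index or delete the document. The operation may also be rejected at this point, for example when a keyword value is too long to be indexed in Lucene

Forward the operation to every copy in the current in-sync set (in parallel)

Once all replicas have executed the operation and reported back to the primary, the primary acknowledges the successful completion of the request to the client.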

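Failure handling

Many things can go wrong during indexing, such as a disk failure, a lost node, or a configuration mistake. The primary shard has to deal with them.

If the primary shard itself fails, the node hosting it notifies the master node. The indexing operation waits, for up to 1 minute, until the master promotes one of the replicas to be the new primary. The operation is then forwarded to the new primary for processing.

If the operation succeeds on the primary but fails on a replica (because the replica itself failed or because of a network problem), the primary asks the master to remove the failed replica from the in-sync set.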
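The write flow and the replica-failure case described above can be condensed into a toy model. This is a rough sketch only: replicas are written one after another rather than in parallel, primary failure and promotion are not modeled, and crc32 is used as a stand-in hash for routing. All class and function names are hypothetical.

```python
import zlib


def route(routing, number_of_primary_shards):
    """Pick a shard from the routing value (the document id by default).
    crc32 is a stand-in hash chosen for this sketch."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


class ShardCopy:
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.docs = name, healthy, {}

    def apply(self, doc_id, source):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unreachable")
        self.docs[doc_id] = source


class ReplicationGroup:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.in_sync = list(replicas)  # kept by the master node in a real cluster

    def index(self, doc_id, source):
        if not isinstance(source, dict):      # 1. validate the operation
            raise ValueError("source must be a JSON object")
        self.primary.apply(doc_id, source)    # 2. execute locally on the primary
        for replica in list(self.in_sync):    # 3. forward to each in-sync copy
            try:
                replica.apply(doc_id, source)
            except ConnectionError:
                self.in_sync.remove(replica)  # failed copy leaves the in-sync set
        return {"successful": 1 + len(self.in_sync)}  # 4. acknowledge


group = ReplicationGroup(
    ShardCopy("primary"),
    [ShardCopy("replica-1"), ShardCopy("replica-2", healthy=False)],
)
ack = group.index("1", {"user": "Jack"})
print(ack)                                # {'successful': 2}
print([c.name for c in group.in_sync])    # ['replica-1']
print(0 <= route("1", 5) < 5)             # True
```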

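Basic read model

Reads in Elasticsearch range from very lightweight lookups by id to heavy search requests with complex aggregations that take considerable CPU. One advantage of the primary-backup model is that all shard copies are kept identical, so any one copy can serve a read request.

The node that receives the client request is called the coordinating node. Its basic flow is:

Resolve the request to the relevant shards. Because most searches are sent to one or more indices, they usually need to read from several shards, each representing a different subset of the data.

Select one copy from the replication group of each relevant shard. This can be the primary or a replica (by default, Elasticsearch rotates among the copies)

Send the read request to the selected copy

Combine the results and respond. For a lookup by id only one shard is relevant, so this step is skipped.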

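Shard failures

When a shard copy fails to respond to a read request, the coordinating node sends the request to another copy in the same replication group. Repeated failures can leave no shard copy available.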
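Copy selection with failover can be sketched as a reader that rotates its starting copy on every request and moves on to the next copy when one fails. This is a toy illustration with hypothetical names; shard copies are modeled as plain callables.

```python
import itertools


class NoShardAvailable(Exception):
    """Raised when every copy in the replication group has failed."""


def make_reader(copies):
    """Return a read function that rotates over shard copies and fails over."""
    counter = itertools.count()

    def read(doc_id):
        start = next(counter) % len(copies)
        for step in range(len(copies)):
            shard_copy = copies[(start + step) % len(copies)]
            try:
                return shard_copy(doc_id)
            except ConnectionError:
                continue  # try the next copy in the same group
        raise NoShardAvailable(doc_id)

    return read


def primary(doc_id):
    return ("primary", doc_id)


def broken_replica(doc_id):
    raise ConnectionError("replica is down")


def replica(doc_id):
    return ("replica", doc_id)


read = make_reader([primary, broken_replica, replica])
print([read("1")[0] for _ in range(4)])
# ['primary', 'replica', 'replica', 'primary']
```

The second request starts at the broken copy and falls through to the healthy replica, which is why replica appears twice in a row.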

To keep responses fast, the following APIs return partial results when one or more shards fail:

Search
Multi Search
Bulk
Multi Get

The response still has a 200 status, but the timed_out and _shards fields show that shard failures occurred.
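Since the HTTP status alone does not reveal a partial result, a client can inspect those two fields. The helper below is a minimal sketch for a search-style response that has top-level timed_out and _shards fields; is_partial is a hypothetical name and the sample response is made up for illustration.

```python
def is_partial(response: dict) -> bool:
    """True when the response timed out or reports at least one failed shard."""
    shards = response.get("_shards", {})
    return bool(response.get("timed_out")) or shards.get("failed", 0) > 0


# Illustrative search-style response: HTTP 200, but one of five shards failed.
response = {
    "took": 12,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 4, "skipped": 0, "failed": 1},
}
print(is_partial(response))  # True
```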
