
Elastic Certified Engineer Review Notes - Practice Questions Explained - Indexing Data (2)

Published: 2023/12/14


MAPPINGS AND TEXT ANALYSIS

GOAL: Model relational data

REQUIRED SETUP:

Suggested docker-compose file: 1e1k_base_cluster.yml

a running Elasticsearch cluster with at least one node and a Kibana instance,

the cluster has no index with name hamlet,

the cluster has no template that applies to indices starting by hamlet

DELETE hamlet_*
DELETE _template/hamlet_*

Question 1: object data

Create the index hamlet_1 with one primary shard and no replicas

Add some documents to hamlet_1 by running the following command

Verify that the items of the relationship array cannot be searched independently - e.g., searching for a friend named Gertrude will return 1 hit

PUT hamlet_1/_doc/_bulk
{"index":{"_index":"hamlet_1","_id":"C0"}}
{"name":"HAMLET","relationship":[{"name":"HORATIO","type":"friend"},{"name":"GERTRUDE","type":"mother"}]}
{"index":{"_index":"hamlet_1","_id":"C1"}}
{"name":"KING CLAUDIUS","relationship":[{"name":"HAMLET","type":"nephew"}]}
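The bulk body is newline-delimited JSON: one action line, then one source line per document, with a trailing newline at the end. As a minimal sketch of that layout, the hypothetical helper bulk_body below (not part of the exercise) renders the two character documents in that shape. It only builds the text and sends nothing to a cluster.

```python
import json


def bulk_body(index, docs):
    """Render (doc_id, source) pairs as a newline-delimited _bulk body."""
    lines = []
    for doc_id, source in docs:
        # Action line: names the target index and the document _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, on its own line.
        lines.append(json.dumps(source))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    ("C0", {"name": "HAMLET", "relationship": [
        {"name": "HORATIO", "type": "friend"},
        {"name": "GERTRUDE", "type": "mother"}]}),
    ("C1", {"name": "KING CLAUDIUS", "relationship": [
        {"name": "HAMLET", "type": "nephew"}]}),
]

body = bulk_body("hamlet_1", docs)
print(body)
```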

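Question 1: solution

Create the index

PUT hamlet_1
{"settings": {"number_of_shards": 1,"number_of_replicas": 0}}

Index the data by running the bulk command above (output omitted). Then inspect the resulting index structure:

GET hamlet_1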

{"hamlet_1" : {"aliases" : { },"mappings" : {"properties" : {"name" : {"type" : "text","fields" : {"keyword" : {"type" : "keyword","ignore_above" : 256}}},"relationship" : {"properties" : {"name" : {"type" : "text","fields" : {"keyword" : {"type" : "keyword","ignore_above" : 256}}},"type" : {"type" : "text","fields" : {"keyword" : {"type" : "keyword","ignore_above" : 256}}}}}}},"settings" : {"index" : {"creation_date" : "1606270886689","number_of_shards" : "1","number_of_replicas" : "0","uuid" : "BaWwDy_eSaKPaynt8rWW3g","version" : {"created" : "7020199"},"provided_name" : "hamlet_1"}}} }

Verify the data

POST hamlet_1/_search
{"query": {"bool": {"must": [{"match": {"relationship.type": "friend"}},{"match": {"relationship.name": "Gertrude"}}]}}}

Response

{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 1.2199391,"hits" : [{"_index" : "hamlet_1","_type" : "_doc","_id" : "C0","_score" : 1.2199391,"_source" : {"name" : "HAMLET","relationship" : [{"name" : "HORATIO","type" : "friend"},{"name" : "GERTRUDE","type" : "mother"}]}}]} }

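Question 1: explanation

This question tests the object datatype. In Elasticsearch any field can hold an array, so relationship can store several objects.
- When no mapping is specified, Elasticsearch infers one from the data through dynamic mapping. A field that holds inner objects, such as relationship, is mapped as object by default.
- An object is a map-like structure that can be searched by its keys. Unlike nested, the objects in an array are flattened into multi-valued fields (relationship.name and relationship.type), so the link between one object's name and its type is lost and the array is searched as a whole. That is why the search for a friend named Gertrude matches C0 even though Gertrude is listed as the mother. With nested, the fields of each object are matched together, object by object.
Reference link
Page path: Mapping => Field datatypes => Object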
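The flattening described above can be sketched in a few lines of Python. This is only a model of the behaviour, not how Elasticsearch is implemented: flatten and matches_all are hypothetical helpers, and lowercasing each value stands in for the analyzer on text fields.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Flatten inner objects into multi-valued dotted fields, roughly as the object type does."""
    fields = defaultdict(list)
    for key, value in doc.items():
        path = prefix + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Inner objects contribute their values to shared dotted field names.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields[sub_path].extend(sub_values)
            else:
                # Assumption: lowercasing is a stand-in for text analysis.
                fields[path].append(str(item).lower())
    return dict(fields)


def matches_all(flat, conditions):
    """True if every (field, term) condition finds its term somewhere in that field."""
    return all(term.lower() in flat.get(field, []) for field, term in conditions.items())


hamlet = {
    "name": "HAMLET",
    "relationship": [
        {"name": "HORATIO", "type": "friend"},
        {"name": "GERTRUDE", "type": "mother"},
    ],
}

flat = flatten(hamlet)
query = {"relationship.type": "friend", "relationship.name": "Gertrude"}
print(flat)
print(matches_all(flat, query))
```

Both conditions are satisfied, but by different objects in the array, which is the false positive the exercise asks you to observe.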

Question 2: nested data

Create the index hamlet_2 with one primary shard and no replicas

Define a mapping for the default type "_doc" of hamlet_2, so that the inner objects of the relationship field

can be searched independently,

have only unanalyzed fields

Reindex hamlet_1 to hamlet_2

Verify that the items of the relationship array can now be searched independently - e.g., searching for a friend named Gertrude will return no hits

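Question 2: solution

Create the index. The inner fields are mapped explicitly as keyword so that they are not analyzed, as the task requires; with only "type": "nested", dynamic mapping would create analyzed text fields for them.

PUT hamlet_2
{"settings": {"number_of_shards": 1,"number_of_replicas": 0},"mappings": {"properties": {"relationship": {"type": "nested","properties": {"name": {"type": "keyword"},"type": {"type": "keyword"}}}}}}

Reindex

POST _reindex
{"source": {"index": "hamlet_1"},"dest": {"index": "hamlet_2"}}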

Verify the data

Direct query

POST hamlet_2/_search
{"query": {"bool": {"must": [{"match": {"relationship.type": "friend"}},{"match": {"relationship.name": "Gertrude"}}]}}}

Response

{"took" : 7,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 0,"relation" : "eq"},"max_score" : null,"hits" : [ ]} }

Nested query

POST hamlet_2/_search
{"query": {"nested": {"path": "relationship","query": {"bool": {"must": [{"match": {"relationship.type": "friend"}},{"match": {"relationship.name": "Gertrude"}}]}}}}}

Response

{"took" : 178,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 0,"relation" : "eq"},"max_score" : null,"hits" : [ ]} }

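Question 2: explanation

This question tests the nested datatype. Unlike object, each element of a nested array is indexed as a separate hidden document, so a nested query that names the path can match the fields of each element on its own. A plain query against relationship.* at the root level, like the direct query above, finds nothing because those fields exist only in the hidden nested documents.
Reference link: nested-datatype
Page path: Mapping => Field datatypes => Nested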
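As a rough model of the nested behaviour, the sketch below checks every condition against one inner object at a time. nested_match is a hypothetical helper, and exact string equality is used as a stand-in for unanalyzed (keyword) fields; it is an illustration, not the engine's actual algorithm.

```python
def nested_match(doc, path, conditions):
    """True if at least one object under `path` satisfies every condition by itself."""
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in doc.get(path, [])
    )


hamlet = {
    "name": "HAMLET",
    "relationship": [
        {"name": "HORATIO", "type": "friend"},
        {"name": "GERTRUDE", "type": "mother"},
    ],
}

# No single object is both a friend and named GERTRUDE.
friend_gertrude = nested_match(hamlet, "relationship", {"type": "friend", "name": "GERTRUDE"})
# HORATIO is a friend within the same object, so this one matches.
friend_horatio = nested_match(hamlet, "relationship", {"type": "friend", "name": "HORATIO"})
print(friend_gertrude, friend_horatio)
```

Searching for a friend named HORATIO is a useful positive check alongside the exercise's negative one, since it confirms that the nested query can match at all.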

Question 3: parent-child documents (parent-join)

Add more documents to hamlet_2 by running the following command

POST _bulk
{"index":{"_index":"hamlet_2","_id":"L0"}}
{"line_number":"1.4.1","speaker":"HAMLET","text_entry":"The air bites shrewdly; it is very cold."}
{"index":{"_index":"hamlet_2","_id":"L1"}}
{"line_number":"1.4.2","speaker":"HORATIO","text_entry":"It is a nipping and an eager air."}
{"index":{"_index":"hamlet_2","_id":"L2"}}
{"line_number":"1.4.3","speaker":"HAMLET","text_entry":"What hour now?"}

Create the index hamlet_3 with only one primary shard and no replicas

Copy the mapping of hamlet_2 into hamlet_3, but also add a join field to define a relation between a character (the parent) and a line (the child). The name of such field is "character_or_line"

Reindex hamlet_2 to hamlet_3

Create a script named init_lines and save it into the cluster state. The script:

has a parameter named characterId,

adds the field character_or_line to the document,

sets the value of character_or_line.name to “line” ,

sets the value of character_or_line.parent to the value of the characterId parameter

Update the document with id C0 (i.e., the character document of Hamlet) by adding the field character_or_line and setting its character_or_line.name value to “character”

Update the documents in hamlet_3 that have “HAMLET” as a speaker, by running the init_lines script with characterId set to “C0”

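Question 3: solution

Add the data (omitted).

Create the index. Note that the mapping below only defines the join field; to copy the mapping of hamlet_2 as the task asks, the relationship definition from hamlet_2 has to be included under properties as well.

PUT hamlet_3
{"settings": {"number_of_shards": 1,"number_of_replicas": 0},"mappings": {"properties": {"character_or_line": {"type": "join","relations": {"character": "line"}}}}}

Reindex

POST _reindex
{"source": {"index": "hamlet_2"},"dest": {"index": "hamlet_3"}}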

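Create the script. This solution puts the script in an ingest pipeline with a script processor rather than in a stored script named init_lines, and fixes characterId to "C0" in the pipeline definition.

PUT _ingest/pipeline/character_update_pipeline
{"description": "set the 'character_or_line', 'character_or_line.name', 'character_or_line.parent'","processors": [{"script": {"lang": "painless","source": """ctx.character_or_line = new HashMap(); ctx.character_or_line.name = "line";ctx.character_or_line.parent = params.characterId;""","params": {"characterId": "C0"}}}]}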

Because a join field requires routing, add a new document with routing set. The walkthrough demonstrates the update on this single document, C2.

POST hamlet_3/_doc/C2?routing=C0
{"line_number": "1.2.1","speaker": "KING CLAUDIUS","text_entry": "Though yet of Hamlet our dear brothers death"}

Apply the pipeline to that document:

POST hamlet_3/_update_by_query?routing=C0&pipeline=character_update_pipeline
{"query":{"term":{"_id":"C2"}}}

If the update is run without the routing setting, it may fail with the error below, which says in essence that routing is mandatory for a parent-child join field.

{"took": 10,"timed_out": false,"total": 1,"updated": 0,"deleted": 0,"batches": 1,"version_conflicts": 0,"noops": 0,"retries": {"bulk": 0,"search": 0},"throttled_millis": 0,"requests_per_second": -1,"throttled_until_millis": 0,"failures": [{"index": "hamlet_3","type": "_doc","id": "C2","cause": {"type": "mapper_parsing_exception","reason": "failed to parse","caused_by": {"type": "illegal_argument_exception","reason": "[routing] is missing for join field [character_or_line]"}},"status": 400}]}
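The routing requirement exists because a parent and its children must be stored on the same shard. The toy model below illustrates the idea under simplifying assumptions: pick_shard and shard_for are hypothetical helpers, and zlib.crc32 is only a stand-in for the hash Elasticsearch actually uses. When no routing value is given, the document _id is used, so a child routed by its own _id could land on a different shard than its parent.

```python
import zlib


def pick_shard(routing, number_of_shards):
    # Assumption: crc32 is a stand-in hash, not the one Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


def shard_for(doc_id, number_of_shards, routing=None):
    # Without an explicit routing value, the document _id decides the shard.
    key = routing if routing is not None else doc_id
    return pick_shard(key, number_of_shards)


shards = 5
parent_shard = shard_for("C0", shards)
# Routing the child by its parent's id forces it onto the parent's shard.
child_shard = shard_for("C2", shards, routing="C0")
print(parent_shard, child_shard)
```

On the single-shard indices used in this exercise every document lands on the same shard anyway, but the join field still insists on an explicit routing value for child documents.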

Verify the data:

GET hamlet_3/_doc/C2

Response

{"_index" : "hamlet_3","_type" : "_doc","_id" : "C2","_version" : 4,"_seq_no" : 5,"_primary_term" : 1,"_routing" : "C0","found" : true,"_source" : {"character_or_line" : {"parent" : "C0","name" : "line"},"line_number" : "1.2.1","text_entry" : "Though yet of Hamlet our dear brothers death","speaker" : "KING CLAUDIUS"} }

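Question 3: explanation

This question tests parent-child relations (the join field), reindex, and _update_by_query.
- A join field can replace some of the join queries of a relational database, but Elasticsearch is a document store and its handling of this case is limited.
- When verifying the result, the main thing to check is that character_or_line and _routing, which the original document does not have, are present after processing.
- reindex and _update_by_query are covered in other chapters and are skipped here.
Reference link
Page path: Mapping => Field datatypes => Join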

Question 3: extension

@老楊 also provided another solution, but it has some problems: a child document needs routing, yet a script run through _update_by_query cannot update that attribute directly.

Create the script

POST _scripts/character_update_script
{"script": {"lang": "painless","source": """Map map = new HashMap();map.name = "line";map.parent = params.characterId;ctx._source.character_or_line = map;"""}}

Create a pipeline that sets the routing

PUT _ingest/pipeline/set_routing
{"description": "assign the routing attribute for doc","processors": [{"script": {"lang": "painless","source": "ctx._routing = 'C0'"}}]}

Update the target document

POST hamlet_3/_update_by_query?pipeline=set_routing
{"query":{"term":{"_id":"C2"}},"script": {"id": "character_update_script","params": {"characterId": "C0"}}}

Verification is the same as above and is omitted.
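For comparison with the task statement, here is a small Python model of what running a stored script over the lines spoken by HAMLET is meant to produce. init_lines mirrors the script body from the task, while update_by_query and the in-memory list are hypothetical stand-ins for the real API and index; routing is deliberately not modeled.

```python
def init_lines(source, params):
    """Mirror of the stored script: mark a document as a line of a character."""
    source["character_or_line"] = {"name": "line", "parent": params["characterId"]}


def update_by_query(sources, predicate, script, params):
    """Run `script` on every source that satisfies `predicate`; return how many were updated."""
    updated = 0
    for source in sources:
        if predicate(source):
            script(source, params)
            updated += 1
    return updated


lines = [
    {"line_number": "1.4.1", "speaker": "HAMLET", "text_entry": "The air bites shrewdly; it is very cold."},
    {"line_number": "1.4.2", "speaker": "HORATIO", "text_entry": "It is a nipping and an eager air."},
    {"line_number": "1.4.3", "speaker": "HAMLET", "text_entry": "What hour now?"},
]

updated = update_by_query(
    lines,
    lambda source: source["speaker"] == "HAMLET",
    init_lines,
    {"characterId": "C0"},
)
print(updated)
```

Only the two HAMLET lines gain the character_or_line field; the HORATIO line is left untouched.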
