
Elastic Certified Engineer review notes - practice questions explained - Indexing Data (2)

Published: 2023/12/14 | Programming Q&A | 38 | 豆豆
Collected and organized by 生活随笔: Elastic Certified Engineer review notes - practice questions explained - Indexing Data (2), shared here for reference.

MAPPINGS AND TEXT ANALYSIS (analysis and tokenization of indices and documents)

GOAL: Model relational data

REQUIRED SETUP:

Suggested docker-compose file: 1e1k_base_cluster.yml

  • a running Elasticsearch cluster with at least one node and a Kibana instance,
  • the cluster has no index with name hamlet,
  • the cluster has no template that applies to indices starting with `hamlet`.
  • To clean up, run:

    DELETE hamlet_*
    DELETE _template/hamlet_*

    Question 1: the object data type

  • Create the index hamlet_1 with one primary shard and no replicas.
  • Add some documents to hamlet_1 by running the following command:

    POST hamlet_1/_bulk
    {"index":{"_index":"hamlet_1","_id":"C0"}}
    {"name":"HAMLET","relationship":[{"name":"HORATIO","type":"friend"},{"name":"GERTRUDE","type":"mother"}]}
    {"index":{"_index":"hamlet_1","_id":"C1"}}
    {"name":"KING CLAUDIUS","relationship":[{"name":"HAMLET","type":"nephew"}]}

  • Verify that the items of the relationship array cannot be searched independently: e.g., searching for a friend named Gertrude (a relationship with "type": "friend" and "name": "Gertrude") returns 1 hit.
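The _bulk body is newline-delimited JSON: an action line, then a source line, for every document, and the body must end with a newline. That is why it cannot be sent as one flattened line. A minimal standard-library sketch that builds such a body; `build_bulk_body` and `hamlet_docs` are hypothetical names, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(index, docs):
    """Serialize (id, source) pairs into an NDJSON body for the _bulk API.

    Every document takes two lines (action metadata, then the source), and
    the whole body must end with a newline character.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# The two hamlet_1 documents from the exercise.
hamlet_docs = [
    ("C0", {"name": "HAMLET",
            "relationship": [{"name": "HORATIO", "type": "friend"},
                             {"name": "GERTRUDE", "type": "mother"}]}),
    ("C1", {"name": "KING CLAUDIUS",
            "relationship": [{"name": "HAMLET", "type": "nephew"}]}),
]

if __name__ == "__main__":
    # Send this with Content-Type: application/x-ndjson to POST hamlet_1/_bulk.
    print(build_bulk_body("hamlet_1", hamlet_docs), end="")
```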

    Question 1: solution

  • Create the index:

    PUT hamlet_1
    {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      }
    }
  • Index the data by running the bulk command above (output omitted). Inspect the resulting index structure with GET hamlet_1:

    {
      "hamlet_1" : {
        "aliases" : { },
        "mappings" : {
          "properties" : {
            "name" : {
              "type" : "text",
              "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } }
            },
            "relationship" : {
              "properties" : {
                "name" : {
                  "type" : "text",
                  "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } }
                },
                "type" : {
                  "type" : "text",
                  "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } }
                }
              }
            }
          }
        },
        "settings" : {
          "index" : {
            "creation_date" : "1606270886689",
            "number_of_shards" : "1",
            "number_of_replicas" : "0",
            "uuid" : "BaWwDy_eSaKPaynt8rWW3g",
            "version" : { "created" : "7020199" },
            "provided_name" : "hamlet_1"
          }
        }
      }
    }
  • Verify the data:

    POST hamlet_1/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "relationship.type": "friend" } },
            { "match": { "relationship.name": "Gertrude" } }
          ]
        }
      }
    }
    • Response:
    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 1, "relation" : "eq" },
        "max_score" : 1.2199391,
        "hits" : [
          {
            "_index" : "hamlet_1",
            "_type" : "_doc",
            "_id" : "C0",
            "_score" : 1.2199391,
            "_source" : {
              "name" : "HAMLET",
              "relationship" : [
                { "name" : "HORATIO", "type" : "friend" },
                { "name" : "GERTRUDE", "type" : "mother" }
              ]
            }
          }
        ]
      }
    }
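The mapping shown by GET hamlet_1 was produced by dynamic mapping. As a rough illustration, here is a toy Python model of only the rules visible in that output: a string becomes text with a keyword subfield (ignore_above 256), an inner object becomes a block of properties, and an array takes the type of its elements. The function names are hypothetical, and numbers, booleans and dates are deliberately left out, so this is a sketch rather than Elasticsearch's actual algorithm.

```python
def text_with_keyword():
    """String default seen in the GET output: text plus a keyword subfield."""
    return {"type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}


def merge(a, b):
    """Combine two inferred mappings; object properties are merged recursively."""
    if not a:
        return b
    if "properties" in a and "properties" in b:
        props = dict(a["properties"])
        for key, sub in b["properties"].items():
            props[key] = merge(props.get(key, {}), sub)
        return {"properties": props}
    return a  # simplification: the first type seen wins


def infer_mapping(value):
    """Toy version of dynamic mapping for strings, objects and arrays only."""
    if isinstance(value, dict):
        return {"properties": {k: infer_mapping(v) for k, v in value.items()}}
    if isinstance(value, list):
        # An array has no type of its own: the field takes its elements' type.
        result = {}
        for item in value:
            result = merge(result, infer_mapping(item))
        return result
    if isinstance(value, str):
        return text_with_keyword()
    raise TypeError(f"not covered by this sketch: {type(value).__name__}")
```

Applied to document C0, this returns the same "mappings" block as the GET hamlet_1 output above; note that relationship gets no "type" of its own, which is how an object field appears in the mapping.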
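  • Question 1: explanation

    • This question tests the object data type. In Elasticsearch any field can hold an array of values, so the relationship array can store several objects.
      • When no mapping is specified, Elasticsearch infers one from the data (dynamic mapping); a nested JSON structure such as relationship is mapped as the object type by default.
      • An object field is map-like: its inner fields can be queried by key, e.g. relationship.name. Unlike nested, an array of objects is flattened at index time into one multi-valued field per inner key (relationship.name: [horatio, gertrude], relationship.type: [friend, mother]), so the link between the name and the type of the same object is lost and the array is searched as a whole. That is why the query above matches C0, although Gertrude is Hamlet's mother, not his friend. In a nested field, by contrast, the fields of each object can be searched together, independently of the other objects.
    • Reference link
    • Page path: Mapping => Field datatypes => Object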
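The flattening described above can be sketched in a few lines of Python. This is a toy model, not how Lucene stores data: `analyze` is a crude assumption standing in for the standard analyzer (lowercase, split on non-word characters), and the helper names are hypothetical. It shows why a bool/must of two match clauses succeeds once the array has been merged into per-key value sets.

```python
import re
from collections import defaultdict


def analyze(value):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [tok for tok in re.split(r"\W+", value.lower()) if tok]


def flatten_object_array(field, objects):
    """Index an array of objects as the object type does: one multi-valued field per key."""
    flat = defaultdict(set)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].update(analyze(value))
    return flat


def bool_must_match(flat, clauses):
    """bool/must of match queries: each clause only needs to hit some value of its field."""
    return all(set(analyze(query)) & flat[name] for name, query in clauses.items())


flat = flatten_object_array("relationship", [{"name": "HORATIO", "type": "friend"},
                                             {"name": "GERTRUDE", "type": "mother"}])
# "friend" comes from HORATIO's object and "gertrude" from GERTRUDE's, yet both clauses hit.
print(bool_must_match(flat, {"relationship.type": "friend", "relationship.name": "Gertrude"}))
```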

    Question 2: the nested data type

  • Create the index hamlet_2 with one primary shard and no replicas.
  • Define a mapping for the default type "_doc" of hamlet_2, so that the inner objects of the relationship field:
    • can be searched independently,
    • have only unanalyzed fields.
  • Reindex hamlet_1 to hamlet_2.
  • Verify that the items of the relationship array can now be searched independently: e.g., searching for a friend named Gertrude (a relationship with "type": "friend" and "name": "Gertrude") returns no hits.
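"Unanalyzed" means keyword rather than text. A toy Python comparison, assuming a crude standard-analyzer stand-in (lowercase, split on non-word characters); all function names are hypothetical. A match query analyzes the query string with the field's own analyzer, so a text field matches regardless of case, while a keyword field only matches the exact stored value.

```python
import re


def standard_like(value):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [tok for tok in re.split(r"\W+", value.lower()) if tok]


def keyword(value):
    """A keyword field is not analyzed: the whole value is one exact term."""
    return [value]


def match(indexed_value, query, analyzer):
    """match analyzes the query with the field's analyzer and needs one shared term."""
    return bool(set(analyzer(indexed_value)) & set(analyzer(query)))


print(match("GERTRUDE", "Gertrude", standard_like))  # text field: case-insensitive
print(match("GERTRUDE", "Gertrude", keyword))        # keyword field: exact value only
```

When querying keyword fields, use the stored value (here GERTRUDE), otherwise a search can return no hits for reasons unrelated to nesting.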
    Question 2: solution

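  • Create the index. To satisfy "only unanalyzed fields", name and type inside relationship are mapped as keyword:

    PUT hamlet_2
    {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      },
      "mappings": {
        "properties": {
          "relationship": {
            "type": "nested",
            "properties": {
              "name": { "type": "keyword" },
              "type": { "type": "keyword" }
            }
          }
        }
      }
    }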
  • Reindex:

    POST _reindex
    {
      "source": { "index": "hamlet_1" },
      "dest": { "index": "hamlet_2" }
    }
  • Verify the data
  • Plain query (no nested clause):

    POST hamlet_2/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "relationship.type": "friend" } },
            { "match": { "relationship.name": "GERTRUDE" } }
          ]
        }
      }
    }
    • Response:
    {
      "took" : 7,
      "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 0, "relation" : "eq" },
        "max_score" : null,
        "hits" : [ ]
      }
    }
  • Nested query:

    POST hamlet_2/_search
    {
      "query": {
        "nested": {
          "path": "relationship",
          "query": {
            "bool": {
              "must": [
                { "match": { "relationship.type": "friend" } },
                { "match": { "relationship.name": "GERTRUDE" } }
              ]
            }
          }
        }
      }
    }
    • Response:
    {
      "took" : 178,
      "timed_out" : false,
      "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : { "value" : 0, "relation" : "eq" },
        "max_score" : null,
        "hits" : [ ]
      }
    }
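  • Question 2: explanation

    • This question tests the nested data type. Unlike object, each object of a nested array is indexed as a separate hidden document, so a nested query on a given path evaluates all of its clauses against one inner object at a time. No single relationship object of C0 has both "type": "friend" and the name Gertrude, so the nested query returns no hits. The plain query returns nothing as well, because nested fields are not visible to a top-level query; they can only be searched through a nested query with the matching path.
    • Reference link: nested-datatype
    • Page path: Mapping => Field datatypes => Nested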
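The difference between the two query semantics comes down to where `all` and `any` sit. A toy Python sketch, assuming exact-value (keyword-like) comparison and ignoring scoring and score_mode; `object_query` and `nested_query` are hypothetical names, not Elasticsearch APIs.

```python
def object_query(objects, clauses):
    """Object arrays are flattened: each clause may be satisfied by a different object."""
    return all(any(obj.get(f) == v for obj in objects) for f, v in clauses.items())


def nested_query(objects, clauses):
    """nested query: a single inner object must satisfy every clause."""
    return any(all(obj.get(f) == v for f, v in clauses.items()) for obj in objects)


relationship = [{"name": "HORATIO", "type": "friend"},
                {"name": "GERTRUDE", "type": "mother"}]
friend_gertrude = {"type": "friend", "name": "GERTRUDE"}

print(object_query(relationship, friend_gertrude))  # hamlet_1 behaviour
print(nested_query(relationship, friend_gertrude))  # hamlet_2 behaviour
```

The object version reproduces the 1 hit seen on hamlet_1, and the nested version the 0 hits seen on hamlet_2; asking the nested version for a friend named HORATIO would match.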

    Question 3: parent/child documents (parent-join)

  • Add more documents to hamlet_2 by running the following command:

    POST _bulk
    {"index":{"_index":"hamlet_2","_id":"L0"}}
    {"line_number":"1.4.1","speaker":"HAMLET","text_entry":"The air bites shrewdly; it is very cold."}
    {"index":{"_index":"hamlet_2","_id":"L1"}}
    {"line_number":"1.4.2","speaker":"HORATIO","text_entry":"It is a nipping and an eager air."}
    {"index":{"_index":"hamlet_2","_id":"L2"}}
    {"line_number":"1.4.3","speaker":"HAMLET","text_entry":"What hour now?"}

  • Create the index hamlet_3 with only one primary shard and no replicas.
  • Copy the mapping of hamlet_2 into hamlet_3, but also add a join field to define a relation between a character (the parent) and a line (the child). The name of such field is "character_or_line".
  • Reindex hamlet_2 to hamlet_3.
  • Create a script named init_lines and save it into the cluster state. The script:
    • has a parameter named characterId,
    • adds the field character_or_line to the document,
    • sets the value of character_or_line.name to "line",
    • sets the value of character_or_line.parent to the value of the characterId parameter.
  • Update the document with id C0 (i.e., the character document of Hamlet) by adding the field character_or_line and setting its character_or_line.name value to "character".
  • Update the documents in hamlet_3 that have "HAMLET" as a speaker, by running the init_lines script with characterId set to "C0".
    Question 3: solution

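  • Add the data (details omitted).
  • Create the index, copying the relationship mapping of hamlet_2 and adding the join field:

    PUT hamlet_3
    {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      },
      "mappings": {
        "properties": {
          "relationship": {
            "type": "nested",
            "properties": {
              "name": { "type": "keyword" },
              "type": { "type": "keyword" }
            }
          },
          "character_or_line": {
            "type": "join",
            "relations": { "character": "line" }
          }
        }
      }
    }
  • Reindex:

    POST _reindex
    {
      "source": { "index": "hamlet_2" },
      "dest": { "index": "hamlet_3" }
    }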
  • Create the script, here as an ingest pipeline with a script processor:

    PUT _ingest/pipeline/character_update_pipeline
    {
      "description": "set 'character_or_line', 'character_or_line.name' and 'character_or_line.parent'",
      "processors": [
        {
          "script": {
            "lang": "painless",
            "source": """
              ctx.character_or_line = new HashMap();
              ctx.character_or_line.name = "line";
              ctx.character_or_line.parent = params.characterId;
            """,
            "params": { "characterId": "C0" }
          }
        }
      ]
    }
  • Index a new document. A join field requires routing, so the line is routed to its parent C0:

    POST hamlet_3/_doc/C2?routing=C0
    {
      "line_number": "1.2.1",
      "speaker": "KING CLAUDIUS",
      "text_entry": "Though yet of Hamlet our dear brothers death"
    }
  • Apply the pipeline above to update that specific document:

    POST hamlet_3/_update_by_query?routing=C0&pipeline=character_update_pipeline
    {
      "query": { "term": { "_id": "C2" } }
    }
  • If you run the update without the routing setting, you may get the following error, which says that routing is mandatory for a parent-join field:

    {
      "took": 10,
      "timed_out": false,
      "total": 1,
      "updated": 0,
      "deleted": 0,
      "batches": 1,
      "version_conflicts": 0,
      "noops": 0,
      "retries": { "bulk": 0, "search": 0 },
      "throttled_millis": 0,
      "requests_per_second": -1,
      "throttled_until_millis": 0,
      "failures": [
        {
          "index": "hamlet_3",
          "type": "_doc",
          "id": "C2",
          "cause": {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse",
            "caused_by": {
              "type": "illegal_argument_exception",
              "reason": "[routing] is missing for join field [character_or_line]"
            }
          },
          "status": 400
        }
      ]
    }
  • Verify the data: GET hamlet_3/_doc/C2
    • Response:
    {
      "_index" : "hamlet_3",
      "_type" : "_doc",
      "_id" : "C2",
      "_version" : 4,
      "_seq_no" : 5,
      "_primary_term" : 1,
      "_routing" : "C0",
      "found" : true,
      "_source" : {
        "character_or_line" : { "parent" : "C0", "name" : "line" },
        "line_number" : "1.2.1",
        "text_entry" : "Though yet of Hamlet our dear brothers death",
        "speaker" : "KING CLAUDIUS"
      }
    }
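Routing decides which shard a document lives on. The Elasticsearch routing docs give the simplified formula shard = hash(_routing) % number_of_primary_shards, with the _id used when no routing is given. Parent and child must share a shard, so C2 is indexed with routing=C0. A toy Python sketch: it uses zlib.crc32 only as a deterministic stand-in for Elasticsearch's Murmur3 hash, and the 5-shard default is a hypothetical value (hamlet_3 itself has a single primary shard).

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    """Simplified routing: shard = hash(routing) % number_of_primary_shards.

    Elasticsearch hashes with Murmur3; zlib.crc32 is only a deterministic stand-in.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def route(doc_id, routing=None, number_of_primary_shards=5):
    # Without an explicit routing value, the document _id is used as the routing value.
    value = doc_id if routing is None else routing
    return shard_for(value, number_of_primary_shards)


# The child C2, routed with its parent's id, always lands on the parent's shard.
print(route("C0"), route("C2", routing="C0"))
```

With a single primary shard every document lands on shard 0 anyway, but Elasticsearch still enforces routing for join children, which is the error shown above.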
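  • Question 3: explanation

    • This question tests parent/child (join) data, reindex and _update_by_query.
      • A join field can replace some of the table joins of a relational database, but Elasticsearch is still a document store and its support here is only passable: a parent and all of its children must be indexed on the same shard, which is why every child needs a routing value, and join queries are more expensive than ordinary ones.
      • When checking the result, the point to look at is character_or_line and _routing: neither exists in the original document data, and both appear after processing.
      • reindex and _update_by_query were covered in other chapters and are skipped here.
    • Reference link
    • Page path: Mapping => Field datatypes => Join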

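    Question 3: extension

    @老楊 also suggested another solution, but it has some problems: for example, a child document needs a routing value, yet a script running in _update_by_query cannot update that property directly, so a separate ingest pipeline is used to set it.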

  • Create the stored script:

    POST _scripts/character_update_script
    {
      "script": {
        "lang": "painless",
        "source": """
          Map map = new HashMap();
          map.name = "line";
          map.parent = params.characterId;
          ctx._source.character_or_line = map;
        """
      }
    }
  • Create a pipeline that sets the routing:

    PUT _ingest/pipeline/set_routing
    {
      "description": "assign the routing attribute for doc",
      "processors": [
        { "script": { "lang": "painless", "source": "ctx._routing = 'C0'" } }
      ]
    }
  • Update the specific document:

    POST hamlet_3/_update_by_query?pipeline=set_routing
    {
      "query": { "term": { "_id": "C2" } },
      "script": {
        "id": "character_update_script",
        "params": { "characterId": "C0" }
      }
    }
  • Verification is the same as above and is omitted.
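To see what the stored script and the set_routing pipeline do together, here is a plain-Python mirror applied to the task's selection (documents whose speaker is HAMLET, characterId C0). It is only a sketch: the ctx-like dict layout and the helper names `init_lines` and `update_by_query` are hypothetical, and routing follows characterId instead of the pipeline's hard-coded 'C0' (an assumption). The real update runs inside Elasticsearch.

```python
def init_lines(ctx, character_id):
    """Python mirror of the stored painless script plus the set_routing pipeline.

    `ctx` mimics one update_by_query document: {"_id", "_routing", "_source"}.
    """
    ctx["_source"]["character_or_line"] = {"name": "line", "parent": character_id}
    # Assumption: routing follows characterId instead of the pipeline's hard-coded 'C0'.
    ctx["_routing"] = character_id
    return ctx


def update_by_query(docs, predicate, update, **params):
    """Apply `update` to every document whose _source matches; return how many changed."""
    updated = 0
    for ctx in docs:
        if predicate(ctx["_source"]):
            update(ctx, **params)
            updated += 1
    return updated


docs = [
    {"_id": "C0", "_routing": None, "_source": {"name": "HAMLET"}},
    {"_id": "L0", "_routing": None, "_source": {"speaker": "HAMLET"}},
    {"_id": "L1", "_routing": None, "_source": {"speaker": "HORATIO"}},
    {"_id": "L2", "_routing": None, "_source": {"speaker": "HAMLET"}},
]
count = update_by_query(docs, lambda s: s.get("speaker") == "HAMLET",
                        init_lines, character_id="C0")
```

The parent C0 itself would still need its own update, setting character_or_line to {"name": "character"} as the task describes.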