Elasticsearch Search Syntax, Advanced (Search + Aggregation, Filter + Aggregation)

Published: 2024/4/11

Elasticsearch: notes on combining aggregations with search


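Contents

  • Search + aggregation: sales per color for a given brand
  • global bucket: comparing a single brand's sales with all brands
  • Filter + aggregation: average price of TVs priced at 1200 or above
  • bucket filter: a brand's average price over the most recent month
  • Sorting: ordering colors by average sale price, descending
  • Color + brand drill-down: sorting by the deepest-level metric
  • cardinality deduplication and the number of brands sold per month
  • cardinality: tuning memory overhead, and the HLL algorithm
  • For the sample data, see the previous article: Elasticsearch aggregation syntax (bucket, metric, histogram, date histogram)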


    1. Search + aggregation: sales per color for a given brand

  • Everything covered so far about search can be combined with aggregations.
  • SQL analogue: select color, count(*) from tvs.sales where brand = "小米" group by color
  • Aggregation scope: every aggregation runs over the documents matched by the search, so the search result is the scope of the aggregation.
  • Request:

    GET /tvs/sales/_search
    {
      "size": 0,
      "query": {"term": {"brand": {"value": "小米"}}},
      "aggs": {
        "group_by_color": {"terms": {"field": "color"}}
      }
    }

  • Response:

    {
      "took": 5,
      "timed_out": false,
      "_shards": {"total": 5, "successful": 5, "failed": 0},
      "hits": {"total": 2, "max_score": 0, "hits": []},
      "aggregations": {
        "group_by_color": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {"key": "綠色", "doc_count": 1},
            {"key": "藍色", "doc_count": 1}
          ]
        }
      }
    }
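
To make the scope rule concrete, here is a minimal pure-Python sketch of what the request above does conceptually: filter first, then bucket only the documents that matched. The sample records and the helper names `term_query` and `terms_agg` are made up for illustration; they are not the real tvs/sales data and not an Elasticsearch API.

```python
from collections import Counter

# Hypothetical sample documents (not the real tvs/sales data).
SALES = [
    {"brand": "小米", "color": "綠色", "price": 3000},
    {"brand": "小米", "color": "藍色", "price": 2500},
    {"brand": "長虹", "color": "紅色", "price": 2000},
    {"brand": "長虹", "color": "藍色", "price": 1500},
]

def term_query(docs, field, value):
    """Keep only documents whose field equals value (like a term query)."""
    return [d for d in docs if d[field] == value]

def terms_agg(docs, field):
    """Count documents per distinct field value (like a terms aggregation)."""
    counts = Counter(d[field] for d in docs)
    # Largest buckets first, mirroring the default doc_count ordering.
    return [{"key": k, "doc_count": n} for k, n in counts.most_common()]

scope = term_query(SALES, "brand", "小米")  # the query defines the scope
buckets = terms_agg(scope, "color")         # the aggregation only sees that scope
print(buckets)
```

The 長虹 documents never reach `terms_agg`, which is the point: the aggregation cannot count what the query excluded.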

    2. global bucket: comparing a single brand's sales with all brands

  • Aggregation scope: by default, an aggregation runs only within the query's search results.
  • Request:

    GET /tvs/sales/_search
    {
      "size": 0,
      "query": {"term": {"brand": {"value": "長虹"}}},
      "aggs": {
        "single_brand_avg_price": {"avg": {"field": "price"}},
        "all": {
          "global": {},
          "aggs": {
            "all_brand_avg_price": {"avg": {"field": "price"}}
          }
        }
      }
    }

  • global: the global bucket puts every document into the aggregation's scope, regardless of the query.
  • The response carries two results: one aggregated over the query's search results, the other over all the data.

    {
      "took": 4,
      "timed_out": false,
      "_shards": {"total": 5, "successful": 5, "failed": 0},
      "hits": {"total": 3, "max_score": 0, "hits": []},
      "aggregations": {
        "all": {
          "doc_count": 8,
          "all_brand_avg_price": {"value": 2650}
        },
        "single_brand_avg_price": {"value": 1666.6666666666667}
      }
    }

  • single_brand_avg_price: computed over the query results, so it is the average price of the 長虹 brand.
  • all.all_brand_avg_price: the average price across all brands.
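
The difference between the two scopes can be sketched in a few lines of plain Python. This is only an illustration of the idea, with made-up records and a hypothetical helper `avg_price`; the numbers are unrelated to the response shown above.

```python
from statistics import mean

# Hypothetical sample documents (not the real tvs/sales data).
SALES = [
    {"brand": "長虹", "price": 2000},
    {"brand": "長虹", "price": 1000},
    {"brand": "小米", "price": 3000},
    {"brand": "三星", "price": 6000},
]

def avg_price(docs):
    """Average of the price field, like an avg metric."""
    return mean(d["price"] for d in docs)

matched = [d for d in SALES if d["brand"] == "長虹"]  # the query scope

result = {
    "single_brand_avg_price": avg_price(matched),  # scoped to the query
    "all_brand_avg_price": avg_price(SALES),       # global: ignores the query
}
print(result)
```

Both metrics come back from one pass over the request, which is what makes the global bucket convenient for "this brand versus everything" comparisons.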

    3. Filter + aggregation: average price of TVs priced at 1200 or above

  • Request:

    GET /tvs/sales/_search
    {
      "size": 0,
      "query": {
        "constant_score": {
          "filter": {"range": {"price": {"gte": 1200}}}
        }
      },
      "aggs": {
        "avg_price": {"avg": {"field": "price"}}
      }
    }

  • Response:

    {
      "took": 41,
      "timed_out": false,
      "_shards": {"total": 5, "successful": 5, "failed": 0},
      "hits": {"total": 7, "max_score": 0, "hits": []},
      "aggregations": {
        "avg_price": {"value": 2885.714285714286}
      }
    }

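    4. bucket filter: a brand's average price over the most recent month

    GET /tvs/sales/_search
    {
      "size": 0,
      "query": {"term": {"brand": {"value": "長虹"}}},
      "aggs": {
        "recent_150d": {
          "filter": {"range": {"sold_date": {"gte": "now-150d"}}},
          "aggs": {"recent_150d_avg_price": {"avg": {"field": "price"}}}
        },
        "recent_140d": {
          "filter": {"range": {"sold_date": {"gte": "now-140d"}}},
          "aggs": {"recent_140d_avg_price": {"avg": {"field": "price"}}}
        },
        "recent_130d": {
          "filter": {"range": {"sold_date": {"gte": "now-130d"}}},
          "aggs": {"recent_130d_avg_price": {"avg": {"field": "price"}}}
        }
      }
    }

  • aggs.filter applies to a single aggregation only.
  • A filter placed inside the query is global: it affects the data that every aggregation sees.
  • Suppose, however, that you want the average price of 長虹 TVs over the last 1 month, the last 3 months and the last 6 months. Each window needs a different filter, so one query-level filter cannot express it.
  • bucket filter: apply a separate filter to the aggs under each bucket. The request above does exactly that, with windows of 150, 140 and 130 days.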
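
Since the per-window buckets differ only in the number of days, the request body can be generated rather than typed out. The following is a rough sketch that builds such a body as a Python dict; the function name `recent_avg_price_aggs` and the choice of 30/90/180 days as stand-ins for 1, 3 and 6 months are assumptions made for this example, and sending the body to a cluster is left out.

```python
import json

def recent_avg_price_aggs(days_list, date_field="sold_date", price_field="price"):
    """Build one filter bucket per time window, each with its own avg metric."""
    aggs = {}
    for days in days_list:
        aggs[f"recent_{days}d"] = {
            "filter": {"range": {date_field: {"gte": f"now-{days}d"}}},
            "aggs": {f"recent_{days}d_avg_price": {"avg": {"field": price_field}}},
        }
    return aggs

body = {
    "size": 0,
    "query": {"term": {"brand": {"value": "長虹"}}},
    # Roughly 1, 3 and 6 months expressed in days (an assumption for this sketch).
    "aggs": recent_avg_price_aggs([30, 90, 180]),
}
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The brand restriction stays in the query because it applies to every window, while each date range lives in its own filter bucket.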


    5. Sorting: ordering colors by average sale price, descending

  • By default, buckets are sorted by doc_count, descending.
  • Suppose instead that we compute the average sale price of TVs per color and need the buckets sorted by that value, descending.
  • Without an order clause:

    GET /tvs/sales/_search
    {
      "size": 0,
      "aggs": {
        "group_by_color": {
          "terms": {"field": "color"},
          "aggs": {"avg_price": {"avg": {"field": "price"}}}
        }
      }
    }

    {
      "took": 2,
      "timed_out": false,
      "_shards": {"total": 5, "successful": 5, "failed": 0},
      "hits": {"total": 8, "max_score": 0, "hits": []},
      "aggregations": {
        "group_by_color": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {"key": "紅色", "doc_count": 4, "avg_price": {"value": 3250}},
            {"key": "綠色", "doc_count": 2, "avg_price": {"value": 2100}},
            {"key": "藍色", "doc_count": 2, "avg_price": {"value": 2000}}
          ]
        }
      }
    }

  • With an order clause on the avg_price sub-aggregation:

    GET /tvs/sales/_search
    {
      "size": 0,
      "aggs": {
        "group_by_color": {
          "terms": {"field": "color", "order": {"avg_price": "desc"}},
          "aggs": {"avg_price": {"avg": {"field": "price"}}}
        }
      }
    }

    6. Color + brand drill-down: sorting by the deepest-level metric

    GET /tvs/sales/_search
    {
      "size": 0,
      "aggs": {
        "group_by_color": {
          "terms": {"field": "color"},
          "aggs": {
            "group_by_brand": {
              "terms": {"field": "brand", "order": {"avg_price": "desc"}},
              "aggs": {"avg_price": {"avg": {"field": "price"}}}
            }
          }
        }
      }
    }

    7. cardinality deduplication and the number of brands sold per month

  • Deduplication: the cardinality metric deduplicates a given field within each bucket and returns the count of distinct values, similar to count(distinct).
  • Request:

    GET /tvs/sales/_search
    {
      "size": 0,
      "aggs": {
        "group_by_sold_date": {
          "date_histogram": {"field": "sold_date", "interval": "month"},
          "aggs": {
            "distinct_brand_cnt": {"cardinality": {"field": "brand"}}
          }
        }
      }
    }

  • Response:

    {
      "took": 70,
      "timed_out": false,
      "_shards": {"total": 5, "successful": 5, "failed": 0},
      "hits": {"total": 8, "max_score": 0, "hits": []},
      "aggregations": {
        "group_by_sold_date": {
          "buckets": [
            {"key_as_string": "2016-05-01T00:00:00.000Z", "key": 1462060800000, "doc_count": 1, "distinct_brand_cnt": {"value": 1}},
            {"key_as_string": "2016-06-01T00:00:00.000Z", "key": 1464739200000, "doc_count": 0, "distinct_brand_cnt": {"value": 0}},
            {"key_as_string": "2016-07-01T00:00:00.000Z", "key": 1467331200000, "doc_count": 1, "distinct_brand_cnt": {"value": 1}},
            {"key_as_string": "2016-08-01T00:00:00.000Z", "key": 1470009600000, "doc_count": 1, "distinct_brand_cnt": {"value": 1}},
            {"key_as_string": "2016-09-01T00:00:00.000Z", "key": 1472688000000, "doc_count": 0, "distinct_brand_cnt": {"value": 0}},
            {"key_as_string": "2016-10-01T00:00:00.000Z", "key": 1475280000000, "doc_count": 1, "distinct_brand_cnt": {"value": 1}},
            {"key_as_string": "2016-11-01T00:00:00.000Z", "key": 1477958400000, "doc_count": 2, "distinct_brand_cnt": {"value": 1}},
            {"key_as_string": "2016-12-01T00:00:00.000Z", "key": 1480550400000, "doc_count": 0, "distinct_brand_cnt": {"value": 0}},
            {"key_as_string": "2017-01-01T00:00:00.000Z", "key": 1483228800000, "doc_count": 1, "distinct_brand_cnt": {"value": 1}},
            {"key_as_string": "2017-02-01T00:00:00.000Z", "key": 1485907200000, "doc_count": 1, "distinct_brand_cnt": {"value": 1}}
          ]
        }
      }
    }

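    8. cardinality: tuning memory overhead, and the HLL algorithm

    1. How cardinality behaves
  • cardinality is an approximate count(distinct): about a 5% error rate, with response times around 100 ms.
  • precision_threshold trades accuracy against memory overhead.
  • Request:

    GET /tvs/sales/_search
    {
      "size": 0,
      "aggs": {
        "distinct_brand": {
          "cardinality": {"field": "brand", "precision_threshold": 100}
        }
      }
    }

  • This deduplicates brand, and suits the case where brand has fewer than 100 unique values: 小米, 長虹, 三星, TCL, HTL…
  • precision_threshold is the number of unique values below which cardinality is almost guaranteed to be 100% accurate.
  • The algorithm uses about precision_threshold * 8 bytes of memory: 100 * 8 = 800 bytes, which is very little, and if the number of unique values really is below the threshold the count is expected to be exact.
  • With precision_threshold at 100 and millions of unique values, the error rate stays within 5%.
  • The larger precision_threshold is, the more memory is used: 1000 * 8 = 8000 bytes, about 8 KB, in exchange for exact counts over a larger number of unique values.
  • Deduplicating a field that has 10000 unique values with precision_threshold = 10000 costs 10000 * 8 = 80000 bytes, about 80 KB.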
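
The memory figures above all follow from the same rule of thumb, precision_threshold * 8 bytes. As a quick check, the arithmetic can be written out like this; the function name `cardinality_memory_bytes` is made up for the example, and the figure is the approximation quoted in this article, not a measurement.

```python
def cardinality_memory_bytes(precision_threshold, bytes_per_unit=8):
    """Approximate memory per cardinality aggregation: precision_threshold * 8 bytes."""
    return precision_threshold * bytes_per_unit

for threshold in (100, 1000, 10000):
    size = cardinality_memory_bytes(threshold)
    print(f"precision_threshold={threshold}: {size} bytes (~{size / 1000:g} KB)")
```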

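    2. HyperLogLog++ (HLL): performance tuning
  • cardinality is built on the HLL algorithm, so its performance is the performance of HLL.
  • HLL hashes every unique value and uses the hashes to approximate the distinct count, which is where the error comes from.
  • By default, the hash of every field value is computed on the fly when a cardinality request arrives. The optimization is to move the hashing forward to index time.
  • Mapping with a murmur3 sub-field (the murmur3 field type is provided by the mapper-murmur3 plugin), followed by a cardinality request against that sub-field:

    PUT /tvs/
    {
      "mappings": {
        "sales": {
          "properties": {
            "brand": {
              "type": "text",
              "fields": {
                "hash": {"type": "murmur3"}
              }
            }
          }
        }
      }
    }

    GET /tvs/sales/_search
    {
      "size": 0,
      "aggs": {
        "distinct_brand": {
          "cardinality": {"field": "brand.hash", "precision_threshold": 100}
        }
      }
    }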
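
To show where the hash enters the estimate, here is a simplified HyperLogLog sketch in plain Python. It is the textbook algorithm with a small-range correction, not the HyperLogLog++ variant that Elasticsearch implements, and it uses SHA-1 truncated to 64 bits as a stand-in for murmur3 because murmur3 is not in the Python standard library. The class name, the default of 10 index bits and the `brand-N` values are all assumptions made for this example.

```python
import hashlib
import math

class HyperLogLog:
    """Simplified HyperLogLog; not the HLL++ variant Elasticsearch uses."""

    def __init__(self, p=10):
        self.p = p                    # number of index bits (keep p >= 7)
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value):
        # Assumption: SHA-1 truncated to 64 bits stands in for murmur3.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        h = self._hash64(value)
        width = 64 - self.p
        idx = h >> width                        # first p bits pick a register
        rest = h & ((1 << width) - 1)           # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)

hll = HyperLogLog()
for i in range(10000):
    hll.add(f"brand-{i}")
    hll.add(f"brand-{i}")  # duplicates hash to the same register and rank
print(hll.count())
```

Every call to `add` pays for one hash computation, which is the cost the murmur3 sub-field moves from query time to index time. The estimate is approximate by design: with 1024 registers the expected relative error is on the order of a few percent.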
