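Elasticsearch Aggregations in Depth: A Comparison with MySQL
Prerequisites for understanding aggregations
Buckets: collections of documents that satisfy a given condition.
Metrics: statistics computed over the documents in a bucket.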
SELECT COUNT(color)
FROM table
GROUP BY color
COUNT(color) corresponds to a metric.
GROUP BY color corresponds to a bucket.
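The two steps can be separated explicitly in a few lines of Python. This is only a conceptual sketch, not how Elasticsearch is implemented; the `colors` list is a hypothetical stand-in for the `color` field of eight documents.

```python
from collections import defaultdict

# Hypothetical mini data set: the "color" field of eight documents.
colors = ["red", "red", "green", "blue", "green", "red", "red", "blue"]

# Bucket step: group document ids by field value (the GROUP BY color part).
buckets = defaultdict(list)
for doc_id, color in enumerate(colors):
    buckets[color].append(doc_id)

# Metric step: compute one statistic per bucket (the COUNT(color) part).
counts = {color: len(doc_ids) for color, doc_ids in buckets.items()}

print(counts)  # {'red': 4, 'green': 2, 'blue': 2}
```

Any other statistic (average, minimum, sum) could replace `len` in the metric step without touching the bucket step, which is why the two concepts combine so freely.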
I. Getting started with aggregations
1. Create the index
1.1 Creating the index with the DSL
PUT /cars

POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
1.2 Creating the MySQL table with SQL
CREATE TABLE `cars` (
  `id` int(11) NOT NULL,
  `price` int(11) DEFAULT NULL,
  `color` varchar(255) DEFAULT NULL,
  `make` varchar(255) DEFAULT NULL,
  `sold` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
2. Count cars by color
2.1 Counting cars by color with the DSL
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": { "field": "color.keyword" }
    }
  }
}
Result: [response output not preserved]
2.2 Counting cars by color in MySQL
select color, count(color) as cnt from cars group by color order by cnt desc;
Result:
color  cnt
red    4
green  2
blue   2
3. Average price by color
3.1 Average price by color with the DSL:
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "colors": {
      "terms": { "field": "color.keyword" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}
Aggregation result: [response output not preserved]
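As a rough cross-check of what this nested aggregation computes, the sketch below buckets the sample data by color and then applies an average metric inside each bucket. The `(price, color)` pairs are copied from the bulk request in section 1.1; the function name `avg_price_by_color` is made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# (price, color) pairs copied from the bulk request in section 1.1.
CARS = [
    (10000, "red"), (20000, "red"), (30000, "green"), (15000, "blue"),
    (12000, "green"), (20000, "red"), (80000, "red"), (25000, "blue"),
]

def avg_price_by_color(cars):
    """Return (color, doc_count, avg_price) rows, largest bucket first."""
    prices = defaultdict(list)
    for price, color in cars:
        prices[color].append(price)  # outer bucket: one list per color
    rows = [(color, len(p), mean(p)) for color, p in prices.items()]
    # Order by bucket size, descending, like "order by cnt desc" in SQL.
    return sorted(rows, key=lambda row: -row[1])

for color, cnt, avg in avg_price_by_color(CARS):
    print(color, cnt, avg)
```

The metric is evaluated once per bucket, which is the same relationship that nesting `avg_price` under `colors` expresses in the DSL.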
3.2 Average price by color in SQL:
select color, count(color) as cnt, avg(price) as avg_price from cars group by color order by cnt desc;
color  cnt  avg_price
red    4    32500.0000
green  2    21000.0000
blue   2    20000.0000
4. Distribution of manufacturers per color
4.1 Manufacturer distribution per color with the DSL
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "colors": {
      "terms": { "field": "color.keyword" },
      "aggs": {
        "make": {
          "terms": { "field": "make.keyword" }
        }
      }
    }
  }
}
Result: [response output not preserved]
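Nesting one terms aggregation inside another yields buckets within buckets. A minimal Python sketch of that shape, using the `(color, make)` pairs from the bulk request in section 1.1 (the helper name `makes_by_color` is an assumption made for this example):

```python
from collections import Counter, defaultdict

# (color, make) pairs copied from the bulk request in section 1.1.
CARS = [
    ("red", "honda"), ("red", "honda"), ("green", "ford"), ("blue", "toyota"),
    ("green", "toyota"), ("red", "honda"), ("red", "bmw"), ("blue", "ford"),
]

def makes_by_color(cars):
    """Outer bucket per color, inner bucket per make with its doc count."""
    nested = defaultdict(Counter)
    for color, make in cars:
        nested[color][make] += 1
    return {color: dict(makes) for color, makes in nested.items()}

print(makes_by_color(CARS)["red"])  # {'honda': 3, 'bmw': 1}
```

In SQL terms this is closest to grouping by both columns (`GROUP BY color, make`), except that the result keeps the two levels nested instead of flattening them into rows.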
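4.2 Manufacturer distribution per color in SQL
Note: this does not correspond strictly to the DSL version; it lists the rows sorted by color instead of counting makes within each color.
select color, make from cars order by color;
color  make
blue   toyota
blue   ford
green  ford
green  toyota
red    bmw
red    honda
red    honda
red    honda
5. Lowest and highest price per manufacturer
5.1 Lowest and highest price per manufacturer with the DSL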
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "make_class": {
      "terms": { "field": "make.keyword" },
      "aggs": {
        "min_price": { "min": { "field": "price" } },
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}
Aggregation result: [response output not preserved]
5.2 Lowest and highest price per manufacturer in SQL
select make, min(price) as min_price, max(price) as max_price from cars group by make;
make    min_price  max_price
bmw     80000      80000
ford    25000      30000
honda   10000      20000
toyota  12000      15000
II. Advanced aggregations
1. Histogram aggregation
1.1 Total sales price per price interval
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "histogram": { "field": "price", "interval": 20000 },
      "aggs": {
        "revenue": { "sum": { "field": "price" } }
      }
    }
  }
}
The price interval is set to 20000; within each interval, the sum metric adds up the price field.
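The bucket key of a fixed-interval histogram is the value rounded down to a multiple of the interval. The sketch below applies that rule to the prices from section 1.1 and sums the price per bucket; the function name `revenue_histogram` is hypothetical, and buckets that contain no documents are simply left out here.

```python
from collections import defaultdict

# Prices copied from the bulk request in section 1.1.
PRICES = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]

def revenue_histogram(prices, interval):
    """Sum prices per bucket; bucket key = price rounded down to the interval."""
    revenue = defaultdict(int)
    for price in prices:
        key = (price // interval) * interval
        revenue[key] += price
    return dict(sorted(revenue.items()))

print(revenue_histogram(PRICES, 20000))
# {0: 37000, 20000: 95000, 80000: 80000}
```

A price of exactly 20000 lands in the bucket keyed 20000, not in the one keyed 0, because each bucket covers the half-open range from its key up to, but not including, the next key.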
1.2 Multiple metrics per manufacturer
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "makes": {
      "terms": { "field": "make.keyword", "size": 10 },
      "aggs": {
        "stats": { "extended_stats": { "field": "price" } }
      }
    }
  }
}
Excerpt of the output:
{
  "key": "ford",
  "doc_count": 2,
  "stats": {
    "count": 2,
    "min": 25000,
    "max": 30000,
    "avg": 27500,
    "sum": 55000,
    "sum_of_squares": 1525000000,
    "variance": 6250000,
    "std_deviation": 2500,
    "std_deviation_bounds": { "upper": 32500, "lower": 22500 }
  }
}
2. Time-based aggregations
2.1 Monthly car sales with the DSL
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "sold",
        "interval": "month",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
Result: [response output not preserved]
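Conceptually, a monthly date histogram truncates each date to the start of its month and counts the documents per truncated value. A small sketch over the `sold` dates from section 1.1 follows; the function name `monthly_sales` is made up, and keying each bucket by the first day of the month is an assumption chosen to resemble the `yyyy-MM-dd` format requested in the query.

```python
from collections import Counter
from datetime import date

# "sold" dates copied from the bulk request in section 1.1.
SOLD = [
    "2014-10-28", "2014-11-05", "2014-05-18", "2014-07-02",
    "2014-08-19", "2014-11-05", "2014-01-01", "2014-02-12",
]

def monthly_sales(sold_dates):
    """Count sales per calendar month, keyed by the month's first day."""
    counts = Counter()
    for text in sold_dates:
        month_start = date.fromisoformat(text).replace(day=1)
        counts[month_start.isoformat()] += 1
    return dict(sorted(counts.items()))

for month, cnt in monthly_sales(SOLD).items():
    print(month, cnt)
```

Only months that contain at least one sale appear in the output, so seven of the twelve months of 2014 are listed.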
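2.2 Monthly car sales in SQL
SELECT make, count(make) as cnt, CONCAT(YEAR(sold),',',MONTH(sold)) AS data_time
FROM `cars`
GROUP BY YEAR(sold) DESC, MONTH(sold)
Note: make is not part of the GROUP BY clause, so MySQL returns the make of an arbitrary row for each month, and the query is rejected when ONLY_FULL_GROUP_BY is enabled.
Result:
make    cnt  data_time
bmw     1    2014,1
ford    1    2014,2
ford    1    2014,5
toyota  1    2014,7
toyota  1    2014,8
honda   1    2014,10
honda   2    2014,11
2.3 Including December with the DSL
The result of 2.1 contains no bucket for December, nor for any other month without sales. Setting min_doc_count to 0 together with extended_bounds returns those empty months as well.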
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "sold",
        "interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": { "min": "2014-01-01", "max": "2014-12-31" }
      }
    }
  }
}
2.4 Quarterly statistics with the DSL
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "sold",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": { "min": "2014-01-01", "max": "2014-12-31" }
      },
      "aggs": {
        "per_make_sum": {
          "terms": { "field": "make.keyword" },
          "aggs": {
            "sum_price": { "sum": { "field": "price" } }
          }
        },
        "top_sum": { "sum": { "field": "price" } }
      }
    }
  }
}
2.5 Aggregations scoped by a search query
2.5.1 Basic query with an aggregation
GET /cars/transactions/_search
{
  "query": {
    "match": { "make.keyword": "ford" }
  },
  "aggs": {
    "colors": {
      "terms": { "field": "color.keyword" }
    }
  }
}
Corresponding SQL:
select make, color from cars where make = "ford";
Result:
make  color
ford  green
ford  blue
III. Filtering and aggregations
1. Filtering
Compute the average price of all cars alongside the average price of a single make (here, ford):
GET /cars/transactions/_search
{
  "size": 0,
  "query": {
    "match": { "make.keyword": "ford" }
  },
  "aggs": {
    "single_avg_price": {
      "avg": { "field": "price" }
    },
    "all": {
      "global": {},
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}
Equivalent to:
select make, color, avg(price) from cars where make = "ford";
select avg(price) from cars;
2. Range-limited filtering (filter bucket)
We can define a filter bucket: a document is added to the bucket only when it satisfies the bucket's filter condition.
GET /cars/transactions/_search
{
  "size": 0,
  "query": {
    "match": { "make": "ford" }
  },
  "aggs": {
    "recent_sales": {
      "filter": {
        "range": { "sold": { "from": "now-100M" } }
      },
      "aggs": {
        "average_price": { "avg": { "field": "price" } }
      }
    }
  }
}
A rough MySQL counterpart, at month granularity:
select *, avg(price) from cars where period_diff(date_format(now(), '%Y%m'), date_format(sold, '%Y%m')) <= 100 and make = "ford";
MySQL result (the non-aggregated columns come from an arbitrary matching row):
id  price  color  make  sold        avg
3   30000  green  ford  2014-05-18  27500.0000
3. Post filter
To filter only the search hits and leave the aggregation results untouched, use post_filter:
GET /cars/transactions/_search
{
  "query": {
    "match": { "make": "ford" }
  },
  "post_filter": {
    "term": { "color.keyword": "green" }
  },
  "aggs": {
    "all_colors": {
      "terms": { "field": "color.keyword" }
    }
  }
}
post_filter filters the search hits so that only green ford cars are shown. It is applied after the query has executed, so the aggregation is unaffected.
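Summary
Which kind of filtering to use (on search hits, on aggregations, or on both) usually depends on how the results should be presented to the user.
- A non-scoring query in a filter clause affects both the search hits and the aggregations.
- A filter bucket affects only the aggregations.
- post_filter affects only the search hits.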
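The three scopes can be imitated with plain list filtering. The sketch below is a simplified model, not Elasticsearch's execution path; the function `search` and its parameters `bucket_filter` and `post_filter` are names invented for this illustration, and the data comes from section 1.1.

```python
from collections import Counter

# Documents copied from the bulk request in section 1.1 (sold date omitted).
CARS = [
    {"price": 10000, "color": "red", "make": "honda"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 30000, "color": "green", "make": "ford"},
    {"price": 15000, "color": "blue", "make": "toyota"},
    {"price": 12000, "color": "green", "make": "toyota"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 80000, "color": "red", "make": "bmw"},
    {"price": 25000, "color": "blue", "make": "ford"},
]

def search(cars, query, bucket_filter=None, post_filter=None):
    """Return (hits, color_counts) for a toy search request."""
    # The query narrows the input of both the hits and the aggregation.
    matched = [car for car in cars if query(car)]
    # A filter bucket narrows only what the aggregation sees.
    agg_input = [c for c in matched if bucket_filter is None or bucket_filter(c)]
    # A post filter narrows only the hits, after aggregation input is fixed.
    hits = [c for c in matched if post_filter is None or post_filter(c)]
    return hits, dict(Counter(c["color"] for c in agg_input))

def is_ford(car):
    return car["make"] == "ford"

def is_green(car):
    return car["color"] == "green"

hits, colors = search(CARS, is_ford, post_filter=is_green)
print(len(hits), colors)  # 1 {'green': 1, 'blue': 1}
```

Passing `is_green` as `bucket_filter` instead gives the opposite picture: both ford cars remain in the hits, while the color counts shrink to the green one.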
IV. Sorting multi-value buckets
4.1 Built-in sorts
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "colors": {
      "terms": {
        "field": "color.keyword",
        "order": { "_count": "asc" }
      }
    }
  }
}
4.2 Sorting by a metric
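Buckets can be ordered by a metric computed inside them.
Bucket criterion: car color.
Metric: price statistics (extended_stats).
Sort: ascending by one value of that metric.
The query below orders the color buckets by the variance of price (stats.variance); ordering by average price would reference stats.avg instead: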
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "colors": {
      "terms": {
        "field": "color.keyword",
        "order": { "stats.variance": "asc" }
      },
      "aggs": {
        "stats": { "extended_stats": { "field": "price" } }
      }
    }
  }
}
4.3 Sorting based on "deep" metrics
Too complex; not recommended.
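For reference, the variance-based ordering from 4.2 can be reproduced offline. The sketch assumes the variance in question is the population variance, which agrees with the ford excerpt shown earlier (prices 25000 and 30000 give 6250000); the helper name `colors_by_price_variance` is hypothetical.

```python
from collections import defaultdict
from statistics import pvariance

# (price, color) pairs copied from the bulk request in section 1.1.
CARS = [
    (10000, "red"), (20000, "red"), (30000, "green"), (15000, "blue"),
    (12000, "green"), (20000, "red"), (80000, "red"), (25000, "blue"),
]

def colors_by_price_variance(cars):
    """Return (color, population variance of price), smallest variance first."""
    prices = defaultdict(list)
    for price, color in cars:
        prices[color].append(price)
    variances = {color: pvariance(p) for color, p in prices.items()}
    return sorted(variances.items(), key=lambda item: item[1])

for color, variance in colors_by_price_variance(CARS):
    print(color, variance)

# Sanity check against the ford excerpt from the extended_stats output.
print(pvariance([25000, 30000]))  # 6250000
```

On this data the order is blue, green, red: the single 80000 sale makes red by far the most spread-out bucket.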
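V. Approximate aggregations
Cardinality means the number of distinct values of a field; the cardinality metric computes it approximately.
5.1 Counting distinct values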
GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "distinct_colors": {
      "cardinality": { "field": "color.keyword" }
    }
  }
}
Similar to:
SELECT COUNT(DISTINCT color) FROM cars;
The same distinct count can also be computed per month.
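A minimal Python sketch of both variants, overall and per month, is given below. It counts exactly with sets, whereas the cardinality metric is an approximation, so treat it as a statement of the intended result and not of the algorithm. The function names are invented, and the `(sold, color)` pairs come from section 1.1.

```python
from collections import defaultdict

# (sold, color) pairs copied from the bulk request in section 1.1.
SALES = [
    ("2014-10-28", "red"), ("2014-11-05", "red"), ("2014-05-18", "green"),
    ("2014-07-02", "blue"), ("2014-08-19", "green"), ("2014-11-05", "red"),
    ("2014-01-01", "red"), ("2014-02-12", "blue"),
]

def distinct_colors(sales):
    """Exact equivalent of COUNT(DISTINCT color)."""
    return len({color for _, color in sales})

def distinct_colors_by_month(sales):
    """Exact distinct color count per 'YYYY-MM' month."""
    seen = defaultdict(set)
    for sold, color in sales:
        seen[sold[:7]].add(color)  # first 7 characters are the year and month
    return {month: len(colors) for month, colors in sorted(seen.items())}

print(distinct_colors(SALES))           # 3
print(distinct_colors_by_month(SALES))
```

Keeping a set per bucket costs memory proportional to the number of distinct values, which is the cost an approximate counter is designed to avoid on large data sets.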
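VI. Understanding doc values
In Elasticsearch, doc values are a column-oriented storage structure. They are enabled by default for nearly every field type (analyzed text fields are the exception) and are built at index time: when a field is indexed, Elasticsearch adds its values to the inverted index for fast search and also stores the field's doc values.
Doc values in Elasticsearch are commonly used for sorting on a field, aggregating on a field, certain filters (for example geolocation filters), and scripts that reference field values.
Because doc values are serialized to disk, access can rely on the operating system instead of the JVM heap. When the working set is much smaller than the node's available memory, the OS keeps all doc values resident in memory, so access is very fast;
when the working set is much larger than available memory, the OS pages doc values in and out of the page cache as needed, which is slower but avoids JVM heap out-of-memory errors.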
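The difference between the two structures can be pictured with ordinary Python containers. This is a conceptual model only and says nothing about Lucene's actual on-disk format; the four-document `DOCS` mapping is a hypothetical example.

```python
from collections import defaultdict

# Hypothetical documents: doc id -> value of the "color" field.
DOCS = {0: "red", 1: "red", 2: "green", 3: "blue"}

# Inverted index: term -> doc ids. Answers "which documents contain red?"
inverted_index = defaultdict(list)
for doc_id, color in DOCS.items():
    inverted_index[color].append(doc_id)

# Doc values: one column, addressed by doc id.
# Answers "which value does document 2 hold?"
doc_values = [DOCS[doc_id] for doc_id in sorted(DOCS)]

# Searching uses the inverted index ...
matching = inverted_index["red"]                  # [0, 1]

# ... while aggregating over the matches reads the column by doc id.
colors_of_matches = [doc_values[d] for d in matching]

print(matching, colors_of_matches)  # [0, 1] ['red', 'red']
```

Aggregations and sorting need the second direction (document to value), which is why they read doc values instead of walking the inverted index.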