生活随笔
收集整理的這篇文章主要介紹了
Hive窗口函数
小編覺(jué)得挺不錯(cuò)的,現(xiàn)在分享給大家,幫大家做個(gè)參考.
一、窗口函數(shù)
OVER()指定分析函數(shù)工作的數(shù)據(jù)窗口大小,這個(gè)數(shù)據(jù)窗口大小可能會(huì)隨著行的變而變化
-
CURRENT ROW:當(dāng)前行
-
n PRECEDING:往前n行數(shù)據(jù)
-
n FOLLOWING:往后n行數(shù)據(jù)
-
UNBOUNDED:起點(diǎn),UNBOUNDED PRECEDING 表示從前面的起點(diǎn), UNBOUNDED FOLLOWING表示到后面的終點(diǎn)
-
LAG(col,n):往前第n行數(shù)據(jù)
-
LEAD(col,n):往后第n行數(shù)據(jù)
-
NTILE(n):把有序分區(qū)中的行分發(fā)到指定數(shù)據(jù)的組中,各個(gè)組有編號(hào),編號(hào)從1開(kāi)始,對(duì)于每一行,NTILE返回此行所屬的組的編號(hào)。注意:n必須為int類(lèi)型。
select name
,orderdate
,cost
,
sum(cost
) over() as sample1
,
sum(cost
) over(partition by name
) as sample2
,
sum(cost
) over(partition by name
order by orderdate
) as sample3
,
sum(cost
) over(partition by name
order by orderdate
rows between UNBOUNDED PRECEDING and current row ) as sample4
,
sum(cost
) over(partition by name
order by orderdate
rows between 1 PRECEDING and current row) as sample5
,
sum(cost
) over(partition by name
order by orderdate
rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6
,
sum(cost
) over(partition by name
order by orderdate
rows between current row and UNBOUNDED FOLLOWING ) as sample7
from business
;select name
,orderdate
,cost
,
lag
(orderdate
,1,'1900-01-01') over(partition by name
order by orderdate
) as time1
, lag
(orderdate
,2) over (partition by name
order by orderdate
) as time2
from business
;查詢(xún)前
20%時(shí)間的訂單信息
select * from (select name
,orderdate
,cost
, ntile
(5) over(order by orderdate
) sorted
from business
) t
where sorted
= 1;
二、案例
- 統(tǒng)計(jì)截止到當(dāng)前時(shí)間段的店鋪累計(jì)銷(xiāo)售金額
select shop_id
, stat_date
, ordamt
, sum(ordamt
) over(partition by shop_id
order by stat_date
)
from rt_data
where dt
= '2015-01-11' and shop_id
= 10026;10026 201501120030 5170 5170
10026 201501120100 5669 10839
10026 201501120130 2396 13235
10026 201501120200 1498 14733
10026 201501120230 1997 16730
10026 201501120300 1188 17918
10026 201501120330 598 18516
10026 201501120400 479 18995
10026 201501120430 1587 20582
10026 201501120530 799 21381
- 統(tǒng)計(jì)每個(gè)時(shí)間段的銷(xiāo)售占比
select shop_id
, stat_date
, ordamt
, ordamt
/ sum(ordamt
) over(partition by shop_id
)
from rt_data
where dt
= '2015-01-11' and shop_id
= 10026
order by stat_date
;10026 201501120900 5406 0.014227
10026 201501120930 13098 0.034471
10026 201501121000 15679 0.041263
10026 201501121030 12134 0.031934
10026 201501121100 26259 0.069107
10026 201501121130 16516 0.043466
10026 201501121200 20127 0.052969
-
找出2點(diǎn)的銷(xiāo)售金額及前半小時(shí)的銷(xiāo)售金額和后1個(gè)小時(shí)的銷(xiāo)售金額
lag(field, N)是取前N行的值,lead(field, N)是取后N行的值。
select * from ( select shop_id
, stat_date
,ordamt
, lag
(stat_date
,1) over(partition by shop_id
order by stat_date
), lag
(ordamt
,1) over(partition by shop_id
order by stat_date
), lead
(stat_date
,2) over(partition by shop_id
order by stat_date
),lead
(ordamt
,2) over(partition by shop_id
order by stat_date
) from rt_data
where dt
= '2015-01-11' and shop_id
= 10026
) t
where stat_date
= 201501120200;10026 201501120200 1498 201501120130 2396 201501120300 1188
- 按銷(xiāo)售金額排序,取出前20%的時(shí)間段和相應(yīng)金額
select * from (select shop_id
, stat_date
, ordamt
, ntile
(5) over(partition by shop_id
order by ordamt
desc) sorted
from rt_data
where dt
= '2015-01-11' and shop_id
= 10026
) t
where sorted
= 1;10026 201501121100 26259.0 1
10026 201501121200 20127.0 1
10026 201501121300 18138.0 1
10026 201501121130 16516.0 1
10026 201501121400 15956.0 1
10026 201501121000 15679.0 1
10026 201501122200 14290.0 1
10026 201501121430 14203.0 1
10026 201501121330 13290.0 1
總結(jié)
以上是生活随笔為你收集整理的Hive窗口函数的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
如果覺(jué)得生活随笔網(wǎng)站內(nèi)容還不錯(cuò),歡迎將生活随笔推薦給好友。