

Hive SQL Basics

Published: 2025/4/5 · Databases · 20 · 豆豆

This article, collected and compiled by 生活随笔, introduces the basics of Hive SQL; the editor found it useful and shares it here for reference.

Logical execution order in Hive:

**FROM-->WHERE-->GROUP BY-->HAVING-->SELECT-->ORDER BY**
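The Python sketch below makes this logical order concrete. It evaluates one grouped query clause by clause over a few in-memory rows. The rows and the function name `run_query` are hypothetical illustrations, not part of Hive. The sketch models only the logical order: Hive's optimizer may rearrange the physical work, for example by pushing filters down, without changing the result.

```python
from collections import defaultdict

# Hypothetical rows standing in for user_trade
rows = [
    {"user_name": "Amy", "dt": "2019-04-01", "pay_amount": 30000},
    {"user_name": "Amy", "dt": "2019-04-15", "pay_amount": 25000},
    {"user_name": "Dennis", "dt": "2019-04-20", "pay_amount": 8000},
    {"user_name": "Dennis", "dt": "2019-05-02", "pay_amount": 90000},
]


def run_query(table, limit=5):
    """Logical evaluation of:
    SELECT user_name, sum(pay_amount) AS total_amount FROM user_trade
    WHERE dt BETWEEN '2019-04-01' AND '2019-04-30'
    GROUP BY user_name HAVING sum(pay_amount) > 50000
    ORDER BY total_amount DESC LIMIT 5
    """
    # FROM: start from the source rows
    data = list(table)
    # WHERE: filter single rows; no aggregates exist yet
    data = [r for r in data if "2019-04-01" <= r["dt"] <= "2019-04-30"]
    # GROUP BY user_name: collect the rows for each key
    groups = defaultdict(list)
    for r in data:
        groups[r["user_name"]].append(r)
    # HAVING: keep only groups whose aggregate passes the test
    groups = {k: g for k, g in groups.items() if sum(r["pay_amount"] for r in g) > 50000}
    # SELECT: build the output columns, defining the alias total_amount
    out = [{"user_name": k, "total_amount": sum(r["pay_amount"] for r in g)}
           for k, g in groups.items()]
    # ORDER BY: the alias already exists because SELECT ran first
    out.sort(key=lambda r: r["total_amount"], reverse=True)
    # LIMIT: keep the first n rows
    return out[:limit]


print(run_query(rows))
```

Dennis's May purchase is dropped by WHERE before grouping, so his April total (8000) fails the HAVING test.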

Writing order:

**SELECT [DISTINCT] ... FROM ... JOIN ... ON ... WHERE ... GROUP BY ... [WITH ROLLUP | WITH CUBE] ... HAVING ... ORDER BY ... LIMIT**

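1. Introduction to Hive

1) Hive is a data warehouse built on Hadoop.

Hive SQL vs. traditional SQL:

PS: A block device is a class of I/O device that stores information in fixed-size blocks, each with its own address, and data of a given length can be read from any position on the device. Examples include hard disks, USB flash drives and SD cards.

2) Introduction to MapReduce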

2. Basic syntax

1) SELECT … A … FROM … B … WHERE … C …

A: column names
B: table name
C: filter condition

Example columns of user_info

| Column | Example values / range |
|---|---|
| user_id | 10001, 10002 (unique) |
| user_name | Amy, Dennis (unique) |
| sex | [male, female] |
| age | [13, 70] |
| city | beijing, shanghai |
| firstactivetime | 2019-04-19 15:40:00 |
| level | [1, 10] |
| extra1 | string: {"systemtype":"ios","education":"master","marriage_status":"1","phonebrand":"iphoneX"} |
| extra2 | `map<string,string>`: {"systemtype":"ios","education":"master","marriage_status":"1","phonebrand":"iphoneX"} |

```sql
-- Select the names of 10 female users who live in Beijing
select user_name
from user_info
where city = 'beijing' and sex = 'female'
limit 10;
```

Example columns of user_trade

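| Column | Meaning / example |
|---|---|
| user_name | Amy, Dennis |
| piece | quantity purchased |
| price | unit price |
| pay_amount | amount paid |
| goods_category | food, clothes, book, computer, electronics, shoes |
| pay_time | 2412521561 (Unix timestamp) |
| dt | partition column, 'yyyy-MM-dd' |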

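Note: if the table is partitioned, the WHERE clause must restrict the partition column (Hive enforces this when running in strict mode, hive.mapred.mode=strict).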

```sql
-- User name, quantity purchased and amount paid for food purchases on 2019-04-09
select user_name, piece, pay_amount
from user_trade
where dt = '2019-04-09' and goods_category = 'food';
```

A query that does not restrict the partition column fails with an error:

```sql
select user_name, piece, pay_amount
from user_trade
where goods_category = 'food';
```

2) GROUP BY: grouping and aggregation

```sql
-- January to April 2019: number of buyers and total amount paid per category
select goods_category,
       count(distinct user_name) as user_num,
       sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by goods_category;
```

Common aggregate functions:
1. count(): counts rows; count(distinct …) counts distinct values
2. sum(): sum
3. avg(): average
4. max(): maximum
5. min(): minimum
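Running the same shape of query locally shows what GROUP BY plus these aggregates produce. This sketch uses SQLite through Python's built-in sqlite3 module as a stand-in for Hive; the aggregate calls shown are written the same way in HiveQL. The rows are hypothetical.

```python
import sqlite3

# In-memory SQLite table standing in for user_trade (hypothetical rows)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_trade (user_name TEXT, goods_category TEXT, pay_amount REAL, dt TEXT)"
)
conn.executemany(
    "INSERT INTO user_trade VALUES (?, ?, ?, ?)",
    [
        ("Amy", "food", 100.0, "2019-01-05"),
        ("Amy", "food", 50.0, "2019-02-10"),
        ("Dennis", "food", 30.0, "2019-03-01"),
        ("Dennis", "book", 80.0, "2019-04-12"),
    ],
)

# One output row per goods_category; each aggregate collapses that group's rows
rows = conn.execute(
    """
    SELECT goods_category,
           count(DISTINCT user_name) AS user_num,  -- Amy is counted once for food
           sum(pay_amount)           AS total_amount,
           avg(pay_amount)           AS avg_amount,
           max(pay_amount)           AS max_amount,
           min(pay_amount)           AS min_amount
    FROM user_trade
    WHERE dt BETWEEN '2019-01-01' AND '2019-04-30'
    GROUP BY goods_category
    ORDER BY goods_category
    """
).fetchall()

for row in rows:
    print(row)
```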
GROUP BY …HAVING

```sql
-- April 2019: users whose total payment exceeds 50,000 yuan
select user_name, sum(pay_amount) as total_amount
from user_trade
where dt between '2019-04-01' and '2019-04-30'
group by user_name
having sum(pay_amount) > 50000;
```

HAVING: filters the groups produced by GROUP BY and returns only the groups that satisfy the HAVING condition.
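The sketch below contrasts HAVING with WHERE on hypothetical rows, again using SQLite as a local stand-in for Hive. HAVING filters groups after aggregation. An aggregate inside WHERE is rejected, because WHERE is evaluated before any groups exist.

```python
import sqlite3

# Hypothetical April 2019 rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_trade (user_name TEXT, pay_amount REAL, dt TEXT)")
conn.executemany(
    "INSERT INTO user_trade VALUES (?, ?, ?)",
    [("Amy", 30000, "2019-04-01"), ("Amy", 25000, "2019-04-15"), ("Dennis", 8000, "2019-04-20")],
)

# HAVING runs after GROUP BY, so it can test each group's sum
having_rows = conn.execute(
    """
    SELECT user_name, sum(pay_amount) AS total_amount
    FROM user_trade
    WHERE dt BETWEEN '2019-04-01' AND '2019-04-30'
    GROUP BY user_name
    HAVING sum(pay_amount) > 50000
    """
).fetchall()
print(having_rows)

# WHERE runs before grouping, so an aggregate there is an error
try:
    conn.execute(
        "SELECT user_name FROM user_trade WHERE sum(pay_amount) > 50000 GROUP BY user_name"
    )
    where_rejected = False
except sqlite3.Error:
    where_rejected = True
print(where_rejected)
```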

3) ORDER BY

```sql
-- April 2019: top 5 users by total payment
select user_name, sum(pay_amount) as total_amount
from user_trade
where dt between '2019-04-01' and '2019-04-30'
group by user_name
order by total_amount desc
limit 5;
```

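ASC: ascending (the default)
DESC: descending
Sorting by multiple columns, e.g. ORDER BY A ASC, B DESC
or ORDER BY A DESC, B DESC: rows are sorted by A first, and B only breaks ties.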
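A multi-column ORDER BY sorts by the first column and uses each later column only to break ties. The Python sketch below models that rule. The rows and the `order_by` helper are hypothetical. The sketch relies on Python's sort being stable, including when `reverse=True` is used.

```python
from operator import itemgetter


def order_by(rows, *keys):
    """Sort like ORDER BY col1 [ASC|DESC], col2 [ASC|DESC], ...

    keys are (column_index, descending) pairs, most significant first.
    """
    result = list(rows)
    # Stable sorts: apply the least significant key first and the most
    # significant key last, so earlier orderings survive as tie-breakers.
    for col, descending in reversed(keys):
        result.sort(key=itemgetter(col), reverse=descending)
    return result


# Hypothetical (goods_category, pay_amount) rows
rows = [("food", 100), ("book", 30), ("food", 250), ("book", 80)]

print(order_by(rows, (0, False), (1, True)))  # ORDER BY goods_category ASC, pay_amount DESC
print(order_by(rows, (0, True), (1, True)))   # ORDER BY goods_category DESC, pay_amount DESC
```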
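Why does ORDER BY use total_amount instead of sum(pay_amount)?
Answer: because of the execution order. ORDER BY is evaluated after SELECT, so it can sort by the alias total_amount that SELECT has already defined.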

4) Execution order

FROM-->WHERE-->GROUP BY-->HAVING-->SELECT-->ORDER BY

3. Common functions

1) Converting a timestamp to a date

```sql
select pay_time, from_unixtime(pay_time, 'yyyy-MM-dd HH:mm:ss')
from user_trade
where dt = '2019-04-09';
```

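from_unixtime(bigint unixtime, string format)
Common values of format (Java SimpleDateFormat patterns; HH is the 24-hour clock, hh the 12-hour clock):
1. yyyy-MM-dd HH:mm:ss
2. yyyy-MM-dd HH
3. yyyy-MM-dd HH:mm
4. yyyyMMdd
Converting a date back to a timestamp: unix_timestamp
unix_timestamp(string date) expects 'yyyy-MM-dd HH:mm:ss'; unix_timestamp(string date, string pattern) accepts a custom pattern.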
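The format letters come from Java's SimpleDateFormat, which is why HH (24-hour) and hh (12-hour) give different results. The sketch below mimics from_unixtime and unix_timestamp in Python for the patterns listed above. The helper names mirror Hive's but are hypothetical Python functions. The sketch assumes UTC, whereas Hive formats in the time zone of the server or session.

```python
from datetime import datetime, timezone

# Map the Java pattern tokens used above to Python strftime codes.
# HH = hour 00-23, hh = hour 01-12 (no AM/PM marker unless 'a' is added).
_TOKENS = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"),
           ("HH", "%H"), ("hh", "%I"), ("mm", "%M"), ("ss", "%S")]


def to_strftime(pattern: str) -> str:
    for java_token, py_code in _TOKENS:
        pattern = pattern.replace(java_token, py_code)
    return pattern


# Assumption: UTC; Hive itself uses the server/session time zone.
def from_unixtime(seconds: int, pattern: str = "yyyy-MM-dd HH:mm:ss") -> str:
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(to_strftime(pattern))


def unix_timestamp(text: str, pattern: str = "yyyy-MM-dd HH:mm:ss") -> int:
    parsed = datetime.strptime(text, to_strftime(pattern)).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


print(from_unixtime(1554824400))                         # 24-hour clock
print(from_unixtime(1554824400, "yyyy-MM-dd hh:mm:ss"))  # 12-hour clock: 15:40 shows as 03:40
print(from_unixtime(1554824400, "yyyyMMdd"))
print(unix_timestamp("2019-04-09 15:40:00"))
```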

2) Computing the interval between dates

```sql
-- Days between each user's first activation time and 2019-05-01
select user_name, datediff('2019-05-01', to_date(firstactivetime))
from user_info
limit 10;
```

datediff(string enddate, string startdate): the number of days from startdate to enddate (enddate minus startdate)
Related: date_add and date_sub add days to or subtract days from a date
date_add(string startdate, int days)
date_sub(string startdate, int days)
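The Python sketch below does the same day arithmetic. It assumes inputs that start with 'yyyy-MM-dd'. Any trailing time is ignored, as to_date would do. The function names copy Hive's for readability, but they are hypothetical helpers, not a Hive client.

```python
from datetime import date, timedelta


def _to_date(text: str) -> date:
    # Keep only the leading 'yyyy-MM-dd' part, like to_date()
    return date.fromisoformat(text[:10])


def datediff(enddate: str, startdate: str) -> int:
    # Whole days from startdate to enddate (end minus start)
    return (_to_date(enddate) - _to_date(startdate)).days


def date_add(startdate: str, days: int) -> str:
    return (_to_date(startdate) + timedelta(days=days)).isoformat()


def date_sub(startdate: str, days: int) -> str:
    return (_to_date(startdate) - timedelta(days=days)).isoformat()


# firstactivetime value taken from the user_info example
print(datediff("2019-05-01", "2019-04-19 15:40:00"))
print(date_add("2019-04-19", 12))
print(date_sub("2019-05-01", 1))
```

Swapping the arguments of datediff flips the sign, so the later date goes first.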

3) Conditional functions

case when

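```sql
-- Number of users in four age groups: under 20, 20-30, 30-40, and 40 and over
select case when age < 20 then 'under 20'
            when age >= 20 and age < 30 then '20-30'
            when age >= 30 and age < 40 then '30-40'
            else '40 and over'
       end as age_type,
       count(distinct user_id) as user_num
from user_info
-- GROUP BY runs before SELECT, so the alias age_type is not available here;
-- the CASE expression is repeated exactly as written in SELECT
group by case when age < 20 then 'under 20'
              when age >= 20 and age < 30 then '20-30'
              when age >= 30 and age < 40 then '30-40'
              else '40 and over'
         end;
```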
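CASE WHEN tests its branches from top to bottom and returns the first match, much like an if/elif chain. The Python sketch below reproduces the bucketing. Its data and helper names are hypothetical. It keeps one set per bucket to mimic count(distinct user_id).

```python
from collections import defaultdict


def age_type(age: int) -> str:
    # Branches are checked in order, like CASE WHEN ... ELSE ... END
    if age < 20:
        return "under 20"
    elif age < 30:  # age >= 20 is implied: the first branch already failed
        return "20-30"
    elif age < 40:
        return "30-40"
    else:
        return "40 and over"


# Hypothetical (user_id, age) rows; user 10001 appears twice
users = [(10001, 18), (10002, 25), (10003, 29), (10004, 35), (10005, 52), (10001, 18)]

buckets = defaultdict(set)
for user_id, age in users:
    buckets[age_type(age)].add(user_id)  # a set counts each user once
user_num = {bucket: len(ids) for bucket, ids in buckets.items()}
print(user_num)
```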

if

```sql
-- Distribution of high and low user levels for each sex (level > 5 counts as high)
select sex,
       if(level > 5, 'high', 'low') as level_type,
       count(distinct user_id) as user_num
from user_info
group by sex, if(level > 5, 'high', 'low');
```
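Hive's if(testCondition, valueTrue, valueFalseOrNull) is a two-branch shorthand for CASE WHEN. The Python sketch below groups by the pair (sex, level_type), mirroring the two-column GROUP BY. The rows are hypothetical, and `hive_if` is an illustrative helper.

```python
from collections import defaultdict


def hive_if(condition, true_value, false_value):
    # A false or NULL (None) condition takes the second branch
    return true_value if condition else false_value


# Hypothetical (user_id, sex, level) rows
users = [(10001, "female", 7), (10002, "male", 3), (10003, "female", 2), (10004, "female", 9)]

groups = defaultdict(set)
for user_id, sex, level in users:
    # The composite key plays the role of GROUP BY sex, if(level > 5, ...)
    groups[(sex, hive_if(level > 5, "high", "low"))].add(user_id)
user_num = {key: len(ids) for key, ids in groups.items()}
print(user_num)
```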

4) String functions

```sql
-- Number of users activated in each month
select substr(firstactivetime, 1, 7) as month,
       count(distinct user_id) as user_num
from user_info
group by substr(firstactivetime, 1, 7);
```

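substr(string A, int start, int len): returns len characters of A beginning at position start (positions are counted from 1)
Note: if len is omitted, the substring runs from start to the end of the string.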
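The main pitfall when porting substr logic is that Hive counts positions from 1, while Python slices count from 0. The Python sketch below handles that offset. It assumes a positive start, and `hive_substr` is a hypothetical name.

```python
from typing import Optional


def hive_substr(s: str, start: int, length: Optional[int] = None) -> str:
    # Convert the 1-based start position to a 0-based slice index
    begin = start - 1
    if length is None:
        return s[begin:]  # no length: take everything to the end
    return s[begin:begin + length]


t = "2019-04-19 15:40:00"
print(hive_substr(t, 1, 7))  # month key used in the query above
print(hive_substr(t, 12))    # no length given
```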
Number of users per phone brand

The phone brand is stored in the extra columns of user_info: extra1 is a JSON string, while extra2 is a `map<string,string>` with the same keys (systemtype, education, marriage_status, phonebrand).

```sql
-- Number of users per phone brand
-- Case 1: extra1 is a JSON string, parsed with get_json_object
select get_json_object(extra1, '$.phonebrand') as phone_brand,
       count(distinct user_id) as user_num
from user_info
group by get_json_object(extra1, '$.phonebrand');

-- Case 2: extra2 is a map, read by key
select extra2['phonebrand'] as phone_brand,
       count(distinct user_id) as user_num
from user_info
group by extra2['phonebrand'];
```

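get_json_object(string json_string, string path):
param1: the JSON column to parse;
param2: the path, e.g. '$.phonebrand', whose key selects the value to extract.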
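The two cases differ in one step: extra1 must be parsed as JSON before a key can be read, while extra2 is already a map. The Python sketch below shows that difference. Its `get_json_object` is a hypothetical stand-in that supports only simple '$.key' paths (with dot-separated nesting). It returns None where Hive would return NULL.

```python
import json

extra1 = '{"systemtype":"ios","education":"master","marriage_status":"1","phonebrand":"iphoneX"}'
extra2 = {"systemtype": "ios", "education": "master", "marriage_status": "1", "phonebrand": "iphoneX"}


def get_json_object(json_string: str, path: str):
    # Sketch only: supports '$.key' and '$.a.b' style paths
    if not path.startswith("$."):
        raise ValueError("only '$.key' paths are supported in this sketch")
    value = json.loads(json_string)  # the string must be parsed first
    for key in path[2:].split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # Hive returns NULL when the path does not match
        value = value[key]
    return value


print(get_json_object(extra1, "$.phonebrand"))
print(extra2.get("phonebrand"))  # map lookup; a missing key gives None, like NULL
```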

5) Aggregate statistics functions

How do you get the users who are in user_list_1 but not in user_list_2?

```sql
select a.user_id, a.user_name
from user_list_1 a
left join user_list_2 b on a.user_id = b.user_id
where b.user_id is null;

-- Note: the MySQL-style alternative uses a subquery
select user_id, user_name
from user_list_1
where user_id not in (select user_id from user_list_2);

-- Users who made purchases in 2019 but had no refunds
select a.user_name
from (select distinct user_name from user_trade where year(dt) = 2019) a
left join (select distinct user_name from user_refund where year(dt) = 2019) b
  on a.user_name = b.user_name
where b.user_name is null;

-- Users who had transactions in each of 2017, 2018 and 2019
-- Approach 1
select distinct a.user_name
from trade_2017 a
join trade_2018 b on a.user_name = b.user_name
join trade_2019 c on b.user_name = c.user_name;

-- Approach 2 (recommended in Hive, especially when the tables are large):
-- deduplicate each table before joining
select a.user_name
from (select distinct user_name from trade_2017) a
join (select distinct user_name from trade_2018) b on a.user_name = b.user_name
join (select distinct user_name from trade_2019) c on b.user_name = c.user_name;
```
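The sketch below checks these join patterns locally. It runs them in SQLite through Python's sqlite3 module, with SQLite standing in for Hive and hypothetical rows. It confirms that the LEFT JOIN ... IS NULL anti-join and the NOT IN subquery return the same users here. It also shows why the second approach deduplicates first: joining the raw yearly tables multiplies repeated rows before DISTINCT removes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_list_1 (user_id INTEGER, user_name TEXT);
    CREATE TABLE user_list_2 (user_id INTEGER, user_name TEXT);
    INSERT INTO user_list_1 VALUES (10001, 'Amy'), (10002, 'Dennis'), (10003, 'Lily');
    INSERT INTO user_list_2 VALUES (10002, 'Dennis');

    CREATE TABLE trade_2017 (user_name TEXT);
    CREATE TABLE trade_2018 (user_name TEXT);
    CREATE TABLE trade_2019 (user_name TEXT);
    INSERT INTO trade_2017 VALUES ('Amy'), ('Amy'), ('Dennis');
    INSERT INTO trade_2018 VALUES ('Amy'), ('Dennis');
    INSERT INTO trade_2019 VALUES ('Amy'), ('Amy');
    """
)

# Anti-join: keep left rows that found no match on the right
anti = conn.execute(
    """
    SELECT a.user_id, a.user_name
    FROM user_list_1 a LEFT JOIN user_list_2 b ON a.user_id = b.user_id
    WHERE b.user_id IS NULL
    ORDER BY a.user_id
    """
).fetchall()

# Same result here; NOT IN differs if the subquery can return NULL user_id values
not_in = conn.execute(
    "SELECT user_id, user_name FROM user_list_1 "
    "WHERE user_id NOT IN (SELECT user_id FROM user_list_2) ORDER BY user_id"
).fetchall()

# Joining raw tables: Amy's 2 x 1 x 2 duplicate rows multiply to 4 rows
raw_rows = conn.execute(
    """
    SELECT count(*) FROM trade_2017 a
    JOIN trade_2018 b ON a.user_name = b.user_name
    JOIN trade_2019 c ON b.user_name = c.user_name
    """
).fetchone()[0]

# Deduplicating first keeps every intermediate join at one row per user
all_three = conn.execute(
    """
    SELECT a.user_name
    FROM (SELECT DISTINCT user_name FROM trade_2017) a
    JOIN (SELECT DISTINCT user_name FROM trade_2018) b ON a.user_name = b.user_name
    JOIN (SELECT DISTINCT user_name FROM trade_2019) c ON b.user_name = c.user_name
    """
).fetchall()

print(anti, not_in, raw_rows, all_three)
```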
