Classic Hive SQL Interview Questions and Answers
Contents
Question 1: Running total per partition
Question 2: UV and the top 3 visitors per shop
Hive SQL answers
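Question 1: Running total per partition
We have the following user visit data:

userId  visitDate  visitCount
u01     2017/1/21  5
u02     2017/1/23  6
u03     2017/1/22  8
u04     2017/1/20  3
u01     2017/1/23  6
u01     2017/2/21  8
U02     2017/1/23  6
U01     2017/2/22  4

Use SQL to compute each user's cumulative visit count per month, as in the table below. The last two rows use upper-case IDs; the expected output counts them as u02 and u01.

user_id  month    subtotal  running_total
u01      2017-01  11        11
u01      2017-02  12        23
u02      2017-01  12        12
u03      2017-01  8         8
u04      2017-01  3         3

Create the table and load the data (MySQL 8.0):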
-- The case-insensitive collation (_ai_ci) makes 'U01' and 'u01' group together.
CREATE TABLE `user_visit` (
    `userId`     varchar(255) NOT NULL,
    `visitDate`  varchar(255) NOT NULL,
    `visitCount` tinyint DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO user_visit (userId, visitDate, visitCount) VALUES
    ('u01', '2017/1/21', 5),
    ('u02', '2017/1/23', 6),
    ('u03', '2017/1/22', 8),
    ('u04', '2017/1/20', 3),
    ('u01', '2017/1/23', 6),
    ('u01', '2017/2/21', 8),
    ('U02', '2017/1/23', 6),
    ('U01', '2017/2/22', 4);

Solution:
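-- Inner query: monthly subtotal per user.
-- Outer query: running total of those subtotals within each user, ordered by month.
SELECT t.userId   AS user_id,
       t.month    AS month,
       t.subtotal AS subtotal,
       SUM(t.subtotal) OVER (PARTITION BY t.userId ORDER BY t.month) AS running_total
FROM (
    SELECT userId,
           DATE_FORMAT(visitDate, '%Y-%m') AS month,
           SUM(visitCount) AS subtotal
    FROM user_visit
    GROUP BY userId, month
) t;

Question 2: UV and the top 3 visitors per shop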
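There are 500,000 JD.com shops. Whenever a visitor views any product in any shop, one access-log record is written. The log is stored in a table named Visit, where user_id is the visitor's user ID and shop is the name of the visited shop. Sample data:

user_id  shop
u1       a
u2       b
u1       b
u1       a
u3       c
u4       b
u1       a
u2       c
u5       b
u4       b
u6       c
u2       c
u1       b
u2       a
u2       a
u3       a
u5       a
u5       a
u5       a

Compute:
(1) The UV (number of unique visitors) of each shop.
(2) The top 3 visitors of each shop by visit count. Output the shop name, visitor ID, and visit count.

Create the table and load the data (MySQL 8.0):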
CREATE TABLE `Visit` (
    `user_id` varchar(255) NOT NULL,
    `shop`    varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO Visit (user_id, shop) VALUES
    ('u1', 'a'), ('u2', 'b'), ('u1', 'b'), ('u1', 'a'), ('u3', 'c'),
    ('u4', 'b'), ('u1', 'a'), ('u2', 'c'), ('u5', 'b'), ('u4', 'b'),
    ('u6', 'c'), ('u2', 'c'), ('u1', 'b'), ('u2', 'a'), ('u2', 'a'),
    ('u3', 'a'), ('u5', 'a'), ('u5', 'a'), ('u5', 'a');

(1)
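-- Deduplicate with DISTINCT
SELECT shop, COUNT(DISTINCT user_id) AS UV
FROM Visit
GROUP BY shop;

-- Deduplicate with GROUP BY: collapse to one row per (shop, user_id), then count rows
SELECT t.shop, COUNT(t.user_id) AS UV
FROM (
    SELECT shop, user_id
    FROM Visit
    GROUP BY shop, user_id
) t
GROUP BY t.shop;

Result (computed from the sample data above):
shop  UV
a     4
b     4
c     3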
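The UV query can be cross-checked outside the database. The following is a minimal Python sketch, not part of the original answer: it mirrors COUNT(DISTINCT user_id) ... GROUP BY shop by collecting each shop's visitors into a set. The names VISITS and uv_per_shop are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict

# Sample rows from the Visit table, as (user_id, shop) pairs.
VISITS = [
    ("u1", "a"), ("u2", "b"), ("u1", "b"), ("u1", "a"), ("u3", "c"),
    ("u4", "b"), ("u1", "a"), ("u2", "c"), ("u5", "b"), ("u4", "b"),
    ("u6", "c"), ("u2", "c"), ("u1", "b"), ("u2", "a"), ("u2", "a"),
    ("u3", "a"), ("u5", "a"), ("u5", "a"), ("u5", "a"),
]


def uv_per_shop(visits):
    """Count distinct visitors per shop (hypothetical helper for this sketch)."""
    visitors = defaultdict(set)
    for user_id, shop in visits:
        visitors[shop].add(user_id)  # a set ignores repeat visits by the same user
    return {shop: len(users) for shop, users in sorted(visitors.items())}


if __name__ == "__main__":
    for shop, uv in uv_per_shop(VISITS).items():
        print(shop, uv)
```

The set plays the role of DISTINCT: a visitor who opens the same shop several times is still counted once.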
(2)
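-- Innermost query: visit count per (shop, user_id).
-- Middle query: rank visitors within each shop by visit count.
-- user_id is added as a tie-breaker so that tied visitors are ranked deterministically.
SELECT t1.shop, t1.user_id, t1.user_shop_count
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY t.shop
                              ORDER BY t.user_shop_count DESC, t.user_id) AS shop_top
    FROM (
        SELECT shop, user_id, COUNT(*) AS user_shop_count
        FROM Visit
        GROUP BY shop, user_id
    ) t
) t1
WHERE t1.shop_top <= 3;

Result (computed from the sample data above):
shop  user_id  user_shop_count
a     u1       3
a     u5       3
a     u2       2
b     u1       2
b     u4       2
b     u2       1
c     u2       2
c     u3       1
c     u6       1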
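The top-3 logic can also be cross-checked in plain Python. This is a sketch under stated assumptions, not the original author's method: top_visitors and VISITS are hypothetical names, and ties on visit count are broken by user_id so the output is reproducible. Without a tie-breaker in the ORDER BY, ROW_NUMBER() may return either of two tied visitors (for shop b, u2 and u5 both have one visit).

```python
from collections import Counter, defaultdict

# Sample rows from the Visit table, as (user_id, shop) pairs.
VISITS = [
    ("u1", "a"), ("u2", "b"), ("u1", "b"), ("u1", "a"), ("u3", "c"),
    ("u4", "b"), ("u1", "a"), ("u2", "c"), ("u5", "b"), ("u4", "b"),
    ("u6", "c"), ("u2", "c"), ("u1", "b"), ("u2", "a"), ("u2", "a"),
    ("u3", "a"), ("u5", "a"), ("u5", "a"), ("u5", "a"),
]


def top_visitors(visits, n=3):
    """Return the n most frequent visitors per shop (hypothetical helper)."""
    # Equivalent of: SELECT shop, user_id, COUNT(*) ... GROUP BY shop, user_id
    counts = Counter((shop, user_id) for user_id, shop in visits)

    per_shop = defaultdict(list)
    for (shop, user_id), visit_count in counts.items():
        per_shop[shop].append((user_id, visit_count))

    result = {}
    for shop, rows in sorted(per_shop.items()):
        # Equivalent of: ROW_NUMBER() OVER (PARTITION BY shop
        #                ORDER BY visit_count DESC, user_id), keeping rows 1..n
        rows.sort(key=lambda row: (-row[1], row[0]))
        result[shop] = rows[:n]
    return result


if __name__ == "__main__":
    for shop, rows in top_visitors(VISITS).items():
        for user_id, visit_count in rows:
            print(shop, user_id, visit_count)
```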
Hive SQL answers
-- Question 1
CREATE TABLE user_visit (
    userId     string,
    visitDate  string,
    visitCount INT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";

INSERT INTO TABLE user_visit VALUES
    ('u01', '2017/1/21', 5), ('u02', '2017/1/23', 6),
    ('u03', '2017/1/22', 8), ('u04', '2017/1/20', 3),
    ('u01', '2017/1/23', 6), ('u01', '2017/2/21', 8),
    ('u02', '2017/1/23', 6), ('u01', '2017/2/22', 4);

-- Step 1: turn '2017/1/21' into a year-month string.
-- Hive's date_format takes Java date patterns: use 'yyyy' (calendar year),
-- not 'YYYY' (week-based year), which can be wrong around year boundaries.
SELECT DATE_FORMAT(regexp_replace(visitDate, '/', '-'), 'yyyy-MM')
FROM user_visit;

-- Step 2: attach the month to every row.
SELECT userId,
       DATE_FORMAT(regexp_replace(visitDate, '/', '-'), 'yyyy-MM') AS visitMonth,
       visitCount
FROM user_visit;

-- Step 3: monthly subtotal per user.
SELECT userId, visitMonth, SUM(visitCount) AS subtotal
FROM (
    SELECT userId,
           DATE_FORMAT(regexp_replace(visitDate, '/', '-'), 'yyyy-MM') AS visitMonth,
           visitCount
    FROM user_visit
) t1
GROUP BY userId, visitMonth;

-- Final answer: running total of the monthly subtotals within each user.
SELECT t.userId AS userid,
       t.visitMonth,
       t.subtotal,
       SUM(t.subtotal) OVER (PARTITION BY t.userId ORDER BY t.visitMonth) AS totals
FROM (
    SELECT userId, visitMonth, SUM(visitCount) AS subtotal
    FROM (
        SELECT userId,
               DATE_FORMAT(regexp_replace(visitDate, '/', '-'), 'yyyy-MM') AS visitMonth,
               visitCount
        FROM user_visit
    ) t1
    GROUP BY userId, visitMonth
) t;

-- Question 2
CREATE TABLE Visit (
    user_id string,
    shop    string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

INSERT INTO TABLE Visit VALUES
    ('u1', 'a'), ('u2', 'b'), ('u1', 'b'), ('u1', 'a'), ('u3', 'c'),
    ('u4', 'b'), ('u1', 'a'), ('u2', 'c'), ('u5', 'b'), ('u4', 'b'),
    ('u6', 'c'), ('u2', 'c'), ('u1', 'b'), ('u2', 'a'), ('u2', 'a'),
    ('u3', 'a'), ('u5', 'a'), ('u5', 'a'), ('u5', 'a');

-- (1) Deduplicate with DISTINCT
SELECT shop, COUNT(DISTINCT user_id) AS UV
FROM Visit
GROUP BY shop;

-- (1) Deduplicate with GROUP BY
SELECT t.shop, COUNT(t.user_id) AS UV
FROM (
    SELECT shop, user_id
    FROM Visit
    GROUP BY shop, user_id
) t
GROUP BY t.shop;

-- (2) Top 3 visitors per shop; user_id breaks ties deterministically.
SELECT t1.shop, t1.user_id, t1.user_shop_count
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY t.shop
                              ORDER BY t.user_shop_count DESC, t.user_id) AS shop_top
    FROM (
        SELECT shop, user_id, COUNT(*) AS user_shop_count
        FROM Visit
        GROUP BY shop, user_id
    ) t
) t1
WHERE t1.shop_top <= 3;
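For Question 1, the monthly subtotals and running totals can be cross-checked outside Hive or MySQL. The Python sketch below is illustrative only: to_month and monthly_running_totals are hypothetical helper names, and user IDs are lower-cased explicitly, which is an assumption standing in for the case-insensitive collation of the MySQL table (the Hive data already uses lower-case IDs).

```python
from collections import defaultdict

# Sample rows from user_visit, as (userId, visitDate, visitCount).
ROWS = [
    ("u01", "2017/1/21", 5), ("u02", "2017/1/23", 6),
    ("u03", "2017/1/22", 8), ("u04", "2017/1/20", 3),
    ("u01", "2017/1/23", 6), ("u01", "2017/2/21", 8),
    ("U02", "2017/1/23", 6), ("U01", "2017/2/22", 4),
]


def to_month(visit_date):
    """Convert '2017/1/21' to '2017-01' (hypothetical helper)."""
    year, month, _day = visit_date.split("/")
    return f"{year}-{int(month):02d}"


def monthly_running_totals(rows):
    """Return (user_id, month, subtotal, running_total) tuples."""
    # Step 1: GROUP BY user, month with SUM(visitCount).
    subtotals = defaultdict(int)
    for user_id, visit_date, visit_count in rows:
        # Assumption: IDs compare case-insensitively, so normalise them here.
        subtotals[(user_id.lower(), to_month(visit_date))] += visit_count

    # Step 2: SUM(subtotal) OVER (PARTITION BY user ORDER BY month).
    running = defaultdict(int)
    result = []
    for (user_id, month), subtotal in sorted(subtotals.items()):
        running[user_id] += subtotal
        result.append((user_id, month, subtotal, running[user_id]))
    return result


if __name__ == "__main__":
    for row in monthly_running_totals(ROWS):
        print(*row)
```

Sorting the (user, month) keys works because a zero-padded 'yyyy-MM' string sorts in calendar order, which is the same property the ORDER BY inside the window relies on.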
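Reference blog: 经典Hive-SQL面试题 (Classic Hive SQL Interview Questions)
Reference blog: [hive] 经典sql题及答案(一) ([hive] Classic SQL Questions and Answers, Part 1)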