
hive2

Published: 2023/12/18 · programming Q&A · 豆豆
生活随笔 collected and organized this article introducing hive2; it is shared here as a reference.
4. Hive optimization

1) Parameters reported when a SQL statement runs:

In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
If the input exceeds <number> bytes, additional reducers are created, roughly one per <number> bytes of input. For example, with <number>=1024, an input under 1 KB gets a single reducer.

    set hive.exec.reducers.bytes.per.reducer=20000;
    select user_id, count(1) as order_cnt from orders group by user_id limit 10;
    -- result: number of mappers: 1; number of reducers: 1009

In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>

    set hive.exec.reducers.max=10;
    -- number of mappers: 1; number of reducers: 10

In order to set a constant number of reducers:
    set mapreduce.job.reduces=<number>

    set mapreduce.job.reduces=5;
    -- number of mappers: 1; number of reducers: 5
    set mapreduce.job.reduces=15;
    -- number of mappers: 1; number of reducers: 15

These settings take effect only in the current session, or for the duration of the script or job that sets them.

2) A where condition can make a group by redundant. Map and reduce run as a synchronized pipeline (synchronous, like a phone call, rather than asynchronous, like a text message): a reduce cannot complete before the maps it depends on. You may still see progress like "map 60% reduce 3%", because reducers begin fetching map output (the shuffle) while maps are still running.

3) Cases that always produce a single reducer:

a. Aggregation without group by:
    set mapreduce.job.reduces=5;
    select count(1) from orders where order_dow='0';
    -- number of mappers: 1; number of reducers: 1

b. order by (global sorting):
    set mapreduce.job.reduces=5;
    select user_id, order_dow from orders
    where order_dow='0'
    order by user_id
    limit 10;
    -- number of mappers: 1; number of reducers: 1

c. Cartesian product (cross product). With tmp_d containing 1, 2, 3, 4, 5:
    select * from tmp_d join (select * from tmp_d) t
    where tmp_d.user_id = t.user_id;
Filtering in where instead of an on clause means the join itself has no key, so every row is first paired with every row (1 1 / 2 1 / 3 1 / 1 2 / 2 2 / 3 2 / 1 3 / 2 3 / 3 3 / ...) and the filter is applied afterwards.

4) Where such user x product cross joins appear: in recommendation, recall (match) selects from the whole catalog a small candidate set (around 1000 products) the user might like, from which a top 10 is picked, e.g. users of mother-and-baby categories against those products. Filtering the pairs coarsely requires information from both the user side and the product side at once.

5) Map join:
    select /*+ MAPJOIN(aisles) */ a.aisle as aisle, p.product_id as product_id
    from aisles a join product p on a.aisle_id = p.aisle_id
    limit 10;
The small table is loaded into memory as a hash map {aisle_id: aisle}, and the large table is streamed through it on the map side, with no reduce phase:

    aisle_map = {}            # built from the small table: {aisle_id: aisle}
    for line in products:     # stream the big table
        ss = line.split('\t')
        aisle_id = ss[0]
        product_id = ss[1]
        aisle = aisle_map[aisle_id]
        print('%s\t%s' % (aisle, product_id))

6) union all + distinct == union:
    -- runtime: 74.712 seconds, 2 jobs
    select count(*) c from (
        select order_id, user_id, order_dow from orders where order_dow='0'
        union all
        select order_id, user_id, order_dow from orders where order_dow='0'
        union all
        select order_id, user_id, order_dow from orders where order_dow='1'
    ) t;

    -- runtime: 122.996 seconds, 3 jobs
    select * from (
        select order_id, user_id, order_dow from orders where order_dow='0'
        union
        select order_id, user_id, order_dow from orders where order_dow='0'
        union
        select order_id, user_id, order_dow from orders where order_dow='1'
    ) t;
union deduplicates, which costs an extra job; union all keeps duplicates and is cheaper.

7) set hive.groupby.skewindata=true;
Splits one MapReduce job into two. Typical skew: a dirty key '-' (standing in for '', -1, 0, null) covers 100 million rows that all hash to one reducer, so that reducer processes 60 million rows while the other 29 reducers share the remaining 40 million. With the flag set:
1. The first job distributes rows randomly across reducers and pre-aggregates (partial counts), so the hot key is split up.
2. The second job aggregates the partial results into the final answer (e.g. summing about 2 million partial rows down to 1 row).

    select add_to_cart_order, count(1) as cnt from order_products_prior group by add_to_cart_order limit 10;
    select user_id, count(1) as cnt from order_products_prior group by user_id limit 10;
    -- without set hive.groupby.skewindata=true: Launching Job 1 out of 1 -- 1m 41s
    -- with the flag set:                        Launching Job 1 out of 2 -- 2m 50s

If skew is not actually making a reducer fail repeatedly, leave the flag off; the extra job adds overhead. Set it only when one reducer receives so much data that the job keeps failing or runs far too long. A real case: a nightly scheduled job for a one-week report ran 3 hours, so the cleaned base table due at 3 a.m. came out at 7 a.m., with about 70 downstream tasks waiting on it.
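The sizing rule in item 1 can be sketched outside Hive: roughly one reducer per bytes.per.reducer of input, capped by reducers.max (whose default of 1009 matches the "number of reducers: 1009" seen above). This is an illustrative approximation, not Hive's exact internal formula:

```python
import math

def estimate_reducers(input_bytes, bytes_per_reducer=256 * 1024 * 1024,
                      max_reducers=1009):
    """Roughly one reducer per bytes_per_reducer of input, capped at max_reducers."""
    n = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(n, max_reducers))

# With bytes.per.reducer lowered to 20000 bytes, a ~20 MB input would hit
# the 1009 cap, consistent with the run shown in the notes.
print(estimate_reducers(20_971_520, bytes_per_reducer=20000))  # 1009
```

Lowering bytes.per.reducer therefore raises parallelism only until the cap, which is why reducers.max and mapreduce.job.reduces are the other two knobs.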
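The union-vs-union-all behavior in item 6 can be illustrated in plain Python: UNION ALL is bare concatenation, while UNION adds a deduplication pass (the extra MR job). The sample rows here are made up:

```python
# Rows as (order_id, user_id, order_dow) tuples; data is illustrative.
dow0 = [(1, 10, '0'), (2, 11, '0')]
dow1 = [(3, 12, '1')]

# UNION ALL: plain concatenation, duplicates kept (the dow0 branch twice).
union_all = dow0 + dow0 + dow1
print(len(union_all))  # 5

# UNION: concatenation followed by deduplication, Hive's extra job.
union = sorted(set(union_all))
print(len(union))  # 3
```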
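The two-stage plan that hive.groupby.skewindata=true produces (item 7) amounts to salted partial aggregation. A minimal sketch, with made-up data and with the reducer count chosen arbitrarily:

```python
import random
from collections import Counter

rows = ['-'] * 1000 + ['a'] * 30 + ['b'] * 20  # skewed: '-' dominates

# Stage 1: spray rows randomly across N partial aggregators, so the hot
# key '-' is split instead of landing entirely on one reducer.
N = 4
partial = [Counter() for _ in range(N)]
for key in rows:
    partial[random.randrange(N)][key] += 1

# Stage 2: merge the partial counts into the final, correct totals.
final = Counter()
for c in partial:
    final.update(c)

print(final['-'], final['a'], final['b'])  # 1000 30 20
```

The totals are exact regardless of how stage 1 scatters the rows, which is why the rewrite is safe for algebraic aggregates like count and sum.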
8) Number of MR jobs:
    -- Launching Job 1 out of 1: both joins use the same key (ord.order_id), so one job suffices
    select ord.order_id order_id, tra.product_id product_id, pri.reordered reordered
    from orders ord
    join train tra on ord.order_id = tra.order_id
    join order_products_prior pri on ord.order_id = pri.order_id
    limit 10;

    -- two MR jobs: the joins use different keys
    select ord.order_id, tra.product_id, pro.aisle_id
    from orders ord
    join trains tra on ord.order_id = tra.order_id
    join products pro on tra.product_id = pro.product_id
    limit 10;

9) /*+ STREAMTABLE(a) */ marks a as the big table. Like the map join hint it is placed in the select; the difference is that it designates the large (streamed) table rather than the small one:
    select /*+ STREAMTABLE(pr) */ ord.order_id, pr.product_id, pro.aisle_id
    from orders ord
    join order_products_prior pr on ord.order_id = pr.order_id
    join products pro on pr.product_id = pro.product_id
    limit 10;

10) LEFT OUTER JOIN:
    select od.user_id, od.order_id, tr.product_id
    from (select user_id, order_id, order_dow from orders limit 100) od
    left outer join (select order_id, product_id, reordered from train) tr
    on (od.order_id = tr.order_id and od.order_dow='0' and tr.reordered=1)
    limit 30;
    -- a bare join defaults to inner join

11) set hive.exec.parallel=true
Lets stages of a query that do not depend on each other run in parallel. As in 2), a reduce must wait for its maps, but independent jobs need not wait for one another.

12) Handling skewed keys:
1. Dirty values such as '-': simply drop them in the where clause (note that where must come before group by):
    select age, count(1) from t where age <> '-' group by age;
    -- t stands in for the source table, which the original notes omit
Alternatively, salt the hot key with a random prefix (1_-, 2_-, 3_-) so it spreads over several reducers, then strip the prefix and merge.
2. How to locate which keys are skewed? Sample the table:
    SELECT COUNT(1) FROM (SELECT * FROM lxw1 TABLESAMPLE (200 ROWS)) x;
    SELECT * FROM udata TABLESAMPLE (50 PERCENT);
    select * from table_name where col=xxx order by rand() limit num;
    SELECT * FROM lxw1 TABLESAMPLE (30M);
Skew usually comes from long-tail data.
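Locating skewed keys, as in item 12, boils down to counting key frequencies on a sample: the keys that dominate the sample are the skew suspects. A sketch mirroring TABLESAMPLE (200 ROWS), with a fabricated key column:

```python
import random
from collections import Counter

# Pretend this is the full table's group-by/join key column:
# 90% of rows carry the dirty key '-', the rest form a long tail.
full_column = ['-'] * 90_000 + [str(i) for i in range(10_000)]

# Sample ~200 rows, analogous to TABLESAMPLE (200 ROWS).
random.seed(42)
sample = random.sample(full_column, 200)

# The most frequent keys in the sample are the candidates to inspect.
suspects = Counter(sample).most_common(3)
print(suspects[0][0])  # '-' dominates the sample as it does the table
```

A 200-row sample is enough to expose a key holding 90% of the table; rarer skew needs a larger sample, such as the 50 PERCENT or 30M variants above.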


Reposted from: https://www.cnblogs.com/hejunhong/p/11186393.html

Summary

The above is the full content of hive2 as collected and organized by 生活随笔; we hope it helps you solve the problems you have run into.
