Data Warehousing: E-commerce Data Warehouse -- 3.4 E-commerce Data Warehouse System (ADS Layer)

Published: 2025/3/17 windows 27 豆豆

Contents

  • 9. Building the Warehouse: ADS Layer
    • 9.1 Table Creation Notes
    • 9.2 Visitor Topic
      • 9.2.1 Visitor Statistics
      • 9.2.2 Path Analysis
    • 9.3 User Topic
      • 9.3.1 User Statistics
      • 9.3.2 User Change Statistics
      • 9.3.3 User Behavior Funnel Analysis
      • 9.3.4 User Retention Rate
    • 9.4 Product Topic
      • 9.4.1 Product Statistics
      • 9.4.2 Brand Repurchase Rate
    • 9.5 Order Topic
      • 9.5.1 Order Statistics
      • 9.5.2 Order Statistics by Region
    • 9.6 Coupon Topic
      • 9.6.1 Coupon Statistics
    • 9.7 Activity Topic
      • 9.7.1 Activity Statistics
    • 9.8 ADS Layer Business Data Load Script
  • Chapter 10: Full-Process Scheduling
    • 10.1 Azkaban Deployment
    • 10.2 Creating the MySQL Database and Tables
    • 10.3 Sqoop Export Script
    • 10.4 Full Scheduling Process
      • 10.4.1 Data Preparation
      • 10.4.2 Writing the Azkaban Workflow Configuration Files

-----------------------------------------------------Separator-----------------------------------------------------
E-commerce Data Warehouse -- 1. User Behavior Data Collection ==>
E-commerce Data Warehouse -- 2. Business Data Collection Platform ==>
E-commerce Data Warehouse -- 3.1 E-commerce Data Warehouse System (DIM, ODS, and DWD Layers) ==>
E-commerce Data Warehouse -- 3.2 E-commerce Data Warehouse System (DWS Layer) ==>
E-commerce Data Warehouse -- 3.3 E-commerce Data Warehouse System (DWT Layer) ==>
E-commerce Data Warehouse -- 3.4 E-commerce Data Warehouse System (ADS Layer) ==>
E-commerce Data Warehouse -- 4. Visualization Reports with Superset ==>
E-commerce Data Warehouse -- 5. Ad Hoc Queries with Kylin ==>

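9. Building the Warehouse: ADS Layer

9.1 Table Creation Notes

The ADS layer involves no dimensional modeling; each table is created to fit a specific reporting requirement.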

9.2 Visitor Topic

9.2.1 Visitor Statistics

  • Table creation statement
    DROP TABLE IF EXISTS ads_visit_stats;
    CREATE EXTERNAL TABLE ads_visit_stats (
        `dt` STRING COMMENT 'Statistics date',
        `is_new` STRING COMMENT 'New/returning flag, 1: new, 0: returning',
        `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
        `channel` STRING COMMENT 'Channel',
        `uv_count` BIGINT COMMENT 'Daily active visitors (number of unique visitors)',
        `duration_sec` BIGINT COMMENT 'Total page dwell time, in seconds',
        `avg_duration_sec` BIGINT COMMENT 'Average page dwell time per session, in seconds',
        `page_count` BIGINT COMMENT 'Total page views',
        `avg_page_count` BIGINT COMMENT 'Average page views per session',
        `sv_count` BIGINT COMMENT 'Number of sessions',
        `bounce_count` BIGINT COMMENT 'Number of bounces',
        `bounce_rate` DECIMAL(16,2) COMMENT 'Bounce rate'
    ) COMMENT 'Visitor statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_visit_stats/';
  • Data loading
    Approach: the key to this requirement is splitting page views into sessions. The overall implementation has three steps:
    Step 1: assign every page-view record to a session.
    Step 2: compute the dwell time and the number of pages viewed for each session.
    Step 3: aggregate the metrics above.
    insert overwrite table ads_visit_stats
    select * from ads_visit_stats
    union
    select
        '2020-06-14' dt,
        is_new,
        recent_days,
        channel,
        count(distinct(mid_id)) uv_count,
        cast(sum(duration)/1000 as bigint) duration_sec,
        cast(avg(duration)/1000 as bigint) avg_duration_sec,
        sum(page_count) page_count,
        cast(avg(page_count) as bigint) avg_page_count,
        count(*) sv_count,
        sum(if(page_count=1,1,0)) bounce_count,
        cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
    from
    (
        select
            session_id,
            mid_id,
            is_new,
            recent_days,
            channel,
            count(*) page_count,
            sum(during_time) duration
        from
        (
            select
                mid_id,
                channel,
                recent_days,
                is_new,
                last_page_id,
                page_id,
                during_time,
                concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
            from
            (
                select
                    mid_id,
                    channel,
                    last_page_id,
                    page_id,
                    during_time,
                    ts,
                    recent_days,
                    if(visit_date_first>=date_add('2020-06-14',-recent_days+1),'1','0') is_new
                from
                (
                    select
                        t1.mid_id,
                        t1.channel,
                        t1.last_page_id,
                        t1.page_id,
                        t1.during_time,
                        t1.dt,
                        t1.ts,
                        t2.visit_date_first
                    from
                    (
                        select
                            mid_id,
                            channel,
                            last_page_id,
                            page_id,
                            during_time,
                            dt,
                            ts
                        from dwd_page_log
                        where dt>=date_add('2020-06-14',-30)
                    )t1
                    left join
                    (
                        select
                            mid_id,
                            visit_date_first
                        from dwt_visitor_topic
                        where dt='2020-06-14'
                    )t2
                    on t1.mid_id=t2.mid_id
                )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
                where dt>=date_add('2020-06-14',-recent_days+1)
            )t4
        )t5
        group by session_id,mid_id,is_new,recent_days,channel
    )t6
    group by is_new,recent_days,channel;
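The session-splitting idea in the query above can be easier to follow outside SQL. The following is only a minimal Python sketch, not the tutorial's implementation: it assumes each page-view record is a dict with the fields `mid_id`, `last_page_id`, `during_time` (milliseconds), and `ts`, and the function names `split_sessions` and `visit_stats` are hypothetical. A record whose `last_page_id` is empty starts a new session, and the session id is the visitor id joined with that record's timestamp, as in the `last_value(...)` expression.

```python
from collections import defaultdict


def split_sessions(page_logs):
    """Group page-view records into sessions keyed by '<mid_id>-<start ts>'."""
    by_visitor = defaultdict(list)
    for rec in page_logs:
        by_visitor[rec["mid_id"]].append(rec)

    sessions = defaultdict(list)
    for mid_id, recs in by_visitor.items():
        start_ts = None  # stays None until the first session-start record
        for rec in sorted(recs, key=lambda r: r["ts"]):
            if rec["last_page_id"] is None:
                start_ts = rec["ts"]  # no previous page: a new session begins
            sessions[f"{mid_id}-{start_ts}"].append(rec)
    return dict(sessions)


def visit_stats(sessions):
    """Aggregate per-session page counts and dwell time into summary metrics."""
    sv_count = len(sessions)
    page_counts = [len(recs) for recs in sessions.values()]
    total_ms = sum(r["during_time"] for recs in sessions.values() for r in recs)
    bounce_count = sum(1 for c in page_counts if c == 1)
    return {
        "sv_count": sv_count,
        "page_count": sum(page_counts),
        "duration_sec": total_ms // 1000,
        "bounce_count": bounce_count,
        "bounce_rate": round(bounce_count * 100 / sv_count, 2),
    }
```

A session with exactly one page view counts as a bounce, which is what `sum(if(page_count=1,1,0))` expresses in the query.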

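9.2.2 Path Analysis

User path analysis, as the name suggests, examines the paths users take through an app or website. Access paths are analyzed regularly to measure the effect of site optimization or marketing campaigns and to understand user behavior preferences.
Access paths are usually visualized with a Sankey diagram. As the figure below shows, the diagram faithfully reconstructs the user's path, including page jumps and the order in which pages were visited.
A Sankey diagram needs the number of occurrences of each kind of page jump. Each jump is represented as source/target, where source is the page the jump starts from and target is the page it lands on.

  • Table creation statement
    DROP TABLE IF EXISTS ads_page_path;
    CREATE EXTERNAL TABLE ads_page_path (
        `dt` STRING COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
        `source` STRING COMMENT 'ID of the page the jump starts from',
        `target` STRING COMMENT 'ID of the page the jump lands on',
        `path_count` BIGINT COMMENT 'Number of jumps'
    ) COMMENT 'Page view paths'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_page_path/';
  • Data loading
    Approach: this requirement counts the occurrences of each kind of jump, so in principle grouping by source/target and applying count() is enough. Two points need attention:
    1). In a Sankey diagram the source may not be empty, but the target may be.
    2). The flow shown in a Sankey diagram may not contain cycles.
    The query handles both points by prefixing every page with its position in the session ('step-N:'), so the same page visited at different steps becomes a different node, and by leaving the target of the last page in a session null.

    insert overwrite table ads_page_path
    select * from ads_page_path
    union
    select
        '2020-06-14',
        recent_days,
        source,
        target,
        count(*)
    from
    (
        select
            recent_days,
            concat('step-',step,':',source) source,
            concat('step-',step+1,':',target) target
        from
        (
            select
                recent_days,
                page_id source,
                lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
                row_number() over (partition by recent_days,session_id order by ts) step
            from
            (
                select
                    recent_days,
                    last_page_id,
                    page_id,
                    ts,
                    concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
                from dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
                where dt>=date_add('2020-06-14',-30)
                and dt>=date_add('2020-06-14',-recent_days+1)
            )t2
        )t3
    )t4
    group by recent_days,source,target;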
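To illustrate how the step prefix removes cycles, here is a small Python sketch. It assumes the page views have already been grouped into sessions and ordered by time; the function name `page_path_counts` and the input shape (a list of page-id lists) are assumptions made for illustration, not part of the tutorial.

```python
from collections import Counter


def page_path_counts(sessions):
    """Count (source, target) jumps, labelling each page with its step number.

    sessions: list of page_id lists, each already ordered by timestamp.
    The last page of a session gets a target of None, matching the SQL,
    where concat() with a null argument yields null.
    """
    counts = Counter()
    for pages in sessions:
        for step, source in enumerate(pages, start=1):
            target = pages[step] if step < len(pages) else None
            src_label = f"step-{step}:{source}"
            tgt_label = None if target is None else f"step-{step + 1}:{target}"
            counts[(src_label, tgt_label)] += 1
    return counts
```

With a session such as home, good_detail, home, the two visits to home become the distinct nodes `step-1:home` and `step-3:home`, so the diagram never loops back on itself.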

9.3 User Topic

9.3.1 User Statistics

This requirement is a comprehensive user statistic made up of several metrics; each metric is explained below.

  • Table creation statement
    DROP TABLE IF EXISTS ads_user_total;
    CREATE EXTERNAL TABLE `ads_user_total` (
        `dt` STRING COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days, 0: cumulative, 1: last 1 day, 7: last 7 days, 30: last 30 days',
        `new_user_count` BIGINT COMMENT 'Number of newly registered users',
        `new_order_user_count` BIGINT COMMENT 'Number of new ordering users',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Total order amount',
        `order_user_count` BIGINT COMMENT 'Number of ordering users',
        `no_order_user_count` BIGINT COMMENT 'Number of users without orders (specifically, active users who placed no order)'
    ) COMMENT 'User statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_total/';
  • Data loading
    insert overwrite table ads_user_total
    select * from ads_user_total
    union
    select
        '2020-06-14',
        recent_days,
        sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
        sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
        sum(order_final_amount) order_final_amount,
        sum(if(order_final_amount>0,1,0)) order_user_count,
        sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
    from
    (
        select
            recent_days,
            user_id,
            login_date_first,
            login_date_last,
            order_date_first,
            case
                when recent_days=0 then order_final_amount
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_final_amount,
            if(recent_days=0,'1970-01-01',date_add('2020-06-14',-recent_days+1)) recent_days_ago
        from dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days;
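The `recent_days_ago` expression is reused in several queries in this layer, and the off-by-one is easy to miss: the window of N recent days includes the statistics date itself, so it starts N-1 days earlier. A minimal Python sketch of the same arithmetic follows; the function name `window_start` is hypothetical.

```python
from datetime import date, timedelta


def window_start(stat_date, recent_days):
    """First day of the 'last N days' window ending on stat_date (inclusive).

    recent_days == 0 stands for the cumulative value, so the window
    starts at 1970-01-01, as in the SQL.
    """
    if recent_days == 0:
        return date(1970, 1, 1)
    return stat_date - timedelta(days=recent_days - 1)
```

For the statistics date 2020-06-14, the 7-day window therefore starts on 2020-06-08, not 2020-06-07.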

9.3.2 User Change Statistics

This requirement includes two metrics, the number of churned users and the number of returning users; both are explained below.

  • Table creation statement
    DROP TABLE IF EXISTS ads_user_change;
    CREATE EXTERNAL TABLE `ads_user_change` (
        `dt` STRING COMMENT 'Statistics date',
        `user_churn_count` BIGINT COMMENT 'Number of churned users',
        `user_back_count` BIGINT COMMENT 'Number of returning users'
    ) COMMENT 'User change statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_change/';
  • Data loading
  • Approach:
    Churned users: users whose last active date is exactly 7 days before the statistics date.
    Returning users: users whose last active date is the statistics date and whose previous active date was 8 or more days earlier.

    insert overwrite table ads_user_change
    select * from ads_user_change
    union
    select
        churn.dt,
        user_churn_count,
        user_back_count
    from
    (
        select
            '2020-06-14' dt,
            count(*) user_churn_count
        from dwt_user_topic
        where dt='2020-06-14'
        and login_date_last=date_add('2020-06-14',-7)
    )churn
    join
    (
        select
            '2020-06-14' dt,
            count(*) user_back_count
        from
        (
            select
                user_id,
                login_date_last
            from dwt_user_topic
            where dt='2020-06-14'
            and login_date_last='2020-06-14'
        )t1
        join
        (
            select
                user_id,
                login_date_last login_date_previous
            from dwt_user_topic
            where dt=date_add('2020-06-14',-1)
        )t2
        on t1.user_id=t2.user_id
        where datediff(login_date_last,login_date_previous)>=8
    )back
    on churn.dt=back.dt;
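The two definitions can be checked on a handful of users with a short Python sketch. This is an illustration under assumptions rather than the tutorial's code: the function name `user_change` and its inputs (two dicts mapping a user id to that user's last login date, one snapshot for the statistics date and one for the day before) are hypothetical stand-ins for the two `dwt_user_topic` partitions the query joins.

```python
from datetime import date, timedelta


def user_change(stat_date, last_login_today, last_login_yesterday):
    """Return (churned, returning) user counts for stat_date.

    last_login_today: {user_id: last login date as of stat_date}
    last_login_yesterday: {user_id: last login date as of the previous day}
    """
    # Churned: last seen exactly 7 days before the statistics date.
    churned = sum(
        1 for d in last_login_today.values()
        if d == stat_date - timedelta(days=7)
    )
    # Returning: active on stat_date, and the login before that was >= 8 days ago.
    # Users missing from yesterday's snapshot are new, so the inner join drops them.
    returning = sum(
        1 for uid, d in last_login_today.items()
        if d == stat_date
        and uid in last_login_yesterday
        and (d - last_login_yesterday[uid]).days >= 8
    )
    return churned, returning
```

Comparing against the previous day's snapshot is what recovers the "previous" login date, because today's snapshot has already overwritten `login_date_last` for anyone active today.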

9.3.3 User Behavior Funnel Analysis

Funnel analysis is a data analysis model that shows how users convert at each stage of a business process, from start to finish. Because every stage is laid out side by side, the stage with a problem is obvious at a glance.

This requirement counts the number of people at each stage of a complete shopping process.

  • Table creation statement
    DROP TABLE IF EXISTS ads_user_action;
    CREATE EXTERNAL TABLE `ads_user_action` (
        `dt` STRING COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
        `home_count` BIGINT COMMENT 'Number of visitors who viewed the home page',
        `good_detail_count` BIGINT COMMENT 'Number of visitors who viewed a product detail page',
        `cart_count` BIGINT COMMENT 'Number of users who added to cart',
        `order_count` BIGINT COMMENT 'Number of users who placed an order',
        `payment_count` BIGINT COMMENT 'Number of users who paid'
    ) COMMENT 'Funnel analysis'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_action/';
  • Data loading
    insert overwrite table ads_user_action
    select * from ads_user_action
    union
    select
        '2020-06-14',
        cop.recent_days,
        home_count,
        good_detail_count,
        cart_count,
        order_count,
        payment_count
    from
    (
        select
            recent_days,
            sum(if(array_contains(pages,'home'),1,0)) home_count,
            sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
        from
        (
            select
                recent_days,
                mid_id,
                collect_set(page_id) pages
            from dwd_page_log lateral view explode(array(1,7,30)) tmp as recent_days
            where dt>=date_add('2020-06-14',-recent_days+1)
            and page_id in ('home','good_detail')
            group by recent_days,mid_id
        )t1
        group by recent_days
    )page
    join
    (
        select
            recent_days,
            sum(if(cart_count>0,1,0)) cart_count,
            sum(if(order_count>0,1,0)) order_count,
            sum(if(payment_count>0,1,0)) payment_count
        from
        (
            select
                recent_days,
                case
                    when recent_days=1 then cart_last_1d_count
                    when recent_days=7 then cart_last_7d_count
                    when recent_days=30 then cart_last_30d_count
                end cart_count,
                case
                    when recent_days=1 then order_last_1d_count
                    when recent_days=7 then order_last_7d_count
                    when recent_days=30 then order_last_30d_count
                end order_count,
                case
                    when recent_days=1 then payment_last_1d_count
                    when recent_days=7 then payment_last_7d_count
                    when recent_days=30 then payment_last_30d_count
                end payment_count
            from dwt_user_topic lateral view explode(array(1,7,30)) tmp as recent_days
            where dt='2020-06-14'
        )t1
        group by recent_days
    )cop
    on page.recent_days=cop.recent_days;
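The page half of the funnel counts people, not page views: each visitor's pages are first collapsed into a set (`collect_set`), and a visitor then contributes at most 1 to each count. A minimal Python sketch of that step, assuming the input is a list of `(mid_id, page_id)` pairs and using the hypothetical function name `funnel_page_counts`:

```python
def funnel_page_counts(page_views):
    """Count distinct visitors who saw the home page and a product detail page.

    page_views: iterable of (mid_id, page_id) tuples, possibly with repeats.
    """
    pages_by_visitor = {}
    for mid_id, page_id in page_views:
        pages_by_visitor.setdefault(mid_id, set()).add(page_id)

    home = sum(1 for pages in pages_by_visitor.values() if "home" in pages)
    detail = sum(1 for pages in pages_by_visitor.values() if "good_detail" in pages)
    return {"home_count": home, "good_detail_count": detail}
```

Repeated visits to the same page by one visitor do not inflate the counts, which is the behavior a funnel of people needs.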

9.3.4 User Retention Rate

Retention analysis generally covers new-user retention and active-user retention.
New-user retention analyzes how many of the users added on a given day are active afterwards;
active-user retention analyzes how many of the users active on a given day are active afterwards.
Retention is an important indicator of how valuable a product is to its users.
Here we compute the new-user retention rate, which is the ratio of retained users to new users. For example, if 100 users were added on 2020-06-14 and 80 of them were active 1 day later (2020-06-15), then the 1-day retention count for 2020-06-14 is 80 and the 1-day retention rate for 2020-06-14 is 80%.

The 1-day to 7-day retention rates must be computed for each day, as shown in the figure below.

  • Table creation statement
    DROP TABLE IF EXISTS ads_user_retention;
    CREATE EXTERNAL TABLE ads_user_retention (
        `dt` STRING COMMENT 'Statistics date',
        `create_date` STRING COMMENT 'Date the users were added',
        `retention_day` BIGINT COMMENT 'Days retained as of the current date',
        `retention_count` BIGINT COMMENT 'Number of retained users',
        `new_user_count` BIGINT COMMENT 'Number of new users',
        `retention_rate` DECIMAL(16,2) COMMENT 'Retention rate'
    ) COMMENT 'User retention rate'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_retention/';
  • Data loading
    insert overwrite table ads_user_retention
    select * from ads_user_retention
    union
    select
        '2020-06-14',
        login_date_first create_date,
        datediff('2020-06-14',login_date_first) retention_day,
        sum(if(login_date_last='2020-06-14',1,0)) retention_count,
        count(*) new_user_count,
        cast(sum(if(login_date_last='2020-06-14',1,0))/count(*)*100 as decimal(16,2)) retention_rate
    from dwt_user_topic
    where dt='2020-06-14'
    and login_date_first>=date_add('2020-06-14',-7)
    and login_date_first<'2020-06-14'
    group by login_date_first;
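The worked example above (100 new users, 80 retained, 80%) maps directly onto the query: users are grouped by their first login date, and a user counts as retained when the last login date equals the statistics date. The Python sketch below reproduces that logic under assumptions: the function name `retention_rows` and the input shape (a list of `(login_date_first, login_date_last)` pairs) are hypothetical.

```python
from datetime import date


def retention_rows(stat_date, users):
    """Compute 1- to 7-day new-user retention as of stat_date.

    users: list of (login_date_first, login_date_last) date pairs.
    Returns {create_date: {...metrics...}} for cohorts added 1..7 days ago.
    """
    cohorts = {}
    for first, last in users:
        days = (stat_date - first).days
        if 1 <= days <= 7:
            new_count, kept = cohorts.get(first, (0, 0))
            cohorts[first] = (new_count + 1, kept + (last == stat_date))

    return {
        first: {
            "retention_day": (stat_date - first).days,
            "new_user_count": new_count,
            "retention_count": kept,
            "retention_rate": round(kept * 100 / new_count, 2),
        }
        for first, (new_count, kept) in cohorts.items()
    }
```

Each daily run therefore emits up to seven rows, one per cohort, and the full 1- to 7-day curve for a cohort accumulates over the following week.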

9.4 Product Topic

9.4.1 Product Statistics

This metric is a comprehensive product statistic: for each SPU, the total number of orders and the total order amount.

  • Table creation statement
    DROP TABLE IF EXISTS ads_order_spu_stats;
    CREATE EXTERNAL TABLE `ads_order_spu_stats` (
        `dt` STRING COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
        `spu_id` STRING COMMENT 'Product ID',
        `spu_name` STRING COMMENT 'Product name',
        `tm_id` STRING COMMENT 'Brand ID',
        `tm_name` STRING COMMENT 'Brand name',
        `category3_id` STRING COMMENT 'Level-3 category ID',
        `category3_name` STRING COMMENT 'Level-3 category name',
        `category2_id` STRING COMMENT 'Level-2 category ID',
        `category2_name` STRING COMMENT 'Level-2 category name',
        `category1_id` STRING COMMENT 'Level-1 category ID',
        `category1_name` STRING COMMENT 'Level-1 category name',
        `order_count` BIGINT COMMENT 'Number of orders',
        `order_amount` DECIMAL(16,2) COMMENT 'Order amount'
    ) COMMENT 'Product sales statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_order_spu_stats/';
  • Data loading
    insert overwrite table ads_order_spu_stats
    select * from ads_order_spu_stats
    union
    select
        '2020-06-14' dt,
        recent_days,
        spu_id,
        spu_name,
        tm_id,
        tm_name,
        category3_id,
        category3_name,
        category2_id,
        category2_name,
        category1_id,
        category1_name,
        sum(order_count),
        sum(order_amount)
    from
    (
        select
            recent_days,
            sku_id,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_amount
        from dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    left join
    (
        select
            id,
            spu_id,
            spu_name,
            tm_id,
            tm_name,
            category3_id,
            category3_name,
            category2_id,
            category2_name,
            category1_id,
            category1_name
        from dim_sku_info
        where dt='2020-06-14'
    )t2
    on t1.sku_id=t2.id
    group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;

9.4.2 Brand Repurchase Rate

The brand repurchase rate is the ratio of the number of people who bought a brand repeatedly within a period to the number of people who bought that brand at all. Repeat purchase means a purchase count of at least 2; having bought the brand means a purchase count of at least 1.
Here we compute the repurchase rate of each brand over the last 1, 7, and 30 days.

  • Table creation statement
    DROP TABLE IF EXISTS ads_repeat_purchase;
    CREATE EXTERNAL TABLE `ads_repeat_purchase` (
        `dt` STRING COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
        `tm_id` STRING COMMENT 'Brand ID',
        `tm_name` STRING COMMENT 'Brand name',
        `order_repeat_rate` DECIMAL(16,2) COMMENT 'Repurchase rate'
    ) COMMENT 'Brand repurchase rate'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_repeat_purchase/';
  • Data loading
    Approach: this requirement can be implemented in two steps:
    1). Count how many times each user bought each brand;
    2). Count the number of people with a purchase count of at least 1 and the number with a purchase count of at least 2.
    insert overwrite table ads_repeat_purchase
    select * from ads_repeat_purchase
    union
    select
        '2020-06-14' dt,
        recent_days,
        tm_id,
        tm_name,
        cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
    from
    (
        select
            recent_days,
            user_id,
            tm_id,
            tm_name,
            sum(order_count) order_count
        from
        (
            select
                recent_days,
                user_id,
                sku_id,
                count(*) order_count
            from dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('2020-06-14',-29)
            and dt>=date_add('2020-06-14',-recent_days+1)
            group by recent_days, user_id,sku_id
        )t1
        left join
        (
            select
                id,
                tm_id,
                tm_name
            from dim_sku_info
            where dt='2020-06-14'
        )t2
        on t1.sku_id=t2.id
        group by recent_days,user_id,tm_id,tm_name
    )t3
    group by recent_days,tm_id,tm_name;
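The two-step approach can be sketched in a few lines of Python. This is an illustration only: the function name `repeat_purchase_rate` is hypothetical, and the input is assumed to be one `(user_id, tm_id)` pair per order-detail row, with the SKU-to-brand lookup already applied.

```python
from collections import Counter


def repeat_purchase_rate(order_details):
    """Return {tm_id: repurchase rate in percent}.

    order_details: iterable of (user_id, tm_id), one pair per order-detail row.
    """
    # Step 1: purchase count per (user, brand).
    per_user_brand = Counter(order_details)

    # Step 2: per brand, count buyers (>= 1 purchase) and repeat buyers (>= 2).
    buyers = Counter()
    repeat_buyers = Counter()
    for (user_id, tm_id), purchases in per_user_brand.items():
        buyers[tm_id] += 1
        if purchases >= 2:
            repeat_buyers[tm_id] += 1

    return {tm: round(repeat_buyers[tm] * 100 / buyers[tm], 2) for tm in buyers}
```

Every brand in the result has at least one buyer, so the denominator is never zero, just as in the query, where each grouped row has `order_count >= 1`.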

9.5 Order Topic

9.5.1 Order Statistics

This requirement covers the total number of orders, the total order amount, and the total number of ordering users.

  • Table creation statement
    DROP TABLE IF EXISTS ads_order_total;
    CREATE EXTERNAL TABLE `ads_order_total` (
        `dt` STRING COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
        `order_count` BIGINT COMMENT 'Number of orders',
        `order_amount` DECIMAL(16,2) COMMENT 'Order amount',
        `order_user_count` BIGINT COMMENT 'Number of ordering users'
    ) COMMENT 'Order statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_order_total/';
  • Data loading
    insert overwrite table ads_order_total
    select * from ads_order_total
    union
    select
        '2020-06-14',
        recent_days,
        sum(order_count),
        sum(order_final_amount) order_final_amount,
        sum(if(order_final_amount>0,1,0)) order_user_count
    from
    (
        select
            recent_days,
            user_id,
            case
                when recent_days=0 then order_count
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=0 then order_final_amount
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_final_amount
        from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days;

9.5.2 Order Statistics by Region

This requirement covers the total number of orders and the total order amount for each province.

  • Table creation statement
    DROP TABLE IF EXISTS ads_order_by_province;
    CREATE EXTERNAL TABLE `ads_order_by_province` (
        `dt` STRING COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
        `province_id` STRING COMMENT 'Province ID',
        `province_name` STRING COMMENT 'Province name',
        `area_code` STRING COMMENT 'Area code',
        `iso_code` STRING COMMENT 'International standard region code',
        `iso_code_3166_2` STRING COMMENT 'International standard region code (ISO 3166-2)',
        `order_count` BIGINT COMMENT 'Number of orders',
        `order_amount` DECIMAL(16,2) COMMENT 'Order amount'
    ) COMMENT 'Order statistics by region'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_order_by_province/';
  • Data loading
    insert overwrite table ads_order_by_province
    select * from ads_order_by_province
    union
    select
        dt,
        recent_days,
        province_id,
        province_name,
        area_code,
        iso_code,
        iso_3166_2,
        order_count,
        order_amount
    from
    (
        select
            '2020-06-14' dt,
            recent_days,
            province_id,
            sum(order_count) order_count,
            sum(order_amount) order_amount
        from
        (
            select
                recent_days,
                province_id,
                case
                    when recent_days=1 then order_last_1d_count
                    when recent_days=7 then order_last_7d_count
                    when recent_days=30 then order_last_30d_count
                end order_count,
                case
                    when recent_days=1 then order_last_1d_final_amount
                    when recent_days=7 then order_last_7d_final_amount
                    when recent_days=30 then order_last_30d_final_amount
                end order_amount
            from dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt='2020-06-14'
        )t1
        group by recent_days,province_id
    )t2
    join dim_base_province t3
    on t2.province_id=t3.id;

9.6 Coupon Topic

9.6.1 Coupon Statistics

This requirement covers how the coupons released in the last 30 days were claimed and used, together with their subsidy rate. The subsidy rate is the ratio of the discount amount to the original amount of the orders that used the coupon.

  • Table creation statement
    DROP TABLE IF EXISTS ads_coupon_stats;
    CREATE EXTERNAL TABLE ads_coupon_stats (
        `dt` STRING COMMENT 'Statistics date',
        `coupon_id` STRING COMMENT 'Coupon ID',
        `coupon_name` STRING COMMENT 'Coupon name',
        `start_date` STRING COMMENT 'Release date',
        `rule_name` STRING COMMENT 'Discount rule, e.g. spend 100 yuan, get 10 yuan off',
        `get_count` BIGINT COMMENT 'Number of times claimed',
        `order_count` BIGINT COMMENT 'Number of times used (in an order)',
        `expire_count` BIGINT COMMENT 'Number of times expired',
        `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders using the coupon',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders using the coupon',
        `reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount',
        `reduce_rate` DECIMAL(16,2) COMMENT 'Subsidy rate'
    ) COMMENT 'Coupon statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_coupon_stats/';
  • Data loading
    insert overwrite table ads_coupon_stats
    select * from ads_coupon_stats
    union
    select
        '2020-06-14' dt,
        t1.id,
        coupon_name,
        start_date,
        rule_name,
        get_count,
        order_count,
        expire_count,
        order_original_amount,
        order_final_amount,
        reduce_amount,
        reduce_rate
    from
    (
        select
            id,
            coupon_name,
            date_format(start_time,'yyyy-MM-dd') start_date,
            -- The Chinese literals build the rule text:
            -- 3201: "spend X yuan, get Y yuan off"; 3202: "buy N items, pay Z tenths of the price"; 3203: "Y yuan off"
            case
                when coupon_type='3201' then concat('滿',condition_amount,'元減',benefit_amount,'元')
                when coupon_type='3202' then concat('滿',condition_num,'件打', (1-benefit_discount)*10,'折')
                when coupon_type='3203' then concat('減',benefit_amount,'元')
            end rule_name
        from dim_coupon_info
        where dt='2020-06-14'
        and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
    )t1
    left join
    (
        select
            coupon_id,
            get_count,
            order_count,
            expire_count,
            order_original_amount,
            order_final_amount,
            order_reduce_amount reduce_amount,
            cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
        from dwt_coupon_topic
        where dt='2020-06-14'
    )t2
    on t1.id=t2.coupon_id;

9.7 Activity Topic

9.7.1 Activity Statistics

This requirement covers participation in all activities released in the last 30 days, together with their subsidy rate. The subsidy rate is the ratio of the discount amount to the original amount of the orders that took part in the activity.

  • Table creation statement
    DROP TABLE IF EXISTS ads_activity_stats;
    CREATE EXTERNAL TABLE `ads_activity_stats` (
        `dt` STRING COMMENT 'Statistics date',
        `activity_id` STRING COMMENT 'Activity ID',
        `activity_name` STRING COMMENT 'Activity name',
        `start_date` STRING COMMENT 'Activity start date',
        `order_count` BIGINT COMMENT 'Number of orders in the activity',
        `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders in the activity',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders in the activity',
        `reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount',
        `reduce_rate` DECIMAL(16,2) COMMENT 'Subsidy rate'
    ) COMMENT 'Activity statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_activity_stats/';
  • Data loading
    insert overwrite table ads_activity_stats
    select * from ads_activity_stats
    union
    select
        '2020-06-14' dt,
        t4.activity_id,
        activity_name,
        start_date,
        order_count,
        order_original_amount,
        order_final_amount,
        reduce_amount,
        reduce_rate
    from
    (
        select
            activity_id,
            activity_name,
            date_format(start_time,'yyyy-MM-dd') start_date
        from dim_activity_rule_info
        where dt='2020-06-14'
        and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
        group by activity_id,activity_name,start_time
    )t4
    left join
    (
        select
            activity_id,
            sum(order_count) order_count,
            sum(order_original_amount) order_original_amount,
            sum(order_final_amount) order_final_amount,
            sum(order_reduce_amount) reduce_amount,
            cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
        from dwt_activity_topic
        where dt='2020-06-14'
        group by activity_id
    )t5
    on t4.activity_id=t5.activity_id;

At this point, all tables have been loaded!

9.8 ADS Layer Business Data Load Script

  • Create the script dwt_to_ads.sh in the /home/xiaobai/bin directory:
  • [xiaobai@hadoop102 ~]$ vim dwt_to_ads.sh

#!/bin/bash

APP=gmall
# Use the input date if one is given; otherwise use the day before the current date
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

ads_activity_stats="
insert overwrite table ${APP}.ads_activity_stats
select * from ${APP}.ads_activity_stats
union
select
    '$do_date' dt,
    t4.activity_id,
    activity_name,
    start_date,
    order_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        activity_id,
        activity_name,
        date_format(start_time,'yyyy-MM-dd') start_date
    from ${APP}.dim_activity_rule_info
    where dt='$do_date'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
    group by activity_id,activity_name,start_time
)t4
left join
(
    select
        activity_id,
        sum(order_count) order_count,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(order_reduce_amount) reduce_amount,
        cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
    from ${APP}.dwt_activity_topic
    where dt='$do_date'
    group by activity_id
)t5
on t4.activity_id=t5.activity_id;
"

ads_coupon_stats="
insert overwrite table ${APP}.ads_coupon_stats
select * from ${APP}.ads_coupon_stats
union
select
    '$do_date' dt,
    t1.id,
    coupon_name,
    start_date,
    rule_name,
    get_count,
    order_count,
    expire_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        id,
        coupon_name,
        date_format(start_time,'yyyy-MM-dd') start_date,
        case
            when coupon_type='3201' then concat('滿',condition_amount,'元減',benefit_amount,'元')
            when coupon_type='3202' then concat('滿',condition_num,'件打', (1-benefit_discount)*10,'折')
            when coupon_type='3203' then concat('減',benefit_amount,'元')
        end rule_name
    from ${APP}.dim_coupon_info
    where dt='$do_date'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
)t1
left join
(
    select
        coupon_id,
        get_count,
        order_count,
        expire_count,
        order_original_amount,
        order_final_amount,
        order_reduce_amount reduce_amount,
        cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
    from ${APP}.dwt_coupon_topic
    where dt='$do_date'
)t2
on t1.id=t2.coupon_id;
"

ads_order_by_province="
insert overwrite table ${APP}.ads_order_by_province
select * from ${APP}.ads_order_by_province
union
select
    dt,
    recent_days,
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_count,
    order_amount
from
(
    select
        '$do_date' dt,
        recent_days,
        province_id,
        sum(order_count) order_count,
        sum(order_amount) order_amount
    from
    (
        select
            recent_days,
            province_id,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_amount
        from ${APP}.dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='$do_date'
    )t1
    group by recent_days,province_id
)t2
join ${APP}.dim_base_province t3
on t2.province_id=t3.id;
"

ads_order_spu_stats="
insert overwrite table ${APP}.ads_order_spu_stats
select * from ${APP}.ads_order_spu_stats
union
select
    '$do_date' dt,
    recent_days,
    spu_id,
    spu_name,
    tm_id,
    tm_name,
    category3_id,
    category3_name,
    category2_id,
    category2_name,
    category1_id,
    category1_name,
    sum(order_count),
    sum(order_amount)
from
(
    select
        recent_days,
        sku_id,
        case
            when recent_days=1 then order_last_1d_count
            when recent_days=7 then order_last_7d_count
            when recent_days=30 then order_last_30d_count
        end order_count,
        case
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_amount
    from ${APP}.dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='$do_date'
)t1
left join
(
    select
        id,
        spu_id,
        spu_name,
        tm_id,
        tm_name,
        category3_id,
        category3_name,
        category2_id,
        category2_name,
        category1_id,
        category1_name
    from ${APP}.dim_sku_info
    where dt='$do_date'
)t2
on t1.sku_id=t2.id
group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;
"

ads_order_total="
insert overwrite table ${APP}.ads_order_total
select * from ${APP}.ads_order_total
union
select
    '$do_date',
    recent_days,
    sum(order_count),
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count
from
(
    select
        recent_days,
        user_id,
        case
            when recent_days=0 then order_count
            when recent_days=1 then order_last_1d_count
            when recent_days=7 then order_last_7d_count
            when recent_days=30 then order_last_30d_count
        end order_count,
        case
            when recent_days=0 then order_final_amount
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_final_amount
    from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='$do_date'
)t1
group by recent_days;
"

ads_page_path="
insert overwrite table ${APP}.ads_page_path
select * from ${APP}.ads_page_path
union
select
    '$do_date',
    recent_days,
    source,
    target,
    count(*)
from
(
    select
        recent_days,
        concat('step-',step,':',source) source,
        concat('step-',step+1,':',target) target
    from
    (
        select
            recent_days,
            page_id source,
            lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
            row_number() over (partition by recent_days,session_id order by ts) step
        from
        (
            select
                recent_days,
                last_page_id,
                page_id,
                ts,
                concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
            from ${APP}.dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('$do_date',-30)
            and dt>=date_add('$do_date',-recent_days+1)
        )t2
    )t3
)t4
group by recent_days,source,target;
"

ads_repeat_purchase="
insert overwrite table ${APP}.ads_repeat_purchase
select * from ${APP}.ads_repeat_purchase
union
select
    '$do_date' dt,
    recent_days,
    tm_id,
    tm_name,
    cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
from
(
    select
        recent_days,
        user_id,
        tm_id,
        tm_name,
        sum(order_count) order_count
    from
    (
        select
            recent_days,
            user_id,
            sku_id,
            count(*) order_count
        from ${APP}.dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('$do_date',-29)
        and dt>=date_add('$do_date',-recent_days+1)
        group by recent_days, user_id,sku_id
    )t1
    left join
    (
        select
            id,
            tm_id,
            tm_name
        from ${APP}.dim_sku_info
        where dt='$do_date'
    )t2
    on t1.sku_id=t2.id
    group by recent_days,user_id,tm_id,tm_name
)t3
group by recent_days,tm_id,tm_name;
"

ads_user_action="
with
tmp_page as
(
    select
        '$do_date' dt,
        recent_days,
        sum(if(array_contains(pages,'home'),1,0)) home_count,
        sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
    from
    (
        select
            recent_days,
            mid_id,
            collect_set(page_id) pages
        from
        (
            select
                dt,
                mid_id,
                page.page_id
            from ${APP}.dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
            where dt>=date_add('$do_date',-29)
            and page.page_id in('home','good_detail')
        )t1 lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('$do_date',-recent_days+1)
        group by recent_days,mid_id
    )t2
    group by recent_days
),
tmp_cop as
(
    select
        '$do_date' dt,
        recent_days,
        sum(if(cart_count>0,1,0)) cart_count,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from
    (
        select
            recent_days,
            user_id,
            case
                when recent_days=1 then cart_last_1d_count
                when recent_days=7 then cart_last_7d_count
                when recent_days=30 then cart_last_30d_count
            end cart_count,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then payment_last_1d_count
                when recent_days=7 then payment_last_7d_count
                when recent_days=30 then payment_last_30d_count
            end payment_count
        from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='$do_date'
    )t1
    group by recent_days
)
insert overwrite table ${APP}.ads_user_action
select * from ${APP}.ads_user_action
union
select
    tmp_page.dt,
    tmp_page.recent_days,
    home_count,
    good_detail_count,
    cart_count,
    order_count,
    payment_count
from tmp_page
join tmp_cop
on tmp_page.recent_days=tmp_cop.recent_days;
"

ads_user_change="
insert overwrite table ${APP}.ads_user_change
select * from ${APP}.ads_user_change
union
select
    churn.dt,
    user_churn_count,
    user_back_count
from
(
    select
        '$do_date' dt,
        count(*) user_churn_count
    from ${APP}.dwt_user_topic
    where dt='$do_date'
    and login_date_last=date_add('$do_date',-7)
)churn
join
(
    select
        '$do_date' dt,
        count(*) user_back_count
    from
    (
        select
            user_id,
            login_date_last
        from ${APP}.dwt_user_topic
        where dt='$do_date'
        and login_date_last='$do_date'
    )t1
    join
    (
        select
            user_id,
            login_date_last login_date_previous
        from ${APP}.dwt_user_topic
        where dt=date_add('$do_date',-1)
    )t2
    on t1.user_id=t2.user_id
    where datediff(login_date_last,login_date_previous)>=8
)back
on churn.dt=back.dt;
"

ads_user_retention="
insert overwrite table ${APP}.ads_user_retention
select * from ${APP}.ads_user_retention
union
select
    '$do_date',
    login_date_first create_date,
    datediff('$do_date',login_date_first) retention_day,
    sum(if(login_date_last='$do_date',1,0)) retention_count,
    count(*) new_user_count,
    cast(sum(if(login_date_last='$do_date',1,0))/count(*)*100 as decimal(16,2)) retention_rate
from ${APP}.dwt_user_topic
where dt='$do_date'
and login_date_first>=date_add('$do_date',-7)
and login_date_first<'$do_date'
group by login_date_first;
"

ads_user_total="
insert overwrite table ${APP}.ads_user_total
select * from ${APP}.ads_user_total
union
select
    '$do_date',
    recent_days,
    sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
    sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count,
    sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
from
(
    select
        recent_days,
        user_id,
        login_date_first,
        login_date_last,
        order_date_first,
        case
            when recent_days=0 then order_final_amount
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_final_amount,
        if(recent_days=0,'1970-01-01',date_add('$do_date',-recent_days+1)) recent_days_ago
    from ${APP}.dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
    where dt='$do_date'
)t1
group by recent_days;
"

ads_visit_stats="
insert overwrite table ${APP}.ads_visit_stats
select * from ${APP}.ads_visit_stats
union
select
    '$do_date' dt,
    is_new,
    recent_days,
    channel,
    count(distinct(mid_id)) uv_count,
    cast(sum(duration)/1000 as bigint) duration_sec,
    cast(avg(duration)/1000 as bigint) avg_duration_sec,
    sum(page_count) page_count,
    cast(avg(page_count) as bigint) avg_page_count,
    count(*) sv_count,
    sum(if(page_count=1,1,0)) bounce_count,
    cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
from
(
    select
        session_id,
        mid_id,
        is_new,
        recent_days,
        channel,
        count(*) page_count,
        sum(during_time) duration
    from
    (
        select
            mid_id,
            channel,
            recent_days,
            is_new,
            last_page_id,
            page_id,
            during_time,
            concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
        from
        (
            select
                mid_id,
                channel,
                last_page_id,
                page_id,
                during_time,
                ts,
                recent_days,
                if(visit_date_first>=date_add('$do_date',-recent_days+1),'1','0') is_new
            from
            (
                select
                    t1.mid_id,
                    t1.channel,
                    t1.last_page_id,
                    t1.page_id,
                    t1.during_time,
                    t1.dt,
                    t1.ts,
                    t2.visit_date_first
                from
                (
                    select
                        mid_id,
                        channel,
                        last_page_id,
                        page_id,
                        during_time,
                        dt,
                        ts
                    from ${APP}.dwd_page_log
                    where dt>=date_add('$do_date',-30)
                )t1
                left join
                (
                    select
                        mid_id,
                        visit_date_first
                    from ${APP}.dwt_visitor_topic
                    where dt='$do_date'
                )t2
                on t1.mid_id=t2.mid_id
            )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('$do_date',-recent_days+1)
        )t4
    )t5
    group by session_id,mid_id,is_new,recent_days,channel
)t6
group by is_new,recent_days,channel;
"

case $1 in
    "ads_activity_stats" )
        hive -e "$ads_activity_stats"
    ;;
    "ads_coupon_stats" )
        hive -e "$ads_coupon_stats"
    ;;
    "ads_order_by_province" )
        hive -e "$ads_order_by_province"
    ;;
    "ads_order_spu_stats" )
        hive -e "$ads_order_spu_stats"
    ;;
    "ads_order_total" )
        hive -e "$ads_order_total"
    ;;
    "ads_page_path" )
        hive -e "$ads_page_path"
    ;;
    "ads_repeat_purchase" )
        hive -e "$ads_repeat_purchase"
    ;;
    "ads_user_action" )
        hive -e "$ads_user_action"
    ;;
    "ads_user_change" )
        hive -e "$ads_user_change"
    ;;
    "ads_user_retention" )
        hive -e "$ads_user_retention"
    ;;
    "ads_user_total" )
        hive -e "$ads_user_total"
    ;;
    "ads_visit_stats" )
        hive -e "$ads_visit_stats"
    ;;
    "all" )
        hive -e "$ads_activity_stats$ads_coupon_stats$ads_order_by_province$ads_order_spu_stats$ads_order_total$ads_page_path$ads_repeat_purchase$ads_user_action$ads_user_change$ads_user_retention$ads_user_total$ads_visit_stats"
    ;;
esac
  • Grant execute permission:
  • [xiaobai@hadoop102 ~]$ chmod 777 dwt_to_ads.sh
  • Run:
  • [xiaobai@hadoop102 ~]$ dwt_to_ads.sh all 2020-06-14
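    The run above passes the statistics date explicitly as the second argument. The start of dwt_to_ads.sh is not shown in this part, so the following is only a minimal sketch of how such a script could fall back to yesterday when no date is given. The function name resolve_do_date is hypothetical, and the date call assumes GNU date.

```shell
#!/bin/bash
# Hypothetical helper: choose the statistics date for a run.
# Uses the given argument when present, otherwise yesterday (GNU date syntax).
resolve_do_date() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        date -d "-1 day" +%F
    fi
}

# $2 is the second script argument, as in: dwt_to_ads.sh all 2020-06-14
do_date=$(resolve_do_date "$2")
echo "do_date=$do_date"
```

    With this pattern, a nightly scheduled run can omit the date, while a backfill passes it explicitly.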

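    Chapter 10 Full-Pipeline Scheduling

    10.1 Azkaban Deployment

    For Azkaban deployment, see the post "Big Data: Azkaban Deployment" ==>

    10.2 Creating the MySQL Database and Tables

  • Create the gmall_report database:

    Alternatively, use the SQL statement: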
  • CREATE DATABASE `gmall_report` CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
  • Create the tables:
    1). Visitor statistics

    DROP TABLE IF EXISTS ads_visit_stats;
    CREATE TABLE `ads_visit_stats` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `is_new` VARCHAR(255) NOT NULL COMMENT 'New/returning flag, 1: new, 0: returning',
      `recent_days` INT NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `channel` VARCHAR(255) NOT NULL COMMENT 'Channel',
      `uv_count` BIGINT(20) DEFAULT NULL COMMENT 'Daily active visitors (number of visitors)',
      `duration_sec` BIGINT(20) DEFAULT NULL COMMENT 'Total page dwell time',
      `avg_duration_sec` BIGINT(20) DEFAULT NULL COMMENT 'Average page dwell time per session',
      `page_count` BIGINT(20) DEFAULT NULL COMMENT 'Total page views',
      `avg_page_count` BIGINT(20) DEFAULT NULL COMMENT 'Average page views per session',
      `sv_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of sessions',
      `bounce_count` BIGINT(20) DEFAULT NULL COMMENT 'Bounce count',
      `bounce_rate` DECIMAL(16,2) DEFAULT NULL COMMENT 'Bounce rate',
      PRIMARY KEY (`dt`,`recent_days`,`is_new`,`channel`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;

    2). Page path analysis

    DROP TABLE IF EXISTS ads_page_path;
    CREATE TABLE `ads_page_path` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `source` VARCHAR(255) DEFAULT NULL COMMENT 'Source page of the jump',
      `target` VARCHAR(255) DEFAULT NULL COMMENT 'Target page of the jump',
      `path_count` BIGINT(255) DEFAULT NULL COMMENT 'Number of jumps',
      UNIQUE KEY (`dt`,`recent_days`,`source`,`target`) USING BTREE
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

    3). User statistics

    DROP TABLE IF EXISTS ads_user_total;
    CREATE TABLE `ads_user_total` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 0: cumulative, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `new_user_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of newly registered users',
      `new_order_user_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of first-time ordering users',
      `order_final_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Total order amount',
      `order_user_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of ordering users',
      `no_order_user_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of users without orders (active users who placed no order)',
      PRIMARY KEY (`dt`,`recent_days`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;

    4). User change statistics

    DROP TABLE IF EXISTS ads_user_change;
    CREATE TABLE `ads_user_change` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `user_churn_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of churned users',
      `user_back_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of returning users',
      PRIMARY KEY (`dt`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;

    5). User behavior funnel analysis

    DROP TABLE IF EXISTS ads_user_action;
    CREATE TABLE `ads_user_action` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `home_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of visitors who viewed the home page',
      `good_detail_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of visitors who viewed a product detail page',
      `cart_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of users who added to cart',
      `order_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of users who placed an order',
      `payment_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of users who paid',
      PRIMARY KEY (`dt`,`recent_days`) USING BTREE
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

    6). User retention analysis

    DROP TABLE IF EXISTS ads_user_retention;
    CREATE TABLE `ads_user_retention` (
      `dt` DATE DEFAULT NULL COMMENT 'Statistics date',
      `create_date` VARCHAR(255) NOT NULL COMMENT 'Date the users were added',
      `retention_day` BIGINT(20) NOT NULL COMMENT 'Retention days as of the current date',
      `retention_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of retained users',
      `new_user_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of new users',
      `retention_rate` DECIMAL(16,2) DEFAULT NULL COMMENT 'Retention rate',
      PRIMARY KEY (`create_date`,`retention_day`) USING BTREE
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

    7). Order statistics

    DROP TABLE IF EXISTS ads_order_total;
    CREATE TABLE `ads_order_total` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `order_count` BIGINT(255) DEFAULT NULL COMMENT 'Number of orders',
      `order_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Order amount',
      `order_user_count` BIGINT(255) DEFAULT NULL COMMENT 'Number of ordering users',
      PRIMARY KEY (`dt`,`recent_days`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

    8). Order statistics by province

    DROP TABLE IF EXISTS ads_order_by_province;
    CREATE TABLE `ads_order_by_province` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `province_id` VARCHAR(255) NOT NULL COMMENT 'Province ID',
      `province_name` VARCHAR(255) DEFAULT NULL COMMENT 'Province name',
      `area_code` VARCHAR(255) DEFAULT NULL COMMENT 'Area code',
      `iso_code` VARCHAR(255) DEFAULT NULL COMMENT 'International standard region code',
      `iso_code_3166_2` VARCHAR(255) DEFAULT NULL COMMENT 'International standard region code (ISO 3166-2)',
      `order_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of orders',
      `order_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Order amount',
      PRIMARY KEY (`dt`,`recent_days`,`province_id`) USING BTREE
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

    9). Brand repeat purchase rate

    DROP TABLE IF EXISTS ads_repeat_purchase;
    CREATE TABLE `ads_repeat_purchase` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `tm_id` VARCHAR(255) NOT NULL COMMENT 'Brand ID',
      `tm_name` VARCHAR(255) DEFAULT NULL COMMENT 'Brand name',
      `order_repeat_rate` DECIMAL(16,2) DEFAULT NULL COMMENT 'Repeat purchase rate',
      PRIMARY KEY (`dt`,`recent_days`,`tm_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

    10). Product statistics

    DROP TABLE IF EXISTS ads_order_spu_stats;
    CREATE TABLE `ads_order_spu_stats` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `spu_id` VARCHAR(255) NOT NULL COMMENT 'Product ID',
      `spu_name` VARCHAR(255) DEFAULT NULL COMMENT 'Product name',
      `tm_id` VARCHAR(255) NOT NULL COMMENT 'Brand ID',
      `tm_name` VARCHAR(255) DEFAULT NULL COMMENT 'Brand name',
      `category3_id` VARCHAR(255) NOT NULL COMMENT 'Level-3 category ID',
      `category3_name` VARCHAR(255) DEFAULT NULL COMMENT 'Level-3 category name',
      `category2_id` VARCHAR(255) NOT NULL COMMENT 'Level-2 category ID',
      `category2_name` VARCHAR(255) DEFAULT NULL COMMENT 'Level-2 category name',
      `category1_id` VARCHAR(255) NOT NULL COMMENT 'Level-1 category ID',
      `category1_name` VARCHAR(255) NOT NULL COMMENT 'Level-1 category name',
      `order_count` BIGINT(20) DEFAULT NULL COMMENT 'Number of orders',
      `order_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Order amount',
      PRIMARY KEY (`dt`,`recent_days`,`spu_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;

    11). Activity statistics

    DROP TABLE IF EXISTS ads_activity_stats;
    CREATE TABLE `ads_activity_stats` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `activity_id` VARCHAR(255) NOT NULL COMMENT 'Activity ID',
      `activity_name` VARCHAR(255) DEFAULT NULL COMMENT 'Activity name',
      `start_date` DATE DEFAULT NULL COMMENT 'Start date',
      `order_count` BIGINT(11) DEFAULT NULL COMMENT 'Number of orders in the activity',
      `order_original_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Original amount of orders in the activity',
      `order_final_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Final amount of orders in the activity',
      `reduce_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Discount amount',
      `reduce_rate` DECIMAL(16,2) DEFAULT NULL COMMENT 'Subsidy rate',
      PRIMARY KEY (`dt`,`activity_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

    12). Coupon statistics

    DROP TABLE IF EXISTS ads_coupon_stats;
    CREATE TABLE `ads_coupon_stats` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `coupon_id` VARCHAR(255) NOT NULL COMMENT 'Coupon ID',
      `coupon_name` VARCHAR(255) DEFAULT NULL COMMENT 'Coupon name',
      `start_date` DATE DEFAULT NULL COMMENT 'Start date',
      `rule_name` VARCHAR(200) DEFAULT NULL COMMENT 'Discount rule',
      `get_count` BIGINT(20) DEFAULT NULL COMMENT 'Times claimed',
      `order_count` BIGINT(20) DEFAULT NULL COMMENT 'Times used (in orders)',
      `expire_count` BIGINT(20) DEFAULT NULL COMMENT 'Times expired',
      `order_original_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Original amount of orders using the coupon',
      `order_final_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Final amount of orders using the coupon',
      `reduce_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Discount amount',
      `reduce_rate` DECIMAL(16,2) DEFAULT NULL COMMENT 'Subsidy rate',
      PRIMARY KEY (`dt`,`coupon_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

    All tables have now been created.

    10.3 Sqoop Export Script

  • Write the Sqoop export script: create hdfs_to_mysql.sh in the /home/xiaobai/bin directory.
  • [xiaobai@hadoop102 bin]$ vim hdfs_to_mysql.sh
    #!/bin/bash

    hive_db_name=gmall
    mysql_db_name=gmall_report

    export_data() {
        /opt/module/sqoop/bin/sqoop export \
        --connect "jdbc:mysql://hadoop102:3306/${mysql_db_name}?useUnicode=true&characterEncoding=utf-8" \
        --username root \
        --password ****** \
        --table $1 \
        --num-mappers 1 \
        --export-dir /warehouse/$hive_db_name/ads/$1 \
        --input-fields-terminated-by "\t" \
        --update-mode allowinsert \
        --update-key $2 \
        --input-null-string '\\N' \
        --input-null-non-string '\\N'
    }

    case $1 in
        "ads_activity_stats" )
            export_data "ads_activity_stats" "dt,activity_id"
        ;;
        "ads_coupon_stats" )
            export_data "ads_coupon_stats" "dt,coupon_id"
        ;;
        "ads_order_by_province" )
            export_data "ads_order_by_province" "dt,recent_days,province_id"
        ;;
        "ads_order_spu_stats" )
            export_data "ads_order_spu_stats" "dt,recent_days,spu_id"
        ;;
        "ads_order_total" )
            export_data "ads_order_total" "dt,recent_days"
        ;;
        "ads_page_path" )
            export_data "ads_page_path" "dt,recent_days,source,target"
        ;;
        "ads_repeat_purchase" )
            export_data "ads_repeat_purchase" "dt,recent_days,tm_id"
        ;;
        "ads_user_action" )
            export_data "ads_user_action" "dt,recent_days"
        ;;
        "ads_user_change" )
            export_data "ads_user_change" "dt"
        ;;
        "ads_user_retention" )
            export_data "ads_user_retention" "create_date,retention_day"
        ;;
        "ads_user_total" )
            export_data "ads_user_total" "dt,recent_days"
        ;;
        "ads_visit_stats" )
            export_data "ads_visit_stats" "dt,recent_days,is_new,channel"
        ;;
        "all" )
            export_data "ads_activity_stats" "dt,activity_id"
            export_data "ads_coupon_stats" "dt,coupon_id"
            export_data "ads_order_by_province" "dt,recent_days,province_id"
            export_data "ads_order_spu_stats" "dt,recent_days,spu_id"
            export_data "ads_order_total" "dt,recent_days"
            export_data "ads_page_path" "dt,recent_days,source,target"
            export_data "ads_repeat_purchase" "dt,recent_days,tm_id"
            export_data "ads_user_action" "dt,recent_days"
            export_data "ads_user_change" "dt"
            export_data "ads_user_retention" "create_date,retention_day"
            export_data "ads_user_total" "dt,recent_days"
            export_data "ads_visit_stats" "dt,recent_days,is_new,channel"
        ;;
    esac
  • Grant execute permission:
  • [xiaobai@hadoop102 bin]$ chmod 777 hdfs_to_mysql.sh
  • Run:
  • [xiaobai@hadoop102 bin]$ ./hdfs_to_mysql.sh all
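    In hdfs_to_mysql.sh every table and its update key appear twice: once in its own case branch and once under all. One possible way to keep them in a single place is sketched below as a dry run. The tables array and export_all are names introduced only for this illustration, and export_data here just prints what it would export instead of calling sqoop.

```shell
#!/bin/bash
# Dry-run stand-in for the real export_data: prints instead of calling sqoop.
export_data() {
    echo "export $1 (update-key: $2)"
}

# "table:update-key" pairs, copied from the case branches of hdfs_to_mysql.sh.
tables=(
    "ads_activity_stats:dt,activity_id"
    "ads_coupon_stats:dt,coupon_id"
    "ads_order_by_province:dt,recent_days,province_id"
    "ads_order_spu_stats:dt,recent_days,spu_id"
    "ads_order_total:dt,recent_days"
    "ads_page_path:dt,recent_days,source,target"
    "ads_repeat_purchase:dt,recent_days,tm_id"
    "ads_user_action:dt,recent_days"
    "ads_user_change:dt"
    "ads_user_retention:create_date,retention_day"
    "ads_user_total:dt,recent_days"
    "ads_visit_stats:dt,recent_days,is_new,channel"
)

# Export every table in the list: the part before the first ":" is the
# table name, the part after it is the update key.
export_all() {
    local entry
    for entry in "${tables[@]}"; do
        export_data "${entry%%:*}" "${entry#*:}"
    done
}

export_all
```

    Adding a new ADS table then only needs one new line in the array.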

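    Check the result in MySQL:

    tips:
    On whether the export updates or inserts rows:
    --update-mode:
    updateonly: only updates existing rows; new rows cannot be inserted;
    allowinsert: also allows new rows to be inserted.

    --update-key: when updates are allowed, names the columns whose matching values identify a row as the same record, so that the row is updated rather than added; separate multiple columns with commas.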
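    To make the update-or-insert idea concrete: as far as I know, against MySQL the allowinsert mode behaves like an INSERT ... ON DUPLICATE KEY UPDATE statement, which would be why each target table above declares a primary or unique key on its update-key columns. The sketch below only builds such a statement as text for one row. build_upsert is a hypothetical helper, not part of Sqoop, and the statement is an illustration rather than the exact SQL Sqoop generates.

```shell
#!/bin/bash
# Illustrative only: build a MySQL upsert statement for one row.
#   $1 = table, $2 = comma-separated columns, $3 = comma-separated SQL literals
build_upsert() {
    local table=$1 cols=$2 vals=$3
    local updates="" col
    local IFS=','
    # One "col=VALUES(col)" clause per column, joined with ", ".
    for col in $cols; do
        updates+="${updates:+, }${col}=VALUES(${col})"
    done
    echo "INSERT INTO ${table} (${cols}) VALUES (${vals}) ON DUPLICATE KEY UPDATE ${updates};"
}

build_upsert ads_user_change "dt,user_churn_count,user_back_count" "'2020-06-14',10,3"
```

    Re-running the export for the same dt then overwrites that day's row instead of failing on a duplicate key.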

    --input-null-string and --input-null-non-string:
    specify the string that is to be interpreted as NULL for string columns and for non-string columns, respectively.

    Sqoop will by default import NULL values as string null. Hive is however using string \N to denote NULL values and therefore predicates dealing with NULL(like IS NULL) will not work correctly. You should append parameters --null-string and --null-non-string in case of import job or --input-null-string and --input-null-non-string in case of an export job if you wish to properly preserve NULL values. Because sqoop is using those parameters in generated code, you need to properly escape value \N to \\N:

    Hive stores NULL as "\N" in its underlying files, whereas MySQL stores NULL as a real NULL. To keep the data consistent on both sides, use --input-null-string and --input-null-non-string when exporting, and --null-string and --null-non-string when importing.

    Official documentation: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
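    To see what the export actually reads, it can help to look for the \N marker in the tab-separated ADS files before running Sqoop. A minimal sketch, assuming tab-delimited rows on standard input; count_hive_nulls is a hypothetical helper name.

```shell
#!/bin/bash
# Count fields that hold the Hive NULL marker (\N) in tab-separated text.
# Reads rows from stdin and prints the number of such fields.
count_hive_nulls() {
    awk -F'\t' '{ for (i = 1; i <= NF; i++) if ($i == "\\N") n++ } END { print n + 0 }'
}

# One sample row whose second field is NULL on the Hive side.
printf '2020-06-14\t\\N\t5\n' | count_hive_nulls
```

    Fields counted here are the ones that '\\N' in --input-null-string and --input-null-non-string turns into SQL NULL on the MySQL side.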

    10.4 Full Scheduling Workflow

    10.4.1 Data Preparation

    User behavior data preparation

    First start the data collection channels:

    [xiaobai@hadoop102 ~]$ f1.sh start
     ------ starting collection flume on hadoop102 ------
     ------ starting collection flume on hadoop103 ------
    [xiaobai@hadoop102 ~]$ f2.sh start
     -------- starting consumer flume on hadoop104 -------
  • On both hadoop102 and hadoop103, edit application.yml under /opt/module/applog:

  • Generate the data:

  • [xiaobai@hadoop102 applog]$ lg.sh
  • Check whether data has been generated under /origin_data/gmall/log/topic_log/2020-06-15 on HDFS:

    Business data preparation

  • On hadoop102, edit application.properties under /opt/module/db_log/:

  • [xiaobai@hadoop102 db_log]$ vim application.properties

  • Generate the data:
  • [xiaobai@hadoop102 db_log]$ java -jar gmall2020-mock-db-2021-01-22.jar
  • Check that the operate_time column of the order_info table contains rows dated 2020-06-15:
    10.4.2 Writing the Azkaban Workflow Configuration Files

  • Write the azkaban.project file:
  • azkaban-flow-version: 2.0
  • Write the gmall.flow file:
  • nodes:
      - name: mysql_to_hdfs
        type: command
        config:
          command: /home/atguigu/bin/mysql_to_hdfs.sh all ${dt}

      - name: hdfs_to_ods_log
        type: command
        config:
          command: /home/atguigu/bin/hdfs_to_ods_log.sh ${dt}

      - name: hdfs_to_ods_db
        type: command
        dependsOn:
          - mysql_to_hdfs
        config:
          command: /home/atguigu/bin/hdfs_to_ods_db.sh all ${dt}

      - name: ods_to_dim_db
        type: command
        dependsOn:
          - hdfs_to_ods_db
        config:
          command: /home/atguigu/bin/ods_to_dim_db.sh all ${dt}

      - name: ods_to_dwd_log
        type: command
        dependsOn:
          - hdfs_to_ods_log
        config:
          command: /home/atguigu/bin/ods_to_dwd_log.sh all ${dt}

      - name: ods_to_dwd_db
        type: command
        dependsOn:
          - hdfs_to_ods_db
        config:
          command: /home/atguigu/bin/ods_to_dwd_db.sh all ${dt}

      - name: dwd_to_dws
        type: command
        dependsOn:
          - ods_to_dim_db
          - ods_to_dwd_log
          - ods_to_dwd_db
        config:
          command: /home/atguigu/bin/dwd_to_dws.sh all ${dt}

      - name: dws_to_dwt
        type: command
        dependsOn:
          - dwd_to_dws
        config:
          command: /home/atguigu/bin/dws_to_dwt.sh all ${dt}

      - name: dwt_to_ads
        type: command
        dependsOn:
          - dws_to_dwt
        config:
          command: /home/atguigu/bin/dwt_to_ads.sh all ${dt}

      - name: hdfs_to_mysql
        type: command
        dependsOn:
          - dwt_to_ads
        config:
          command: /home/atguigu/bin/hdfs_to_mysql.sh all
  • Compress azkaban.project and gmall.flow into a single zip file, gmall.zip (the file name must use English characters);
  • In the web server at http://hadoop102:8081/index, create a new project and upload gmall.zip;
  • Click Execute Flow.
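    Before zipping and uploading, it may be worth checking that every job listed under dependsOn in gmall.flow is also defined as a node, since a misspelled name is easy to miss in a file of this size. A minimal sketch, assuming the layout used above ("- name: x" for nodes, "- x" for dependencies); undefined_deps is a hypothetical helper, not an Azkaban tool, and the sample file is a shortened stand-in for gmall.flow.

```shell
#!/bin/bash
# Print every dependency name that is not defined as a node in the flow file.
#   $1 = path to the flow file
undefined_deps() {
    awk '
        /^[[:space:]]*- name:/      { defined[$3] = 1; next }
        /^[[:space:]]*-[[:space:]]/ { wanted[$2] = 1 }
        END { for (d in wanted) if (!(d in defined)) print d }
    ' "$1"
}

# Shortened sample flow with two nodes.
flow=$(mktemp)
cat > "$flow" <<'EOF'
nodes:
  - name: dwt_to_ads
    type: command
    config:
      command: echo dwt_to_ads
  - name: hdfs_to_mysql
    type: command
    dependsOn:
      - dwt_to_ads
    config:
      command: echo hdfs_to_mysql
EOF

# Prints nothing when all dependencies are defined.
undefined_deps "$flow"
rm -f "$flow"
```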