Data Warehouse Project Overview

Published: 2024/1/1

1. Project architecture

2. Requirements analysis

  • Understand the data
  • Dimension table dim_city.txt

    bj,bj01,朝陽
    bj,bj02,海淀
    js,js01,南京
    js,js02,宿遷
    zj,zj01,杭州
    zj,zj02,嘉興
    sh,sh01,徐匯
    sh,sh02,虹口
    gz,gz01,廣州
    gz,gz02,海珠

  • Dimension table dim_province.txt

    bj,北京
    js,江蘇
    zj,浙江
    sh,上海
    gz,廣州

  • Fact table dw_user_click_d.txt: the cleaned user click-behavior table

    Columns: day, user_id, province_id, city_id, flow, os, pv

    pv: page views, i.e. the number of clicks. It answers how many times a user visited www.ruozedata.com today, and is computed with count() or sum().
    uv: unique visitors. It answers how many distinct people visited www.ruozedata.com during the day. It requires deduplication, so it is computed with count(distinct).

    Sample rows:

    2020-04-04,1QJK6U3V94KBO5HEF8,gz,gz01,4210,Android,7
    2020-04-11,PBWQC2T3L60ZYOHYTY,sh,sh01,8273,Android,2
    2020-04-02,FWSNWBSNNLF347BORU,gz,gz01,2749,Android,7
    2020-04-09,P715SX31XFNR7B3CM8,sh,sh01,189,Mac OS,9
    2020-04-25,UT85MZURQT4FTZQ5KU,sh,sh01,8176,Mac OS,3
    2020-04-05,S3DW8JL5G5NU5UN6I4,zj,zj01,1927,Android,9
    2020-04-18,F41NSFYSF3CDJQNH8R,bj,bj01,9132,Android,7
    2020-04-12,G6LZMMZBGWO4YSJW6X,sh,sh01,6733,Android,9
    2020-04-23,58ZRRKRPCXPV0P3RK6,gz,gz01,7876,Android,5
    2020-04-19,USZYXFMODFQ4Q9RP2C,gz,gz01,7702,Android,1
    2020-04-27,GFNUCJ8TN9E0I1K56M,gz,gz01,5026,Android,2
    2020-04-04,EPFD3I6HNOD9ZTBVJQ,gz,gz01,4963,Mac OS,5
    2020-04-22,6KJKD01KVUEN9UO652,gz,gz01,2166,Mac OS,6
    2020-04-25,57W2J6CKFL1Y345E3L,bj,bj01,6900,Mac OS,9
    2020-04-09,0VRRDG4ZFKPXWE9JIZ,zj,zj01,2250,Mac OS,4
    2020-04-17,POEYZVCWXS66N6L44V,sh,sh01,1843,Android,5
    2020-04-27,Q1YSVRX8WGEYUL1C9N,sh,sh01,6886,Android,8
    2020-04-04,TYBW7SP3C1VJ3G1X7M,bj,bj01,7922,Android,7
    2020-04-09,Y97X1SVNMI48ST4YLY,gz,gz01,3146,Mac OS,6
    2020-04-15,9WFC8KZFU83521P331,sh,sh01,2390,Android,9
    2020-04-17,P1BL3NFXWFBC1POL88,bj,bj01,9838,Android,1
    2020-04-24,XHQ0OWRXZY92URV96N,bj,bj01,9190,Android,9
    2020-04-01,CD1MPF4GH4H66CCVUW,zj,zj01,2312,Mac OS,6
    2020-04-02,XTYZ8G99GNKHTNVKZP,sh,sh01,6048,Mac OS,4
    2020-04-03,C13XSESM5P5MXIUTP8,zj,zj01,3032,Mac OS,6
    2020-04-24,MTQ3YNFW0IIS8LNX1L,gz,gz01,1554,Mac OS,9
    2020-04-01,HYWBBQED30LVEJOQ5R,bj,bj01,6303,Mac OS,2
    2020-04-12,HD6XHZ2HMZ4DQM90JO,bj,bj01,4871,Android,3
    2020-04-21,46G7Z6Y7LI9HUV1O56,zj,zj01,6812,Mac OS,5
    2020-04-09,D09Q5T1BV0FT18PRDG,sh,sh01,352,Android,6
    2020-04-06,GPQBZKI2K28L615DUR,sh,sh01,5891,Mac OS,7
    2020-04-18,M7WX6KLWB4T6YDGNK3,sh,sh01,9265,Mac OS,6
    2020-04-17,379B6K879718ROC8ZI,gz,gz01,4868,Android,4
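To make the pv and uv definitions concrete, here is a minimal Python sketch that sums pv and counts distinct users per (day, province_id, city_id). Three rows are copied from the sample data; the fourth row is hypothetical and was added only so that deduplication has something to remove. The function name pv_uv is an illustration, not part of the project.

```python
from collections import defaultdict

# First three rows come from dw_user_click_d.txt; the last row is a
# hypothetical repeat visit by the first user, added for illustration only.
ROWS = """\
2020-04-04,1QJK6U3V94KBO5HEF8,gz,gz01,4210,Android,7
2020-04-04,EPFD3I6HNOD9ZTBVJQ,gz,gz01,4963,Mac OS,5
2020-04-04,TYBW7SP3C1VJ3G1X7M,bj,bj01,7922,Android,7
2020-04-04,1QJK6U3V94KBO5HEF8,gz,gz01,100,Android,3
"""


def pv_uv(lines):
    """Return {(day, province_id, city_id): (pv, uv)} for comma-delimited rows."""
    pv = defaultdict(int)
    users = defaultdict(set)
    for line in lines:
        day, user_id, province_id, city_id, _flow, _os, clicks = line.split(",")
        key = (day, province_id, city_id)
        pv[key] += int(clicks)   # equivalent of sum(pv)
        users[key].add(user_id)  # equivalent of count(distinct user_id)
    return {key: (pv[key], len(users[key])) for key in pv}


result = pv_uv(ROWS.splitlines())
for key in sorted(result):
    print(key, result[key])
```

The repeated user adds 3 to pv for gz01 but leaves uv at 2, which is the difference between sum() and count(distinct).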

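    The two dimension tables come from the MySQL business database. The log table comes from data that the collection system gathered into Hive; it is the aggregated data produced after ETL.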

    2. The metrics that ultimately need to be shown, as a report or visualization
    Date, province, city, pv, uv

    3. Import the data sources into the Hive warehouse

    1. Create the tables and load the data

    create table dim_province(
      province_id string comment 'province ID',
      province_name string comment 'province name'
    )
    row format delimited fields terminated by ','
    stored as textfile;

    load data local inpath '/home/hadoop/data/dim_province.txt' into table dim_province;

    create table dim_city(
      province_id string comment 'province ID',
      city_id string comment 'city ID',
      city_name string comment 'city name'
    )
    row format delimited fields terminated by ','
    stored as textfile;

    load data local inpath '/home/hadoop/data/dim_city.txt' into table dim_city;

    create table dw_user_click_d(
      day date comment 'date',
      user_id string comment 'user ID',
      province_id string comment 'province ID',
      city_id string comment 'city ID',
      flow bigint comment 'traffic',
      os string comment 'operating system',
      pv bigint comment 'page views'
    )
    row format delimited fields terminated by ','
    stored as textfile;

    load data local inpath '/home/hadoop/data/dw_user_click_d.txt' into table dw_user_click_d;
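Hive applies the table schema when the data is read, and load data only moves the file into place, so a malformed line is not rejected at load time. As a hedged illustration, the Python sketch below checks a line against the dw_user_click_d column layout before loading. The helper name parse_click_row is hypothetical and not part of Hive.

```python
from datetime import date

EXPECTED_FIELDS = 7  # day, user_id, province_id, city_id, flow, os, pv


def parse_click_row(line):
    """Parse one comma-delimited line; return None if it does not fit the schema."""
    parts = line.rstrip("\n").split(",")
    if len(parts) != EXPECTED_FIELDS:
        return None
    day, user_id, province_id, city_id, flow, os_name, pv = parts
    try:
        return {
            "day": date.fromisoformat(day),  # column type: date
            "user_id": user_id,
            "province_id": province_id,
            "city_id": city_id,
            "flow": int(flow),               # column type: bigint
            "os": os_name,
            "pv": int(pv),                   # column type: bigint
        }
    except ValueError:
        return None


print(parse_click_row("2020-04-09,P715SX31XFNR7B3CM8,sh,sh01,189,Mac OS,9"))
print(parse_click_row("2020-04-09,P715SX31XFNR7B3CM8,sh,sh01"))
```

The first call returns a typed record; the second returns None because four of the seven fields are missing.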

    4. Data modeling

    4.1 Create the project

    4.2 Import the data

    load table from tree

    4.3 Create the model

    A data source comes first, then a model: the model is built on the data source. A model comes first, then a cube: the cube is built on the model.
    Data Source -- model -- cube
    1. Model info: create the model

    2. Select the fact table dw_user_click_d and join it to each of the two dimension tables, forming a star schema
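The star schema can be pictured as one fact table with key lookups into each dimension table. The Python sketch below is an in-memory illustration only, not how Kylin executes the join. It uses a few rows from the sample files, with the fact rows trimmed to the columns the report needs; the function name star_join is hypothetical.

```python
# Dimension tables as id -> name lookups (values taken from the sample files).
DIM_PROVINCE = {"bj": "北京", "sh": "上海", "gz": "廣州"}
DIM_CITY = {"bj01": "朝陽", "sh01": "徐匯", "gz01": "廣州"}

# Fact rows: (day, user_id, province_id, city_id, pv).
FACT = [
    ("2020-04-04", "1QJK6U3V94KBO5HEF8", "gz", "gz01", 7),
    ("2020-04-09", "P715SX31XFNR7B3CM8", "sh", "sh01", 9),
    ("2020-04-18", "F41NSFYSF3CDJQNH8R", "bj", "bj01", 7),
]


def star_join(fact, provinces, cities):
    """Inner-join each fact row to both dimensions by key."""
    joined = []
    for day, user_id, province_id, city_id, pv in fact:
        # Inner join: a fact row with no match in either dimension is dropped.
        if province_id in provinces and city_id in cities:
            joined.append((day, provinces[province_id], cities[city_id], user_id, pv))
    return joined


for row in star_join(FACT, DIM_PROVINCE, DIM_CITY):
    print(row)
```

Each output row carries province_name and city_name in place of the ids, which is the shape the final report groups by.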



    3. Dimensions

    4. Measures


    5. Settings

    4.4 Create the cube




    Kylin has two ways to compute count distinct:
    bitmap: exact deduplication
    HyperLogLog: approximate deduplication, with some error
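The trade-off between the two approaches can be sketched in Python. This is a toy illustration, not Kylin's implementation: the bitmap version gives each new id an integer position and sets that bit, while the HyperLogLog version keeps only 2^p small registers and returns an estimate. The function names, the use of SHA-1 as the hash, and p=10 are all assumptions made for this example.

```python
import hashlib
import math


def exact_distinct_bitmap(user_ids):
    """Exact count: map each new id to the next integer and set that bit."""
    positions = {}
    bitmap = 0
    for user_id in user_ids:
        index = positions.setdefault(user_id, len(positions))
        bitmap |= 1 << index
    return bin(bitmap).count("1")


def approx_distinct_hll(user_ids, p=10):
    """Approximate count with 2**p registers (standard HyperLogLog estimator)."""
    m = 1 << p
    registers = [0] * m
    for user_id in user_ids:
        digest = hashlib.sha1(user_id.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        index = h >> (64 - p)                      # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)           # remaining 64-p bits
        rank = (64 - p) - rest.bit_length() + 1    # position of the leftmost 1 bit
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)             # small-range correction
    return raw


ids = ["user%d" % i for i in range(10000)] * 2     # every id appears twice
print(exact_distinct_bitmap(ids))
print(round(approx_distinct_hll(ids)))
```

The bitmap result is exact but its size grows with the number of distinct ids; the HyperLogLog state stays at 1024 registers regardless of input size, at the cost of an error of a few percent.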

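    4.5 Build the cube

  • On the cube page, click Action
    Action > Build > select the time range
  • Monitor shows the build progress
    Building from the web UI is manual: the start date and end date are typed in by hand
    Automatic scheduling of cube builds
    REST API documentation: http://kylin.apache.org/docs/howto/howto_use_restapi.html#list-cubes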
  • List cubes
    http://hadoop003:7070/kylin/api/cubes/hpznyf_user_click_cube
    hpznyf_user_click_cube
    http://hadoop001:7070/kylin/api/cube_desc/hpznyf_user_click_cube

    Build cube
    PUT /kylin/api/cubes/{cubeName}/build

    Path Variable
    cubeName - required string Cube name.

    Request Body
    startTime - required long Start timestamp of data to build, e.g. 1388563200000 for 2014-1-1
    endTime - required long End timestamp of data to build
    buildType - required string Supported build type: 'BUILD', 'MERGE', 'REFRESH'

    Curl Example
    curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/json' -d '{"startTime":'1423526400000', "endTime":'1423612800000', "buildType":"BUILD"}' http://<host>:<port>/kylin/api/cubes/{cubeName}/build

    Running it directly reports an error:
    [hadoop@hadoop003 script]$ curl -X PUT --user ADMIN:KYLIN -H 'Content-Type: application/json' -d '{"startTime":'$start_day', "endTime":'$end_day', "buildType":"BUILD"}' http://hadoop003:7070/kylin/api/cubes/$v_cubename/build

    Query the cube to produce the report columns (date, province, city, pv, uv):

    select
      "DAY",
      PROVINCE_NAME,
      CITY_NAME,
      sum(PV),
      count(distinct USER_ID)
    from DW_USER_CLICK_D as user_click
    join DIM_PROVINCE as province on province.PROVINCE_ID=user_click.PROVINCE_ID
    join DIM_CITY as city on city.CITY_ID=user_click.CITY_ID
    group by "DAY",PROVINCE_NAME,CITY_NAME
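For scheduled builds, the startTime and endTime values in the build request have to be computed from dates. The Python sketch below prepares the same PUT request as the curl example without sending it. It assumes the timestamps are midnight UTC in epoch milliseconds, which matches the two values in the curl example (2015-02-10 and 2015-02-11); the helper names to_millis and build_request are hypothetical.

```python
import base64
import json
import urllib.request
from datetime import date, datetime, timezone


def to_millis(day):
    """Midnight UTC of `day` as epoch milliseconds (assumed convention)."""
    dt = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def build_request(base_url, cube_name, start_day, end_day, user, password):
    """Prepare, but do not send, PUT /kylin/api/cubes/{cubeName}/build."""
    body = json.dumps({
        "startTime": to_millis(start_day),
        "endTime": to_millis(end_day),
        "buildType": "BUILD",
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/kylin/api/cubes/{cube_name}/build",
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


req = build_request("http://hadoop003:7070", "hpznyf_user_click_cube",
                    date(2015, 2, 10), date(2015, 2, 11), "ADMIN", "KYLIN")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would send the request; it is left out here.
```

Building the JSON body with json.dumps avoids the shell quoting in the curl command, where an unset variable such as $start_day would leave an invalid body.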

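    5. Zeppelin

    Installation is just unpacking the download: http://zeppelin.apache.org/download.html
    If the port causes problems, change it in zeppelin-site.xml under conf
    1. Create notebook
    ruozedata_kylin
    Interpreter: kylin
    2. On the Interpreters page, search for kylin and configure the corresponding settings
    kylin.query.project is learn_kylin by default; we need to change it to ruozedata_kylin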

    3. Usage
    Documentation: http://zeppelin.apache.org/docs/0.7.3/interpreter/kylin.html

    %kylin select "DAY", PROVINCE_NAME, CITY_NAME, sum(PV), count(distinct USER_ID) from DW_USER_CLICK_D as user_click join DIM_PROVINCE as province on province.PROVINCE_ID=user_click.PROVINCE_ID join DIM_CITY as city on city.CITY_ID=user_click.CITY_ID group by "DAY",PROVINCE_NAME,CITY_NAME

    Running this query reports an error:
    Failed : HTTP error code 500
    Change it to:
    %kylin select \"DAY\", PROVINCE_NAME, CITY_NAME, sum(PV) as \"PV\", count(distinct USER_ID) as \"UV\" from DW_USER_CLICK_D as user_click join DIM_PROVINCE as province on province.PROVINCE_ID=user_click.PROVINCE_ID join DIM_CITY as city on city.CITY_ID=user_click.CITY_ID group by \"DAY\",PROVINCE_NAME,CITY_NAME
    Note: when writing Kylin SQL here, every double quote (") must be escaped with a backslash; otherwise the query returns no data
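The escaping rule is mechanical, so it can be applied with a one-line helper rather than by hand. The Python sketch below backslash-escapes every double quote in a query before it is pasted into a %kylin paragraph. The function name escape_for_zeppelin is hypothetical, and the sketch assumes the input has not been escaped already.

```python
def escape_for_zeppelin(sql):
    """Backslash-escape every double quote in a Kylin SQL string."""
    return sql.replace('"', '\\"')


query = 'select "DAY", sum(PV) as "PV" from DW_USER_CLICK_D group by "DAY"'
print("%kylin " + escape_for_zeppelin(query))
```

Running it on an already escaped query would add a second backslash before each quote, so it should be applied once to the plain SQL.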
    4. Run note with cron scheduler.
    Configure the scheduled data refresh policy
