hive数据仓库建设
生活随笔
收集整理的這篇文章主要介紹了
hive数据仓库建设
小編覺得挺不錯的,現在分享給大家,幫大家做個參考.
hive數據倉庫建設
1、設計原生日志表
原生日志表用來存放上報的原始日志,數據經過清洗加工后會進入到各個日志表中。
1.1 創建數據庫
#創建數據庫 $hive>create database umeng_big11 ;1.2 創建原生日志表
原生表使用分區表設計,分區字段為ym/d/hm,hive使用動態分區表,分區采用非嚴格模式,即所有分區都可以是動態分區。hive命令行終端打開顯式表頭設置:
#臨時設置,只在當前回話有效 $hive>set hive.cli.print.header=true ;永久配置hive-site.xml:
... <property><name>hive.cli.print.header</name><value>true</value> </property> ...創建hive原生日志表raw_logs:
$hive>create table raw_logs (servertimems float ,servertimestr string , clientip string ,clienttimems bigint,status int ,log string ) PARTITIONED BY (ym int, day int , hm int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '#' LINES TERMINATED BY '\n' STORED AS TEXTFILE;1.3 加載hdfs數據到hive原生表
$hive>use umeng_big11 ; $hive>load data inpath '/user/centos/umeng/raw-logs/201805/24/1809' into table raw_logs partition(ym=201805 , day = 24 , hm = 1809) ;2、自定義UDF函數完成數據清洗
2.1 介紹
將原生數據加載原生日志表后,將status碼為200的記錄查詢出來進行清洗,將結果分別插入到5類日志子表中。供以后分析使用。
3、創建日志子表
3.1 準備sql語句
創建/home/centos/umeng/umeng_create_logs_ddl.sql文件,內容如下:
--使用指定庫 use umeng_big11 ;--startuplogs create table if not exists startuplogs (appChannel string ,appId string ,appPlatform string ,appVersion string ,brand string ,carrier string ,country string ,createdAtMs bigint ,deviceId string ,deviceStyle string ,ipAddress string ,network string ,osType string ,province string ,screenSize string ,tenantId string ) partitioned by (ym int ,day int , hm int) stored as parquet ; --eventlogs create table if not exists eventlogs (appChannel string ,appId string ,appPlatform string ,appVersion string ,createdAtMs bigint ,deviceId string ,deviceStyle string ,eventDurationSecs bigint ,eventId string ,osType string ,tenantId string ) partitioned by (ym int ,day int , hm int) stored as parquet ; --errorlogs create table if not exists errorlogs (appChannel string ,appId string ,appPlatform string ,appVersion string ,createdAtMs bigint ,deviceId string ,deviceStyle string ,errorBrief string ,errorDetail string ,osType string ,tenantId string ) partitioned by (ym int ,day int , hm int) stored as parquet ; --usagelogs create table if not exists usgaelogs (appChannel string ,appId string ,appPlatform string ,appVersion string ,createdAtMs bigint ,deviceId string ,deviceStyle string ,osType string ,singleDownloadTraffic bigint ,singleUploadTraffic bigint ,singleUseDurationSecs bigint ,tenantId string ) partitioned by (ym int ,day int , hm int) stored as parquet ; --pagelogs create table if not exists pagelogs (appChannel string ,appId string ,appPlatform string ,appVersion string ,createdAtMs bigint ,deviceId string ,deviceStyle string ,nextPage string ,osType string ,pageId string ,pageViewCntInSession int ,stayDurationSecs bigint ,tenantId string ,visitIndex int ) partitioned by (ym int ,day int , hm int) stored as parquet ;3.2 執行sql腳本
$hive>source /home/centos/umeng/umeng_create_logs_ddl.sql轉載于:https://www.cnblogs.com/xupccc/p/9545701.html
總結
以上是生活随笔為你收集整理的hive数据仓库建设的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 栈——括弧匹配检验
- 下一篇: vue.js 深度监测