1. Create tables
create table dim_province(
province_id string comment 'province ID',
province_name string comment 'province name'
)
row format delimited fields terminated by ','
stored as textfile;
load data local inpath '/home/hadoop/data/dim_province.txt' into table dim_province;

create table dim_city(
province_id string comment 'province ID',
city_id string comment 'city ID',
city_name string comment 'city name'
)
row format delimited fields terminated by ','
stored as textfile;
load data local inpath '/home/hadoop/data/dim_city.txt' into table dim_city;

create table dw_user_click_d(
day date comment 'date',
user_id string comment 'user ID',
province_id string comment 'province ID',
city_id string comment 'city ID',
flow bigint comment 'traffic',
os string comment 'operating system',
pv bigint comment 'page views'
)
row format delimited fields terminated by ','
stored as textfile;
load data local inpath '/home/hadoop/data/dw_user_click_d.txt' into table dw_user_click_d;
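
Because all three tables are declared as comma-delimited text files, a row with too few or too many commas loads silently with NULL or shifted columns. The following is a minimal sketch of a pre-load check, assuming a POSIX shell with awk available; `check_fields` is a hypothetical helper name, not part of Hive or Kylin.

```shell
# Hypothetical helper: report every line whose comma-separated field count
# differs from the expected one. Returns non-zero if any such line exists.
# Usage: check_fields FILE EXPECTED_COUNT
check_fields() {
  awk -F',' -v n="$2" '
    NF != n { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END     { exit (bad ? 1 : 0) }
  ' "$1"
}
```

With the table definitions above, the expected counts would be 2 for dim_province.txt, 3 for dim_city.txt and 7 for dw_user_click_d.txt, for example `check_fields /home/hadoop/data/dw_user_click_d.txt 7`.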
4. Data modeling
4.1 Create a project
4.2 Import data
load table from tree
4.3 Create a model
The data source comes first, then the model: a model is built on a data source.
The model comes first, then the cube: a cube is built on a model.
Data Source -- model -- cube
1. Model info
2. Select the fact table: dw_user_click_d joins each of the two dimension tables, forming a star schema
3. Dimensions
4. Measures
5. Settings
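4.4 Create a cube
Kylin offers two ways to compute count distinct: bitmap (exact) and HyperLogLog (approximate, with some error).
4.5 Build the cube
On the cube page click Action, then Build, and choose the time range.
The Monitor page shows the build progress.
Building a cube manually through the web UI means typing in the start date and end date by hand.
To schedule cube builds automatically, use the REST API: http://kylin.apache.org/docs/howto/howto_use_restapi.html#list-cubes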
List cubes
http://hadoop003:7070/kylin/api/cubes/hpznyf_user_click_cube
hpznyf_user_click_cube
http://hadoop001:7070/kylin/api/cube_desc/hpznyf_user_click_cube

Build cube
PUT /kylin/api/cubes/{cubeName}/build
Path Variable
cubeName - required string Cube name.
Request Body
startTime - required long Start timestamp of data to build, e.g. 1388563200000 for 2014-1-1
endTime - required long End timestamp of data to build
buildType - required string Supported build type: 'BUILD', 'MERGE', 'REFRESH'
Curl Example
curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/json' -d '{"startTime":'1423526400000', "endTime":'1423612800000', "buildType":"BUILD"}' http://<host>:<port>/kylin/api/cubes/{cubeName}/build

Running it directly reports an error:
[hadoop@hadoop003 script]$ curl -X PUT --user ADMIN:KYLIN -H 'Content-Type: application/json' -d '{"startTime":'$start_day', "endTime":'$end_day', "buildType":"BUILD"}' http://hadoop003:7070/kylin/api/cubes/$v_cubename/build
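
The error output is not shown here, but one thing worth checking is that $start_day, $end_day and $v_cubename are actually set when the command runs: typed at an interactive prompt they are empty, which yields an invalid JSON body. The sketch below wraps the same call in a small script. It assumes GNU date and takes day boundaries at 00:00 UTC; the function names `to_epoch_ms`, `build_payload` and `trigger_build` are hypothetical, while the host, credentials and endpoint are the ones used above.

```shell
#!/usr/bin/env bash
# Assumptions: GNU date is available; day boundaries are taken at 00:00 UTC.

# Convert a YYYY-MM-DD day to epoch milliseconds at 00:00:00 UTC.
to_epoch_ms() {
  echo "$(( $(date -u -d "$1 00:00:00" +%s) * 1000 ))"
}

# Build the JSON request body for a BUILD job from start day to end day.
build_payload() {
  printf '{"startTime":%s,"endTime":%s,"buildType":"BUILD"}' \
    "$(to_epoch_ms "$1")" "$(to_epoch_ms "$2")"
}

# Submit the build request for the given cube and day range.
trigger_build() {
  local cube="$1" start_day="$2" end_day="$3"
  curl -X PUT --user ADMIN:KYLIN \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$start_day" "$end_day")" \
    "http://hadoop003:7070/kylin/api/cubes/${cube}/build"
}
```

After sourcing the script, a call such as `trigger_build hpznyf_user_click_cube 2015-02-10 2015-02-11` would send the same timestamps as the curl example above (1423526400000 and 1423612800000). Building the body in one place keeps the quoting out of the curl command line.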
Query the cube: show date, province, city, pv, uv
select
"DAY",
PROVINCE_NAME,
CITY_NAME,
sum(PV),
count(distinct USER_ID)
from DW_USER_CLICK_D as user_click
join DIM_PROVINCE as province on province.PROVINCE_ID=user_click.PROVINCE_ID
join DIM_CITY as city on city.CITY_ID=user_click.CITY_ID
group by "DAY",PROVINCE_NAME,CITY_NAME
%kylin
select
"DAY",
PROVINCE_NAME,
CITY_NAME,
sum(PV),
count(distinct USER_ID)
from DW_USER_CLICK_D as user_click
join DIM_PROVINCE as province on province.PROVINCE_ID=user_click.PROVINCE_ID
join DIM_CITY as city on city.CITY_ID=user_click.CITY_ID
group by "DAY",PROVINCE_NAME,CITY_NAME
Running the query fails with: Failed : HTTP error code 500
Changed to:
%kylin select \"DAY\", PROVINCE_NAME, CITY_NAME, sum(PV) as \"PV\", count(distinct USER_ID) as \"UV\" from DW_USER_CLICK_D as user_click join DIM_PROVINCE as province on province.PROVINCE_ID=user_click.PROVINCE_ID join DIM_CITY as city on city.CITY_ID=user_click.CITY_ID group by \"DAY\",PROVINCE_NAME,CITY_NAME
Note: when configuring the Kylin SQL, the double quotes ("") must be escaped, otherwise the query returns no data.
4. Run note with cron scheduler.
Configure a scheduled data refresh policy.
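
Escaping the double quotes by hand is easy to get wrong on longer queries. As a small sketch, assuming the SQL is kept in a file or piped in on stdin, the hypothetical helper `escape_quotes` below applies the same backslash escaping with sed.

```shell
# Hypothetical helper: backslash-escape every double quote in the SQL
# read from stdin, writing the result to stdout.
escape_quotes() {
  sed 's/"/\\"/g'
}
```

For example, `printf '%s\n' 'select "DAY", sum(PV) as "PV" from DW_USER_CLICK_D group by "DAY"' | escape_quotes` turns each " into \" and leaves the rest of the statement unchanged.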