
hive-create table


Scenario: a special business requirement means a table needs to be created on short notice.

(1) Table stored as textfile (plain text)

Table structure (as printed by show create table):

CREATE TABLE test_1(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_1';
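
When writing the DDL by hand, the STORED AS TEXTFILE shorthand expands to exactly the input/output format pair shown above, so the following sketch is equivalent:

CREATE TABLE test_1(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_1';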

(2) Table stored as LZO

Table structure (as printed by show create table):

CREATE EXTERNAL TABLE test_2(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_2';
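
DeprecatedLzoTextInputFormat is provided by the hadoop-lzo package, which must be on the cluster classpath. The input format only covers reading; for Hive to write LZO-compressed files into this table, output compression has to be switched on in the session. A sketch, assuming the hadoop-lzo LzopCodec is installed:

SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;

INSERT OVERWRITE TABLE test_2
SELECT task_id, task_name
FROM test_1;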

(3) Table stored as ORC

For more on the ORC format, see: https://www.cnblogs.com/lasclocker/p/5685941.html

Table structure (as printed by show create table):

CREATE EXTERNAL TABLE test_3(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_3';
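
Hive also has a STORED AS ORC shorthand which, unlike the explicit INPUTFORMAT/OUTPUTFORMAT pair above, additionally sets the matching OrcSerde, so it is the safer way to write this DDL by hand. A sketch that also picks the ORC compression codec through the standard orc.compress table property (the choice of SNAPPY is illustrative):

CREATE EXTERNAL TABLE test_3(
task_id int,
task_name string)
STORED AS ORC
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_3'
TBLPROPERTIES ('orc.compress'='SNAPPY');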

Summary:

text:
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

lzo:
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

orc:
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
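
To confirm which formats an existing table really uses, describe it in formatted mode and look at the InputFormat, OutputFormat, and SerDe Library rows:

DESCRIBE FORMATTED test_1;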

(4) Creating temporary tables

First: creating a table together with its data (CREATE TABLE AS SELECT) can only produce a managed table, not an external one, and the resulting structure does not fully match the source unless you specify it yourself (a sketch follows the process summary below).

create table task_info_test as
select *
from task_info;

Execution log:

hive> create table task_info_test as
> select *
> from task_info;
Query ID = mart_fro_20191002210314_9c25e306-9186-455e-a462-1d0cb08746e1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there’s no reduce operator
Start submit job !
Start GetSplits
GetSplits finish, it costs : 43 milliseconds
Submit job success : job_1533628320510_33970740
Starting Job = job_1533628320510_33970740, Tracking URL = http://BJHTYD-Hope-25-11.hadoop.jd.local:50320/proxy/application_1533628320510_33970740/
Kill Command = /data0/hadoop/hadoop_2.100.31_2019090518/bin/hadoop job -kill job_1533628320510_33970740
Hadoop job(job_1533628320510_33970740) information for Stage-1: number of mappers: 1; number of reducers: 0
2019-10-02 21:03:27,012 Stage-1(job_1533628320510_33970740) map = 0%, reduce = 0%
2019-10-02 21:03:44,581 Stage-1(job_1533628320510_33970740) map = 100%, reduce = 0%, Cumulative CPU 2.32 sec
MapReduce Total cumulative CPU time: 2 seconds 320 msec
Stage-1 Elapsed : 27094 ms job_1533628320510_33970740
Ended Job = job_1533628320510_33970740
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://ns1012/tmp/mart_fro/mart_fro/hive/hive_hive_2019-10-02_21-03-14_179_6935514001098968121-1/-ext-10001
Moving data to: hdfs://ns1012/user/mart_fro/tmp.db/task_info_test
CounterStats: 獲取Counter信息用時: 3319 ms
Table tmp.task_info_test stats: [numFiles=1, numRows=6, totalSize=411, rawDataSize=405]
MapReduce Jobs Launched:
Stage-1: job_1533628320510_33970740 SUCCESS HDFS Read: 0.000 GB HDFS Write: 0.000 GB Elapsed : 27s94ms
Map: Total: 1 Success: 1 Killed: 0 Failed: 0 avgMapTime: 15s626ms
Reduce: Total: 0 Success: 0 Killed: 0 Failed: 0 avgReduceTime: 0ms avgShuffleTime: 0ms avgMergeTime: 0ms
JobHistory URL : http://BJHTYD-Hope-17-72.hadoop.jd.local:19888/jobhistory/job/job_1533628320510_33970740

Total MapReduce CPU Time Spent: 2s320ms
Total Map: 1 Total Reduce: 0
Total HDFS Read: 0.000 GB Written: 0.000 GB
OK
Time taken: 35.234 seconds
Finally, check the structure of the table that was created (notice that the field delimiters and other properties do not match the source table):

CREATE TABLE task_info_test(
task_id int,
task_name string,
task_parents array<int>,
task_tags map<int,string>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/task_info_test'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'mart_name'='mart_fro',
'numFiles'='1',
'numRows'='6',
'rawDataSize'='405',
'totalSize'='411',
'transient_lastDdlTime'='1570021429')
Time taken: 0.034 seconds, Fetched: 21 row(s)
From the execution log, the process of creating this temporary table breaks down as follows:

(1) Run the query and stage the results under a temporary directory:

hdfs://ns1012/tmp/mart_fro/mart_fro/hive/hive_hive_2019-10-02_21-03-14_179_6935514001098968121-1/-ext-10001

(2) Move the data into the table's directory:

hdfs://ns1012/user/mart_fro/tmp.db/task_info_test

(3) Create the table (the HDFS path has already been created, so it can be used directly).
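
Because CTAS falls back to the default SerDe and delimiters, restate the row format in the CTAS statement itself whenever it matters. A sketch that mirrors the '|' delimiter used by task_info:

create table task_info_test
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as textfile
as
select *
from task_info;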

Note: when creating a table with data, if you try to make it an external table, it fails:

explain
create external table task_info_test as
select *
from task_info;
FAILED: SemanticException [Error 10070]: CREATE-TABLE-AS-SELECT cannot create external table
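
The workaround is to split the operation in two: create the empty external table first, then load it with a separate insert (the table name task_info_test_ext is illustrative):

create external table task_info_test_ext like task_info;

insert overwrite table task_info_test_ext
select *
from task_info;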

Second: creating a table without data (CREATE TABLE ... LIKE) can produce either a managed or an external table, and the structure matches the source exactly.

sql1:

create table task_info_test_2 like task_info;
Execution plan:

hive> explain create table task_info_test_2 like task_info;
OK
STAGE DEPENDENCIES:
Stage-0 is a root stage

STAGE PLANS:
Stage: Stage-0
Create Table Operator:
Create Table
default input format: org.apache.hadoop.mapred.TextInputFormat
default output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
default serde name: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
like: task_info
name: tmp.task_info_test_2
Target table structure (identical to the source except for the location):

hive> show create table task_info_test_2;
OK
CREATE TABLE task_info_test_2(
task_id int COMMENT 'task id',
task_name string COMMENT 'task name',
task_parents array<int> COMMENT 'parent task ids',
task_tags map<int,string> COMMENT 'tags associated with the task')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/task_info_test_2';

sql2:

create external table task_info_test_1 like task_info;
Execution plan:

STAGE DEPENDENCIES:
Stage-0 is a root stage

STAGE PLANS:
Stage: Stage-0
Create Table Operator:
Create Table
default input format: org.apache.hadoop.mapred.TextInputFormat
default output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
default serde name: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
like: task_info
name: tmp.task_info_test_1
isExternal: true
Target table structure (identical to the source except for the location):

hive> show create table task_info_test_1;
OK
CREATE EXTERNAL TABLE task_info_test_1(
task_id int COMMENT 'task id',
task_name string COMMENT 'task name',
task_parents array<int> COMMENT 'parent task ids',
task_tags map<int,string> COMMENT 'tags associated with the task')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/task_info_test_1'
TBLPROPERTIES (
'mart_name'='mart_fro',
'transient_lastDdlTime'='1570023418')

With all of the above covered, here is how it is actually used in practice:

(1) First create a temporary table with an identical structure via create table like (managed or external).

(2) Then load data into the temporary table with insert overwrite:

insert overwrite table task_info_test
select *
from task_info;
In the end, the data lands under the hdfs://…/<temporary table name>/ path.
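
When the temporary table is no longer needed, keep in mind that dropping an external table only removes the metadata; the files under its HDFS path survive and must be deleted separately. A cleanup sketch (the path is illustrative):

drop table if exists task_info_test;
-- for an external table, also remove the leftover files:
-- hadoop fs -rm -r hdfs://ns1012/user/mart_fro/tmp.db/task_info_test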
