
hive-create table


Scenario: a special business requirement means a table needs to be created on short notice.

(1) Table stored as textfile (plain text)

Table structure (as printed by show create table):

CREATE TABLE test_1(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_1';
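
When writing the DDL by hand, the STORED AS TEXTFILE shorthand expands to exactly the input/output format pair shown above, so the following sketch is equivalent:

CREATE TABLE test_1(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_1';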

(2) Table stored as LZO

Table structure (as printed by show create table):

CREATE EXTERNAL TABLE test_2(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_2';
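
DeprecatedLzoTextInputFormat is provided by the hadoop-lzo package, which must be on the cluster classpath. The input format only covers reading; for Hive to write LZO-compressed files into this table, output compression has to be switched on in the session. A sketch, assuming the hadoop-lzo LzopCodec is installed:

SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;

INSERT OVERWRITE TABLE test_2
SELECT task_id, task_name
FROM test_1;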

(3) Table stored as ORC

For more on the ORC format, see: https://www.cnblogs.com/lasclocker/p/5685941.html

Table structure (as printed by show create table):

CREATE EXTERNAL TABLE test_3(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_3';
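
Hive also has a STORED AS ORC shorthand which, unlike the explicit INPUTFORMAT/OUTPUTFORMAT pair above, additionally sets the matching OrcSerde, so it is the safer way to write this DDL by hand. A sketch that also picks the ORC compression codec through the standard orc.compress table property (the choice of SNAPPY is illustrative):

CREATE EXTERNAL TABLE test_3(
task_id int,
task_name string)
STORED AS ORC
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_3'
TBLPROPERTIES ('orc.compress'='SNAPPY');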

Summary:

text:
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

lzo:
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

orc:
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
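
To confirm which formats an existing table really uses, describe it in formatted mode and look at the InputFormat, OutputFormat, and SerDe Library rows:

DESCRIBE FORMATTED test_1;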

(4) Creating temporary tables

First: creating a table together with its data (CREATE TABLE AS SELECT) can only produce a managed table, not an external one, and the resulting structure does not fully match the source unless you specify it yourself (a sketch follows the process summary below).

create table task_info_test as
select *
from task_info;

Execution log:

hive> create table task_info_test as
> select *
> from task_info;
Query ID = mart_fro_20191002210314_9c25e306-9186-455e-a462-1d0cb08746e1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there’s no reduce operator
Start submit job !
Start GetSplits
GetSplits finish, it costs : 43 milliseconds
Submit job success : job_1533628320510_33970740
Starting Job = job_1533628320510_33970740, Tracking URL = http://BJHTYD-Hope-25-11.hadoop.jd.local:50320/proxy/application_1533628320510_33970740/
Kill Command = /data0/hadoop/hadoop_2.100.31_2019090518/bin/hadoop job -kill job_1533628320510_33970740
Hadoop job(job_1533628320510_33970740) information for Stage-1: number of mappers: 1; number of reducers: 0
2019-10-02 21:03:27,012 Stage-1(job_1533628320510_33970740) map = 0%, reduce = 0%
2019-10-02 21:03:44,581 Stage-1(job_1533628320510_33970740) map = 100%, reduce = 0%, Cumulative CPU 2.32 sec
MapReduce Total cumulative CPU time: 2 seconds 320 msec
Stage-1 Elapsed : 27094 ms job_1533628320510_33970740
Ended Job = job_1533628320510_33970740
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://ns1012/tmp/mart_fro/mart_fro/hive/hive_hive_2019-10-02_21-03-14_179_6935514001098968121-1/-ext-10001
Moving data to: hdfs://ns1012/user/mart_fro/tmp.db/task_info_test
CounterStats: 獲取Counter信息用時: 3319 ms
Table tmp.task_info_test stats: [numFiles=1, numRows=6, totalSize=411, rawDataSize=405]
MapReduce Jobs Launched:
Stage-1: job_1533628320510_33970740 SUCCESS HDFS Read: 0.000 GB HDFS Write: 0.000 GB Elapsed : 27s94ms
Map: Total: 1 Success: 1 Killed: 0 Failed: 0 avgMapTime: 15s626ms
Reduce: Total: 0 Success: 0 Killed: 0 Failed: 0 avgReduceTime: 0ms avgShuffleTime: 0ms avgMergeTime: 0ms
JobHistory URL : http://BJHTYD-Hope-17-72.hadoop.jd.local:19888/jobhistory/job/job_1533628320510_33970740

Total MapReduce CPU Time Spent: 2s320ms
Total Map: 1 Total Reduce: 0
Total HDFS Read: 0.000 GB Written: 0.000 GB
OK
Time taken: 35.234 seconds
Finally, check the structure of the table that was created (notice that the field delimiters and other properties do not match the source table):

CREATE TABLE task_info_test(
task_id int,
task_name string,
task_parents array<int>,
task_tags map<int,string>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/task_info_test'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'mart_name'='mart_fro',
'numFiles'='1',
'numRows'='6',
'rawDataSize'='405',
'totalSize'='411',
'transient_lastDdlTime'='1570021429')
Time taken: 0.034 seconds, Fetched: 21 row(s)
From the execution log, the process of creating this temporary table breaks down as follows:

(1) Run the query and stage the results under a temporary directory:

hdfs://ns1012/tmp/mart_fro/mart_fro/hive/hive_hive_2019-10-02_21-03-14_179_6935514001098968121-1/-ext-10001

(2) Move the data into the table's directory:

hdfs://ns1012/user/mart_fro/tmp.db/task_info_test

(3) Create the table (the HDFS path has already been created, so it can be used directly).
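
Because CTAS falls back to the default SerDe and delimiters, restate the row format in the CTAS statement itself whenever it matters. A sketch that mirrors the '|' delimiter used by task_info:

create table task_info_test
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as textfile
as
select *
from task_info;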

Note: when creating a table with data, if you try to make it an external table, it fails:

explain
create external table task_info_test as
select *
from task_info;
FAILED: SemanticException [Error 10070]: CREATE-TABLE-AS-SELECT cannot create external table
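
The workaround is to split the operation in two: create the empty external table first, then load it with a separate insert (the table name task_info_test_ext is illustrative):

create external table task_info_test_ext like task_info;

insert overwrite table task_info_test_ext
select *
from task_info;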

Second: creating a table without data (CREATE TABLE ... LIKE) can produce either a managed or an external table, and the structure matches the source exactly.

sql1:

create table task_info_test_2 like task_info;
Execution plan:

hive> explain create table task_info_test_2 like task_info;
OK
STAGE DEPENDENCIES:
Stage-0 is a root stage

STAGE PLANS:
Stage: Stage-0
Create Table Operator:
Create Table
default input format: org.apache.hadoop.mapred.TextInputFormat
default output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
default serde name: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
like: task_info
name: tmp.task_info_test_2
Target table structure (identical to the source except for the location):

hive> show create table task_info_test_2;
OK
CREATE TABLE task_info_test_2(
task_id int COMMENT 'task id',
task_name string COMMENT 'task name',
task_parents array<int> COMMENT 'parent task ids',
task_tags map<int,string> COMMENT 'tags associated with the task')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/task_info_test_2';

sql2:

create external table task_info_test_1 like task_info;
Execution plan:

STAGE DEPENDENCIES:
Stage-0 is a root stage

STAGE PLANS:
Stage: Stage-0
Create Table Operator:
Create Table
default input format: org.apache.hadoop.mapred.TextInputFormat
default output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
default serde name: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
like: task_info
name: tmp.task_info_test_1
isExternal: true
Target table structure (identical to the source except for the location):

hive> show create table task_info_test_1;
OK
CREATE EXTERNAL TABLE task_info_test_1(
task_id int COMMENT 'task id',
task_name string COMMENT 'task name',
task_parents array<int> COMMENT 'parent task ids',
task_tags map<int,string> COMMENT 'tags associated with the task')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/task_info_test_1'
TBLPROPERTIES (
'mart_name'='mart_fro',
'transient_lastDdlTime'='1570023418')

With all of the above covered, here is how it is actually used in practice:

(1) First create a temporary table with an identical structure via create table like (managed or external).

(2) Then load data into the temporary table with insert overwrite:

insert overwrite table task_info_test
select *
from task_info;
In the end, the data lands under the hdfs://…/<temporary table name>/ path.
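
When the temporary table is no longer needed, keep in mind that dropping an external table only removes the metadata; the files under its HDFS path survive and must be deleted separately. A cleanup sketch (the path is illustrative):

drop table if exists task_info_test;
-- for an external table, also remove the leftover files:
-- hadoop fs -rm -r hdfs://ns1012/user/mart_fro/tmp.db/task_info_test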
