[Hive] Importing and Exporting Hive Table Data

Published: 2023/12/9

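Contents

I. Importing data into Hive
- 1. Importing from the local file system into a Hive table
- 2. Importing from HDFS into Hive
- 3. Importing query results into Hive
- 4. Importing query results when creating a table
II. Exporting data from Hive
- 1. Exporting to the local file system
- 2. Exporting to HDFS
- 3. Exporting to a Hive table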

Environment setup

A fully distributed Hadoop cluster (one master and two workers is enough)
A MySQL environment and a Hive environment

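I. Importing data into Hive

1. Importing from the local file system into a Hive table

First, create a table cat_group in Hive with two fields, group_id and group_name, both of type string, using '\t' as the field delimiter: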

hive (db)> create table if not exists cat_group(group_id string, group_name string)
         > row format delimited fields terminated by '\t'
         > stored as textfile;
OK
Time taken: 0.156 seconds
hive (db)> show tables;
OK
cat
cat3
cat_group
Time taken: 0.021 seconds, Fetched: 3 row(s)

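The [row format delimited] clause sets the column delimiter that the table expects when data is loaded.
The [stored as textfile] clause sets the file format in which the table's data is stored; the default is TEXTFILE. If the source file is plain text, use [stored as textfile]: the file can then be copied from the local file system to HDFS as-is, and Hive can read it directly.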
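As a minimal sketch of what such a file looks like, the shell snippet below writes a small tab-separated sample and checks that every line has exactly two fields, matching the two string columns of cat_group. The sample rows, the temporary path, and the variable names are illustrative assumptions, not the author's data file.

```shell
# Hypothetical sample data: two tab-separated columns (group_id, group_name).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
sample="$workdir/cat_group_sample"

printf '501\tOrganic food\n502\tFruit and vegetables\n503\tMeat, eggs and dairy\n' > "$sample"

# Count lines whose field count differs from 2 when split on a tab,
# the same delimiter the table declares with: fields terminated by '\t'.
bad_lines=$(awk -F'\t' 'NF != 2 { n++ } END { print n + 0 }' "$sample")
total_lines=$(awk 'END { print NR }' "$sample")

echo "checked $total_lines lines, $bad_lines malformed"
```

Running the same awk check against a real data file before loading it can catch rows that would otherwise show up as NULL or shifted columns in Hive.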

Load the cat_group file from the local Linux directory /home/data/hive-data into the Hive table cat_group:

hive (db)> load data local inpath '/../home/data/hive-data/cat_group' into table cat_group;
Loading data to table db.cat_group
OK
Time taken: 2.097 seconds

Query the first 10 records with a select ... from ... limit statement:

hive (db)> select * from cat_group limit 10;
OK
501 有機食品
502 蔬菜水果
503 肉禽蛋奶
504 深海水產
505 地方特產
506 進口食品
507 營養保健
508 休閑零食
509 酒水茶飲
510 糧油副食
Time taken: 0.162 seconds, Fetched: 10 row(s)

2. Importing from HDFS into Hive

First, create the /data/hive directory on HDFS:

[root@server hive-data]# hdfs dfs -mkdir -p /data/hive

Then upload the cat_group file to that directory:

[root@server hive-data]# hdfs dfs -put /../home/data/hive-data/cat_group /data/hive
[root@server hive-data]# hdfs dfs -ls /data/hive
Found 1 items
-rw-r--r-- 3 root supergroup 2164 2022-03-06 11:07 /data/hive/cat_group

Create the cat_group1 table in Hive:

hive (db)> create table if not exists cat_group1(group_id string, group_name string)
         > row format delimited fields terminated by '\t'
         > stored as textfile;
OK
Time taken: 0.156 seconds
hive (db)> show tables;
OK
cat
cat3
cat_group
cat_group1
Time taken: 0.021 seconds, Fetched: 3 row(s)

Load the cat_group file from the /data/hive directory on HDFS into the cat_group1 table:

-- Note: omit `local` when loading data from HDFS
hive (db)> load data inpath '/data/hive/cat_group' into table cat_group1;
Loading data to table db.cat_group1
OK
Time taken: 0.539 seconds
hive (db)> select * from cat_group1 limit 10;
OK
501 有機食品
502 蔬菜水果
503 肉禽蛋奶
504 深海水產
505 地方特產
506 進口食品
507 營養保健
508 休閑零食
509 酒水茶飲
510 糧油副食
Time taken: 0.107 seconds, Fetched: 10 row(s)

Note that the data file is no longer in /data/hive at this point: it has been moved to the /user/hive/warehouse/db.db/cat_group1 directory.
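Because load data inpath moves the HDFS file rather than copying it (load data local inpath copies from the local disk), repeating the load requires uploading the file again. The sketch below strings the steps of this section together as a shell script. The run helper and the DRY_RUN variable are hypothetical conveniences added here; by default the script only prints the commands, so it can be read and tried without a cluster.

```shell
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

src=/home/data/hive-data/cat_group   # local file from the article
dst=/data/hive                       # HDFS directory from the article

run hdfs dfs -mkdir -p "$dst"
# -f overwrites an existing copy; the upload is needed again after each load,
# since the load moves the file out of $dst
run hdfs dfs -put -f "$src" "$dst"
run hive -e "use db; load data inpath '$dst/cat_group' into table cat_group1;"
run hdfs dfs -ls "$dst"
```

In dry-run mode the printed lines lose their shell quoting, so they are meant for reading rather than for pasting back into a terminal.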

3. Importing query results into Hive

First, create the cat_group2 table in Hive:

hive (db)> create table if not exists cat_group2(group_id string, group_name string)
         > row format delimited fields terminated by '\t'
         > stored as textfile;
OK
Time taken: 0.156 seconds
hive (db)> show tables;
OK
cat
cat3
cat_group
cat_group1
cat_group2
Time taken: 0.016 seconds, Fetched: 3 row(s)

There are two ways to import the data from cat_group1 into cat_group2:

-- Method 1: direct import (appends rows)
hive (db)> insert into table cat_group2 select * from cat_group1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220306111859_2bd20950-a787-4a76-8f2d-415dd3517c32
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1646527355398_0001, Tracking URL = http://server:8088/proxy/application_1646527355398_0001/
Kill Command = /usr/local/src/hadoop/bin/hadoop job -kill job_1646527355398_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-03-06 11:21:23,475 Stage-1 map = 0%, reduce = 0%
2022-03-06 11:21:30,327 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.02 sec
MapReduce Total cumulative CPU time: 1 seconds 20 msec
Ended Job = job_1646527355398_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.64.183:9000/user/hive/warehouse/db.db/cat_group2/.hive-staging_hive_2022-03-06_11-18-59_462_4910696310996519402-1/-ext-10000
Loading data to table db.cat_group2
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.02 sec HDFS Read: 6128 HDFS Write: 1751 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 20 msec
OK
Time taken: 153.838 seconds

-- Method 2: overwrite import (replaces existing rows)
hive (db)> insert overwrite table cat_group2 select * from cat_group1;

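Here both statements leave cat_group2 holding the same rows as cat_group1: insert into appends to the table, while insert overwrite replaces its existing contents.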
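The difference only shows up when a statement is run more than once. As a loose analogy (this is not Hive itself), appending to and truncating a plain file in the shell behave the way insert into and insert overwrite do for a table's rows. The file names below are hypothetical stand-ins for the two tables.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
source_rows="$workdir/cat_group1"   # stands in for the source table
target_rows="$workdir/cat_group2"   # stands in for the target table

printf '501\tA\n502\tB\n' > "$source_rows"
: > "$target_rows"                  # start with an empty target

# "insert into" twice: rows are appended each time, so they end up duplicated
cat "$source_rows" >> "$target_rows"
cat "$source_rows" >> "$target_rows"
after_append=$(awk 'END { print NR }' "$target_rows")

# "insert overwrite": the previous contents are replaced
cat "$source_rows" > "$target_rows"
after_overwrite=$(awk 'END { print NR }' "$target_rows")

echo "after two appends: $after_append rows; after overwrite: $after_overwrite rows"
```

So a script that may be re-run against the same target table is usually safer with insert overwrite, provided replacing the old rows is what is wanted.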

4. Importing query results when creating a table

Create the table cat_group3 in Hive and populate it directly from the table cat_group2:

hive (db)> create table if not exists cat_group3 as select * from cat_group2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220306112908_50eaa6bf-0723-478e-bb8d-0d5101c23c01
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1646527355398_0003, Tracking URL = http://server:8088/proxy/application_1646527355398_0003/
Kill Command = /usr/local/src/hadoop/bin/hadoop job -kill job_1646527355398_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-03-06 11:30:40,422 Stage-1 map = 0%, reduce = 0%
2022-03-06 11:30:59,723 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.04 sec
MapReduce Total cumulative CPU time: 1 seconds 40 msec
Ended Job = job_1646527355398_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.64.183:9000/user/hive/warehouse/db.db/.hive-staging_hive_2022-03-06_11-29-08_568_2915460888753908036-1/-ext-10002
Moving data to directory hdfs://192.168.64.183:9000/user/hive/warehouse/db.db/cat_group3
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.04 sec HDFS Read: 5334 HDFS Write: 1751 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 40 msec
OK
Time taken: 114.647 seconds
hive (db)> select * from cat_group3 limit 10;
OK
501 有機食品
502 蔬菜水果
503 肉禽蛋奶
504 深海水產
505 地方特產
506 進口食品
507 營養保健
508 休閑零食
509 酒水茶飲
510 糧油副食
Time taken: 0.335 seconds, Fetched: 10 row(s)
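One detail worth knowing: a create table ... as select statement without a row format clause stores the new table with Hive's default field delimiter (Ctrl-A, '\001') rather than the '\t' used by the other tables here. A sketch of a variant that states the format explicitly follows, written to a script file for use with hive -f. The table name cat_group3_tsv and the script path are hypothetical.

```shell
# Hypothetical script location; adjust as needed
script="${TMPDIR:-/tmp}/ctas_tsv.hql"

# The quoted heredoc delimiter keeps '\t' literal so that Hive, not the shell, interprets it
cat > "$script" <<'EOF'
use db;
create table if not exists cat_group3_tsv
row format delimited fields terminated by '\t'
stored as textfile
as select * from cat_group2;
EOF

echo "wrote $script; run it with: hive -f $script"
```

With the delimiter stated, the files under the new table's warehouse directory stay tab-separated, which matters if they are later read by tools other than Hive.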

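II. Exporting data from Hive

1. Exporting to the local file system

First, create the directory /home/data/hive-data/out on the local Linux file system.

Then export the data of table cat_group to this local out directory. In the session below, the first attempt writes the path as '/../home/data/hive-data/out' and fails with "is not a valid DFS filename"; the second attempt uses '/home/data/hive-data/out' and succeeds: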

hive (db)> insert overwrite local directory '/../home/data/hive-data/out'
         > row format delimited fields terminated by '\t'
         > select * from cat_group;
FAILED: IllegalArgumentException Pathname /../home/data/hive-data/out/.hive-staging_hive_2022-03-06_11-49-38_025_4043006350015711521-1 from hdfs://192.168.64.183:9000/../home/data/hive-data/out/.hive-staging_hive_2022-03-06_11-49-38_025_4043006350015711521-1 is not a valid DFS filename.
hive (db)> insert overwrite local directory '/home/data/hive-data/out'
         > row format delimited fields terminated by '\t'
         > select * from cat_group;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220306115007_37878929-f5db-4e80-af09-0c1ca7b7c60d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1646527355398_0004, Tracking URL = http://server:8088/proxy/application_1646527355398_0004/
Kill Command = /usr/local/src/hadoop/bin/hadoop job -kill job_1646527355398_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-03-06 11:50:39,500 Stage-1 map = 0%, reduce = 0%
2022-03-06 11:50:44,601 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
MapReduce Total cumulative CPU time: 1 seconds 30 msec
Ended Job = job_1646527355398_0004
Moving data to local directory /home/data/hive-data/out
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.03 sec HDFS Read: 5671 HDFS Write: 1680 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 30 msec
OK
Time taken: 37.904 seconds
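The first attempt appears to fail because the '..' segment survives into the staging path that Hive builds, and HDFS rejects such a name. One hedged way to avoid this is to normalize the path before placing it in the statement. The normalize_path function below is a hypothetical helper written for this example; it collapses '.' and '..' segments purely textually and does not resolve symlinks.

```shell
normalize_path() {
  # Split the argument on "/" and rebuild it without "." and ".." segments.
  saved_ifs=$IFS
  IFS=/
  set -f                                # disable globbing while splitting
  result=
  for segment in $1; do
    case $segment in
      ''|.) ;;                          # skip empty and "." segments
      ..)   result=${result%/*} ;;      # drop the last kept segment, if any
      *)    result=$result/$segment ;;
    esac
  done
  set +f
  IFS=$saved_ifs
  printf '%s\n' "${result:-/}"
}

export_dir=$(normalize_path '/../home/data/hive-data/out')
echo "$export_dir"   # prints /home/data/hive-data/out
```

The normalized value can then be substituted into the insert overwrite local directory statement in place of the hand-typed path.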

After the export finishes, view the first 10 lines of the file in the local directory:

[root@server out]# cat ./000000_0 |head -n 10
501 有機食品
502 蔬菜水果
503 肉禽蛋奶
504 深海水產
505 地方特產
506 進口食品
507 營養保健
508 休閑零食
509 酒水茶飲
510 糧油副食
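A larger export may produce several part files (000000_0, 000001_0, and so on) in the output directory instead of one. A minimal sketch for combining them into a single file follows; the two simulated part files and the merged file name are assumptions made so that the snippet runs without Hive.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
out_dir="$workdir/out"
mkdir -p "$out_dir"

# Simulated export output; in a real run Hive writes these files
printf '501\tA\n502\tB\n' > "$out_dir/000000_0"
printf '503\tC\n' > "$out_dir/000001_0"

# Concatenate the part files in name order; hidden files (names starting
# with a dot) are not matched by the glob
merged="$workdir/cat_group.tsv"
cat "$out_dir"/0* > "$merged"

merged_rows=$(awk 'END { print NR }' "$merged")
echo "merged $merged_rows rows into $merged"
```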

2. Exporting to HDFS

Create the /data/hive/out directory on HDFS:

[root@server out]# hdfs dfs -mkdir /data/hive/out
[root@server out]# hdfs dfs -ls /data/hive
Found 1 items
drwxr-xr-x - root supergroup 0 2022-03-06 15:57 /data/hive/out

Export the data of the Hive table cat_group to the out directory on HDFS:

hive> insert overwrite directory '/data/hive/out'
    > row format delimited fields terminated by '\t'
    > select group_id,group_name from cat_group;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220306164548_6780ee23-d932-40fb-b7e7-55afe932bb33
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1646556082284_0001, Tracking URL = http://server:8088/proxy/application_1646556082284_0001/
Kill Command = /usr/local/src/hadoop/bin/hadoop job -kill job_1646556082284_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-03-06 16:46:54,121 Stage-1 map = 0%, reduce = 0%
2022-03-06 16:47:00,370 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
MapReduce Total cumulative CPU time: 1 seconds 30 msec
Ended Job = job_1646556082284_0001
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.64.183:9000/data/hive/out/.hive-staging_hive_2022-03-06_16-45-48_785_6015608084561284378-1/-ext-10000
Moving data to directory /data/hive/out
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.03 sec HDFS Read: 5561 HDFS Write: 1680 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 30 msec
OK
Time taken: 73.787 seconds

After the export finishes, inspect the files on HDFS:

[root@server ~]# hdfs dfs -ls /data/hive/out
Found 2 items
drwxr-xr-x - root supergroup 0 2022-03-06 16:03 /data/hive/out/.hive-staging_hive_2022-03-06_16-03-19_927_4455543203588730803-1
-rwxr-xr-x 3 root supergroup 1680 2022-03-06 16:46 /data/hive/out/000000_0
[root@server ~]# hdfs dfs -cat /data/hive/out/000000_0 |head -n 10
501 有機食品
502 蔬菜水果
503 肉禽蛋奶
504 深海水產
505 地方特產
506 進口食品
507 營養保健
508 休閑零食
509 酒水茶飲
510 糧油副食
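To bring an HDFS export back to a single local file, hdfs dfs -getmerge concatenates the files of an HDFS directory into one local file. The sketch below uses a hypothetical run helper that only prints commands unless DRY_RUN=0 is set; the local target path is likewise an assumption.

```shell
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

hdfs_out=/data/hive/out                                 # export directory from the article
local_copy=/home/data/hive-data/cat_group_export.tsv    # hypothetical local target

run hdfs dfs -ls "$hdfs_out"
run hdfs dfs -getmerge "$hdfs_out" "$local_copy"
run head -n 10 "$local_copy"
```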

3. Exporting to a Hive table

Copy the data of the Hive table cat_group into cat_group4 (the two tables have the same fields and field types).

First, create the cat_group4 table in Hive:

hive (db)> create table if not exists cat_group4(group_id string, group_name string)
         > row format delimited fields terminated by '\t'
         > stored as textfile;
OK
Time taken: 0.156 seconds
hive (db)> show tables;
OK
cat
cat3
cat_group
cat_group1
cat_group2
cat_group3
cat_group4
Time taken: 0.016 seconds, Fetched: 3 row(s)

Then export the data from cat_group into cat_group4:

hive> insert into table cat_group4 select * from cat_group;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220306165034_e2a833b7-d761-4390-a417-4712380b338a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1646556082284_0002, Tracking URL = http://server:8088/proxy/application_1646556082284_0002/
Kill Command = /usr/local/src/hadoop/bin/hadoop job -kill job_1646556082284_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-03-06 16:51:28,079 Stage-1 map = 0%, reduce = 0%
2022-03-06 16:51:38,227 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.09 sec
MapReduce Total cumulative CPU time: 1 seconds 90 msec
Ended Job = job_1646556082284_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.64.183:9000/user/hive/warehouse/db.db/cat_group4/.hive-staging_hive_2022-03-06_16-50-34_649_124292144843961059-1/-ext-10000
Loading data to table db.cat_group4
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.09 sec HDFS Read: 6108 HDFS Write: 1751 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 90 msec
OK
Time taken: 65.267 seconds
hive> select * from cat_group4 limit 10;
OK
501 有機食品
502 蔬菜水果
503 肉禽蛋奶
504 深海水產
505 地方特產
506 進口食品
507 營養保健
508 休閑零食
509 酒水茶飲
510 糧油副食
Time taken: 0.132 seconds, Fetched: 10 row(s)
