
Big Data with Hive, 2021 (Part 3): A Hands-On Guide to Mastering Hive Database and Table Operations (Become a Data Warehouse Pro in No Time)

Published: 2023/11/28


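This article is part of a detailed series on Hive; bookmark and follow it if you find it useful.

Each new article lists the earlier articles in the series to help you review the key points.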

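Series articles

Preface

Hive database and table operations

I. Database operations

1. Creating a database

2. Creating a database with a specified HDFS storage location

3. Viewing database details

4. Dropping a database

II. Table operations

1. CREATE TABLE syntax

2. Column types for Hive tables

3. Managed (internal) tables

4. External tables

5. Complex types

6. Partitioned tables

7. Bucketed tables

8. Altering tables

9. Loading data into Hive tables

10. Exporting data from Hive tables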

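Series articles

Big Data with Hive, 2021 (Part 12): A Comprehensive Hive Case Study

Big Data with Hive, 2021 (Part 11): Hive Tuning

Big Data with Hive, 2021 (Part 10): Hive Data Storage Formats

Big Data with Hive, 2021 (Part 9): Hive Data Compression

Big Data with Hive, 2021 (Part 8): Hive User-Defined Functions

Big Data with Hive, 2021 (Part 7): Hive Window Functions

Big Data with Hive, 2021 (Part 6): Hive Table-Generating Functions

Big Data with Hive, 2021 (Part 5): Hive Built-in Functions (Math, String, Date, Conditional, Conversion, Rows to Columns)

Big Data with Hive, 2021 (Part 4): Hive Query Syntax

Big Data with Hive, 2021 (Part 3): A Hands-On Guide to Mastering Hive Database and Table Operations (Become a Data Warehouse Pro in No Time)

Big Data with Hive, 2021 (Part 2): Hive's Three Installation Modes and Using Hive with MySQL

Big Data with Hive, 2021 (Part 1): Basic Hive Concepts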

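Preface

A 2021 featured blog in the big data field that takes you from beginner to proficient. It is updated daily and is gradually building out articles across the big data knowledge base to help you learn more efficiently.

If you are interested in big data, you can follow the WeChat official account: 三幫大數據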

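Hive database and table operations

I. Database operations

1. Creating a database

create database if not exists myhive;
use myhive;

Note: the default storage location for Hive tables is set by a property in hive-site.xml:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>

2. Creating a database with a specified HDFS storage location

create database myhive2 location '/myhive2';

3. Viewing database details

View basic information about a database:

desc database myhive;

4. Dropping a database

Drop an empty database. If the database still contains tables, the statement fails with an error:

drop database myhive;

Force-drop a database together with all the tables in it:

drop database myhive2 cascade;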

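II. Table operations

1. CREATE TABLE syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...)
    [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION hdfs_path]

Notes:

1. CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses it.

2. EXTERNAL creates an external table and lets you specify a path to the actual data (LOCATION) at creation time. For a managed (internal) table, Hive moves the data into the path the data warehouse points to; for an external table, it only records where the data lives and does not move it. When a table is dropped, a managed table's metadata and data are deleted together, whereas an external table loses only its metadata and the data stays in place.

3. LIKE copies the structure of an existing table without copying its data.

4. ROW FORMAT DELIMITED specifies the delimiters used to parse each row, such as the field, collection-item, map-key, and line terminators.

5. STORED AS SEQUENCEFILE | TEXTFILE | RCFILE specifies the storage format of the table data. The default storage format in Hive is TextFile.

6. CLUSTERED BY divides a table into buckets (comparable to partitioning in MapReduce). Buckets are a finer-grained division of the data. Hive organizes buckets on a column: it hashes the column value and takes the remainder after dividing by the number of buckets to decide which bucket a record goes into.

7. LOCATION specifies where the table is stored on HDFS.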
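
The optional clauses described above can be combined in a single statement. The following is a minimal sketch rather than an example from the original article: the table name page_view, its columns, and the path /demo/page_view are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: all names and the HDFS path are illustrative assumptions.
create external table if not exists page_view (
  view_time bigint comment 'time of the page view, in epoch seconds',
  user_id   string comment 'id of the visiting user',
  page_url  string
)
comment 'one row per page view'
partitioned by (dt string)                                        -- one directory per day
clustered by (user_id) sorted by (view_time asc) into 4 buckets  -- 4 files per partition
row format delimited fields terminated by '\t'
stored as textfile
location '/demo/page_view';
```

Because the table is external, dropping it would remove only the metadata and leave the files under /demo/page_view in place.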

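2. Column types for Hive tables

Category	Type	Description	Literal example
Primitive	BOOLEAN	true/false	TRUE
	TINYINT	1-byte signed integer, -128 to 127	1Y
	SMALLINT	2-byte signed integer, -32768 to 32767	1S
	INT	4-byte signed integer, -2147483648 to 2147483647	1
	BIGINT	8-byte signed integer	1L
	FLOAT	4-byte single-precision floating point	1.0
	DOUBLE	8-byte double-precision floating point	1.0
	DECIMAL	Arbitrary-precision signed decimal	1.0
	STRING	Variable-length string	"a", 'b'
	VARCHAR	Variable-length string	"a", 'b'
	CHAR	Fixed-length string	"a", 'b'
	BINARY	Byte array	Cannot be represented
	TIMESTAMP	Timestamp, up to nanosecond precision	122327493795
	DATE	Date	'2016-03-29'
	Time	Hours, minutes, seconds (not a built-in Hive column type)	'12:35:46'
	DateTime	Year, month, day, hours, minutes, seconds (not a built-in Hive column type)
Complex	ARRAY	Ordered collection of elements of the same type	["beijing","shanghai","tianjin","hangzhou"]
	MAP	Key-value pairs; keys must be a primitive type, values can be any type	{"math":80,"chinese":89,"english":95}
	STRUCT	Collection of fields, which may have different types	struct('1',1,1.0)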
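
The complex types in the table can be used directly as column types. The sketch below assumes a hypothetical table stu_profile and delimiters picked only for illustration; it shows how each complex type is declared and how its elements are read.

```sql
-- Hypothetical table; the column names and delimiters are illustrative assumptions.
create table if not exists stu_profile (
  name    string,
  cities  array<string>,
  scores  map<string,int>,
  address struct<city:string, zip:string>
)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':';

-- Element access: array by index, map by key, struct by field name.
select name,
       cities[0]      as first_city,
       scores['math'] as math_score,
       address.city   as home_city
from stu_profile;
```

With these delimiters, an input line would have tab-separated fields such as a name, then beijing,shanghai, then math:80,english:95, then beijing,100000 (struct fields share the collection-item delimiter).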

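3. Managed (internal) tables

A table created without the external keyword is an internal table, also called a managed table. Its data is stored at the location set by hive.metastore.warehouse.dir (default: /user/hive/warehouse). Dropping a managed table deletes both the metadata and the stored data, so managed tables are not suitable for sharing data with other tools.

1. A first table in Hive

create database myhive;
use myhive;
create table stu(id int, name string);
insert into stu values (1, "zhangsan");
select * from stu;

2. Creating a table with a specified field delimiter

create table if not exists stu2(id int, name string) row format delimited fields terminated by '\t';

3. Creating a table from a query result

create table stu3 as select * from stu2;

4. Creating a table from an existing table's structure

create table stu4 like stu2;

5. Checking the table type

desc formatted stu2;

6. Dropping a table

drop table stu2;

Checking the database and HDFS afterwards shows that dropping a managed table removes everything: both the metadata and the data files.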

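4. External tables

Specifying the external keyword at creation time creates an external table. Its files live in the HDFS directory given by location; when new files are added to that directory, the table reads them as well (provided the file format matches the table definition).

Because an external table is built over data that sits at some other HDFS path, Hive does not consider itself the sole owner of that data. Dropping an external table therefore leaves the data on HDFS; it is not deleted.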
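
One way to check whether a table is managed or external, and to switch between the two, is sketched below. It assumes the stu table from the previous section still exists; the EXTERNAL table property is a standard Hive mechanism, but its exact behavior can differ between Hive versions, so treat this as an illustration rather than a guaranteed recipe.

```sql
-- The "Table Type" field in the output reads MANAGED_TABLE or EXTERNAL_TABLE.
desc formatted stu;

-- Convert the managed table to an external one, so a later drop keeps its data.
alter table stu set tblproperties ('EXTERNAL'='TRUE');

-- Convert it back to a managed table.
alter table stu set tblproperties ('EXTERNAL'='FALSE');
```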

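1. The load command

The load command loads external data into a Hive table.

Syntax:

load data [local] inpath '/export/data/datas/student.txt' [overwrite] into table student [partition (partcol1=val1, ...)];

Parameters:

load data: load data
local: load from the local file system into the Hive table; without it, load from HDFS
inpath: the path of the data to load
overwrite: overwrite the data already in the table; without it, append
into table: which table to load into
student: the specific table
partition: load into the specified partition

2. Example

Create external tables for teachers and students, and load data into them.

The source data is as follows: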

student.txt

01 趙雷 1990-01-01 男

02 錢電 1990-12-21 男

03 孫風 1990-05-20 男

04 李云 1990-08-06 男

05 周梅 1991-12-01 女

06 吳蘭 1992-03-01 女

07 鄭竹 1989-07-01 女

08 王菊 1990-01-20 女

teacher.txt

01 張三

02 李四

03 王五

Create the teacher table:

create external table teacher (tid string, tname string) row format delimited fields terminated by '\t';

Create the student table:

create external table student (sid string, sname string, sbirth string, ssex string) row format delimited fields terminated by '\t';

Load data into a table from the local file system:

load data local inpath '/export/data/hivedatas/student.txt' into table student;

Load data and overwrite the existing data:

load data local inpath '/export/data/hivedatas/student.txt' overwrite into table student;

Load data into a table from HDFS

This is really just a file move operation.

The data has to be uploaded to HDFS beforehand:

hadoop fs -mkdir -p /hivedatas
cd /export/data/hivedatas
hadoop fs -put teacher.txt /hivedatas/

Then, in Hive:

load data inpath '/hivedatas/teacher.txt' into table teacher;

Note: if the teacher table is dropped, its data still exists on HDFS, and once the table is re-created the data is immediately visible again. This is because teacher is an external table: after drop table, the table's data remains on HDFS.
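
The behavior in the note can be checked with a short sequence like the one below. It is a sketch under two assumptions that are not stated in the original: the table lives in the myhive database, and the warehouse directory is the default /user/hive/warehouse.

```sql
-- Drop the external table: only the metadata is removed.
drop table teacher;

-- The data file should still be listed in the table directory on HDFS.
-- (Path assumes the default warehouse directory and the myhive database.)
dfs -ls /user/hive/warehouse/myhive.db/teacher;

-- Re-create the table with the same name and schema; it points at the same directory.
create external table teacher (tid string, tname string) row format delimited fields terminated by '\t';

-- The previously loaded rows are visible again without another load.
select * from teacher;
```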

5. Complex types

1. The Array type

Array is an array type; an array holds elements of the same type.

Source data:

Note: name and locations are separated by a tab; the elements within locations are separated by commas.

zhangsan	beijing,shanghai,tianjin,hangzhou

wangwu	changchun,chengdu,wuhan,beijin

Create the table:

create external table hive_array(name string, work_locations array<string>)
row format delimited fields terminated by '\t'
collection items terminated by ',';

Import the data (from the local file system here; importing from HDFS is also supported):

load data local inpath '/export/data/hivedatas/work_locations.txt' overwrite into table hive_array;

Common queries:

-- Query all rows
select * from hive_array;
-- Query the first element of the work_locations array
select name, work_locations[0] location from hive_array;
-- Query the number of elements in the work_locations array
select name, size(work_locations) location from hive_array;
-- Query the rows whose work_locations array contains tianjin
select * from hive_array where array_contains(work_locations, 'tianjin');
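
Beyond indexing into the array, it is often useful to turn each array element into its own row. The following sketch uses Hive's built-in explode function with lateral view on the hive_array table from above; the aliases t and loc are arbitrary names chosen for this example.

```sql
-- One output row per (name, location) pair instead of one row per person.
select name, loc
from hive_array
lateral view explode(work_locations) t as loc;
```

The flattened form can then be grouped or joined like any ordinary column, for example to count how many people have worked in each city.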

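6. Partitioned tables

Partitioning is not a standalone table model; it is combined with managed or external tables:

  Managed partitioned tables

  External partitioned tables

1. Basic operations

One of the most common ideas in big data is divide and conquer. Each partition of a table corresponds to a separate directory on HDFS, and that directory holds all the data files of the partition.

Partitioning can be thought of as classification: different kinds of data are placed in different directories.

The classification criterion is the partition column; there can be one or several.

The point of partitioning is to speed up queries. Use the partition column in queries whenever possible; without it, the whole table is scanned.

When querying, use the where clause to select the partitions you need.

In Hive, a partition is simply a directory.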

Create a partitioned table:

create table score(sid string, cid string, sscore int) partitioned by (month string) row format delimited fields terminated by '\t';

Create a table with several partition columns:

create table score2 (sid string, cid string, sscore int) partitioned by (year string, month string, day string) row format delimited fields terminated by '\t';

Load data into a partitioned table:

load data local inpath '/export/data/hivedatas/score.csv' into table score partition (month='202006');

Load data into a table with several partition columns:

load data local inpath '/export/data/hivedatas/score.csv' into table score2 partition(year='2020', month='06', day='01');

Query across several partitions with union all:

select * from score where month = '202006' union all select * from score where month = '202007';
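
As an alternative to union all, the partition column can be filtered like any other column. This is a sketch against the score table defined above, not a statement from the original article; it relies on the fact that a filter on the partition column limits which partition directories are read.

```sql
-- Returns the same rows as the union all query, selecting both partitions in one statement.
select * from score where month in ('202006', '202007');

-- Filtering on the partition column restricts the scan to the month=202006 directory.
select sid, cid, sscore from score where month = '202006';
```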

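Show the partitions:

show partitions score;

Add a partition:

alter table score add partition(month='202008');

Add several partitions at once:

alter table score add partition(month='202009') partition(month = '202010');

Note: after a partition is added, a new directory appears under the table's directory on HDFS.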
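
The directory layout can be inspected from the Hive command line. The sketch below assumes the default warehouse directory and the myhive database, neither of which is guaranteed by the original text; the msck repair statement covers the reverse case, where a partition directory was created on HDFS by hand and the metastore does not know about it yet.

```sql
-- Each partition shows up as a month=<value> subdirectory of the table directory.
-- (Path assumes the default warehouse directory and the myhive database.)
dfs -ls /user/hive/warehouse/myhive.db/score;

-- Register partition directories that exist on HDFS but are missing from the metastore.
msck repair table score;

-- Confirm which partitions the metastore now knows about.
show partitions score;
```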

Drop a partition:

alter table score drop partition(month = '202010');

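7. Bucketed tables

Bucketing splits data into different files; it is essentially the same idea as partitioning in MapReduce.

1. Basic operations

Data is divided into multiple buckets by a specified column. In other words, rows are split by that column's value across multiple files.

Enable bucketing in Hive (if this command fails, this version of Hive already enforces bucketing automatically, so go straight to the next step):

set hive.enforce.bucketing=true;

Set the number of reducers:

set mapreduce.job.reduces=3;

Create the bucketed table:

create table course (cid string, c_name string, tid string) clustered by(cid) into 3 buckets row format delimited fields terminated by '\t';

Loading data into a bucketed table: neither uploading files with hdfs dfs -put nor using load data works properly for a bucketed table, so the data has to be loaded with insert overwrite.

Create an ordinary table, then use insert overwrite with a query to load the ordinary table's data into the bucketed table.

Create the ordinary table:

create table course_common (cid string, c_name string, tid string) row format delimited fields terminated by '\t';

Load data into the ordinary table:

load data local inpath '/export/data/hivedatas/course.csv' into table course_common;

Load data into the bucketed table with insert overwrite:

insert overwrite table course select * from course_common cluster by(cid);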
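
To get a feel for how rows end up in buckets, the hash-and-remainder rule can be approximated with Hive's built-in hash and pmod functions. This is only an approximation for illustration: the hash function Hive actually uses for bucketing depends on the Hive version, so the computed number may not match the real file. The HDFS path again assumes the default warehouse directory and the myhive database.

```sql
-- Approximate bucket number per row: hash the bucketing column, then take the remainder by 3.
select cid, pmod(hash(cid), 3) as approx_bucket from course_common;

-- After the insert overwrite, the table directory should contain one file per bucket.
dfs -ls /user/hive/warehouse/myhive.db/course;

-- Read only the first of the three buckets.
select * from course tablesample(bucket 1 out of 3 on cid);
```

Sampling by bucket is one of the practical benefits of bucketing: the query reads a single bucket file instead of the whole table.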

8. Altering tables

1. Renaming a table

Basic syntax:

alter table old_table_name rename to new_table_name;

-- Rename table score3 to score4
alter table score3 rename to score4;

2. Adding and changing columns

-- 1: Show the table structure
desc score4;
-- 2: Add columns
alter table score4 add columns (mycol string, mysco string);
-- 3: Show the table structure
desc score4;
-- 4: Change a column
alter table score4 change column mysco mysconew int;
-- 5: Show the table structure
desc score4;

3. Dropping a table

drop table score4;

4. Truncating a table

Only managed (internal) tables can be truncated:

truncate table score4;

9. Loading data into Hive tables

1. Inserting data directly into a partitioned table

Load data with insert into:

create table score3 like score;
insert into table score3 partition(month ='202007') values ('001','002',100);

Load data with a query:

create table score4 like score;
insert overwrite table score4 partition(month = '202006') select sid,cid,sscore from score;
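
The statements above name the target partition explicitly (a static partition). When the source rows span many months, Hive can instead derive the partition from the data. The following is a sketch of such a dynamic-partition insert, assuming the score and score4 tables from this section; the two set statements are standard Hive settings, though their defaults vary by version.

```sql
-- Allow partitions to be created from the query result.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column (month) must come last in the select list;
-- each distinct month value is written to its own partition of score4.
insert overwrite table score4 partition(month)
select sid, cid, sscore, month from score;
```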

2. Loading data with load

Load data with the load command:

create table score5 like score;
load data local inpath '/export/data/hivedatas/score.csv' overwrite into table score5 partition(month='202006');

Multi-insert mode

This is commonly used in production to split one table into two or more parts.

Load data into the score table:

load data local inpath '/export/data/hivedatas/score.csv' overwrite into table score partition(month='202006');

Create the table for the first part:

create table score_first(sid string, cid string) partitioned by (month string) row format delimited fields terminated by '\t';

Create the table for the second part:

create table score_second(cid string, sscore int) partitioned by (month string) row format delimited fields terminated by '\t';

Load data into the first and second tables in one statement:

from score
insert overwrite table score_first partition(month='202006') select sid,cid
insert overwrite table score_second partition(month = '202006') select cid,sscore;

Creating a table and loading data in one query (as select)

Save the result of a query into a table:

create table score5 as select * from score;

Specifying the data path with location when creating a table

1. Create the table and specify its location on HDFS:

create external table score6 (sid string, cid string, sscore int) row format delimited fields terminated by '\t' location '/myscore6';

2. Upload the data to HDFS:

hadoop fs -mkdir -p /myscore6
hadoop fs -put score.csv /myscore6

3. Query the data:

select * from score6;

10. Exporting data from Hive tables

Data in Hive tables can be exported to other destinations, such as the local Linux disk, HDFS, or MySQL.

1. Exporting with insert

1) Export query results to the local file system:

insert overwrite local directory '/export/data/exporthive' select * from score;

2) Export query results to the local file system with formatting:

insert overwrite local directory '/export/data/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from student;

3) Export query results to HDFS (without local):

insert overwrite directory '/exporthive' row format delimited fields terminated by '\t' select * from score;

2. Exporting from the hive shell

Basic syntax: hive -f/-e <statement or script> > file

bin/hive -e "select * from myhive.score;" > /export/data/exporthive/score.txt

3. Exporting to HDFS with export

export table score to '/export/exporthive/score';
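
An export written this way contains both the data and the table metadata, so it can be read back with Hive's import statement. The sketch below assumes the export above has completed; score_imported is a hypothetical table name used only for this example.

```sql
-- Re-create the exported table under a new (hypothetical) name from the export directory.
import table score_imported from '/export/exporthive/score';

-- Spot-check the imported rows.
select * from score_imported limit 10;
```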

4. Exporting with Sqoop

For reasons of space, this is covered in detail in the project walkthrough articles of the series.

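📢 Blog home page: https://lansonli.blog.csdn.net
📢 Likes 👍, bookmarks, and comments 📝 are welcome; please point out any errors!
📢 This article is an original work by Lansonli, first published on the CSDN blog 🙉
📢 The big data series is updated daily. When you stop to rest, remember that others are still running; make the most of your time to learn, and go all out for a better life.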
