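Hive installation and configuration, MySQL installation, creating tables and partitions in Hive, altering tables, using Hive Beeline, four ways to import data into Hive, and running Hive SQL from Java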

Published: 2024/9/27

1. Upload the tar package
Here I uploaded apache-hive-1.2.1-bin.tar.gz.

2. Extract it

mkdir -p /home/tuzq/software/hive/

tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /home/tuzq/software/hive/

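3. Install the MySQL database (switch to the root user). It can be installed on any node, as long as that node can reach the Hadoop cluster.

For MySQL installation, see: http://blog.csdn.net/tototuzuoquan/article/details/52711808

The MySQL steps below are for reference only; each MySQL version has its own installation procedure.

rpm -qa | grep mysql
rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps
rpm -ivh MySQL-server-5.1.73-1.glibc23.i386.rpm
rpm -ivh MySQL-client-5.1.73-1.glibc23.i386.rpm
Set the MySQL root password:
/usr/bin/mysql_secure_installation
(Note: remove anonymous users and allow remote connections.)
Log in to MySQL:
mysql -u root -p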

4. Configure Hive

(a) Set the HIVE_HOME environment variable

vim /etc/profile

export JAVA_HOME=/usr/local/jdk1.8.0_73
export HADOOP_HOME=/home/tuzq/software/hadoop-2.8.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_HOME=/home/tuzq/software/hive/apache-hive-1.2.1-bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin

source /etc/profile

Apply the same environment variables on the other nodes of the Hadoop cluster. My cluster consists of hadoop1, hadoop2, hadoop3, hadoop4 and hadoop5, and I configured all of them as shown above.

[root@hadoop1 conf]# cd $HIVE_HOME/conf

[root@hadoop1 conf]# mv hive-env.sh.template hive-env.sh

[root@hadoop1 conf]# vi $HIVE_HOME/conf/hive-env.sh

Set HADOOP_HOME in this file, and add the line below. The path shown comes from a Hive 2.3.2 installation; point it at the conf directory of your own installation.

export HIVE_CONF_DIR=/home/bigdata/installed/hive-2.3.2/conf

(b) Configure the metastore database

In the $HIVE_HOME/conf directory, create a file named hive-site.xml:

vi hive-site.xml

Add the following content (inside XML, each & in the JDBC URL must be written as &amp;):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop10:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=utf-8</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
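A quick way to sanity-check a hive-site.xml file is to parse it and print the name/value pairs. The sketch below does this with the JDK's built-in XML parser. The class name HiveSiteReader and the shortened sample document are illustrative assumptions, not part of Hive. It also shows why a literal & in the JDBC URL has to be written as &amp; in the file: the parser turns it back into a plain & when the value is read.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Hive): collects <property> name/value pairs.
class HiveSiteReader {

    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // A bare '&' in a value, for example, ends up here.
            throw new IllegalArgumentException("hive-site.xml is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        // Shortened sample; a real file would be read from $HIVE_HOME/conf/hive-site.xml.
        String xml = "<configuration><property>"
                + "<name>javax.jdo.option.ConnectionURL</name>"
                + "<value>jdbc:mysql://hadoop10:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true</value>"
                + "</property></configuration>";
        // The printed URL contains a plain '&' again.
        System.out.println(readProperties(xml).get("javax.jdo.option.ConnectionURL"));
    }
}
```

If the file contained a bare & instead of &amp;, readProperties would throw, which is roughly the kind of failure a malformed configuration file leads to at startup.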

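5. Once Hive and MySQL are installed, copy the MySQL JDBC connector jar into the $HIVE_HOME/lib directory

If you run into permission errors, grant privileges in MySQL (run this on the machine where MySQL is installed):
mysql -uroot -p
# Run the statements below. *.* means all tables in all databases; % means any IP address or host may connect.
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;
FLUSH PRIVILEGES;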

6. With Hadoop 2.6.4 there is a jline version conflict. Copy jline-2.12.jar from Hive's lib directory to replace the copy shipped with Hadoop:

/home/hadoop/app/hadoop-2.6.4/share/hadoop/yarn/lib/jline-0.9.94.jar

With hadoop-2.8.0, I found that /home/tuzq/software/hadoop-2.8.0/share/hadoop/yarn/lib does not contain jline-2.12.jar.

To check which jline version Hive ships with:

[root@hadoop1 lib]# cd $HIVE_HOME/lib
[root@hadoop1 lib]# ls jline-2.12.jar
jline-2.12.jar
[root@hadoop1 lib]#

Copy the configured Hive directory to the same location on hadoop2, hadoop3, hadoop4 and hadoop5:

scp -r /home/tuzq/software/hive* root@hadoop2:/home/tuzq/software/

scp -r /home/tuzq/software/hive* root@hadoop3:/home/tuzq/software/

scp -r /home/tuzq/software/hive* root@hadoop4:/home/tuzq/software/

scp -r /home/tuzq/software/hive* root@hadoop5:/home/tuzq/software/

Initialize the Hive metastore schema with schematool

cd /home/bigdata/installed/hive-2.3.2/bin

Then run:

[bigdata@bigdata1 bin]$ ./schematool -dbType mysql -initSchema

To fix garbled comments in Hive:

The metastore uses the latin1 character set at both the database level and the table level, so it is enough to change the character set of the columns that store comments from latin1 to utf8. Comments appear in three places: tables, partitions and views.

Proceed as follows:

(1) In the metastore database, run the following 5 SQL statements

① Table column comments and table comments:
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
② Partition comments:
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
③ Index comments:
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
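To see why a latin1 column garbles Chinese comments, the sketch below encodes a string with a single-byte Latin charset and decodes it again. It is only an approximation under stated assumptions: Java's ISO_8859_1 stands in for MySQL's latin1 (the two are not identical), and the class name CommentCharsetDemo is made up for illustration. Characters that the charset cannot represent are replaced with ?, which is the kind of damage seen in table comments.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: ISO_8859_1 is used as a stand-in for MySQL's latin1.
class CommentCharsetDemo {

    // Encode the text with the given charset and decode it again,
    // roughly what happens when a comment is stored in and read from a column.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String comment = "\u4e2d\u6587"; // two Chinese characters, written as escapes

        // Each unmappable character becomes '?', so this prints ??
        System.out.println(roundTrip(comment, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent the characters, so the text survives unchanged.
        System.out.println(roundTrip(comment, StandardCharsets.UTF_8).equals(comment));
    }
}
```

Once the text has been stored as ?, it cannot be recovered, so comments written before the character set change have to be entered again.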

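Start Hive
bin/hive

After it has run, check the result in MySQL.

Hive can also be accessed through Hive Beeline:

Beeline is used together with HiveServer2 and supports both an embedded mode and a remote mode.

Start HiveServer2 with ./bin/hiveserver2

or, in command form:

hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10001

The port at the end can be changed; HiveServer2 listens on port 10000 by default. To exit Beeline, type !quit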

[root@hadoop1 apache-hive-1.2.1-bin]# bin/beeline
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://hadoop1:10000
Connecting to jdbc:hive2://hadoop1:10000
Enter username for jdbc:hive2://hadoop1:10000:
Enter password for jdbc:hive2://hadoop1:10000:
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop1:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| mydb           |
| userdb         |
| userdb2        |
| userdb3        |
+----------------+--+
5 rows selected (0.709 seconds)
0: jdbc:hive2://hadoop1:10000>
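The connection string used in this session has the form jdbc:hive2://host:port/database. As a small sketch, the helper below assembles such a URL and falls back to the default database when none is given. The class HiveJdbcUrl and its method are hypothetical names introduced for illustration; only the URL shape and the default port 10000 come from the text.

```java
// Hypothetical helper, not part of Hive: assembles a HiveServer2 JDBC URL.
class HiveJdbcUrl {

    // HiveServer2's default Thrift port, as mentioned above.
    static final int DEFAULT_PORT = 10000;

    static String build(String host, int port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        // Fall back to the "default" database when none is named.
        String db = (database == null || database.isEmpty()) ? "default" : database;
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) {
        // The server from the session above, on the default port.
        System.out.println(build("hadoop1", DEFAULT_PORT, "default"));
        // The same server started with --hiveconf hive.server2.thrift.port=10001.
        System.out.println(build("hadoop1", 10001, "userdb"));
    }
}
```

The resulting string can be passed to Beeline's !connect command or to DriverManager.getConnection in a Java client.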

Starting hiveserver2 and the Hive metastore

[root@bigdata1 hive-2.3.2]# cd $HIVE_HOME/bin

Hive metastore: nohup hive --service metastore &

hiveserver2: nohup hive --service hiveserver2 &
----------------------------------------------------------------------------------------------------

List the existing databases and create a new one:

[root@hadoop1 apache-hive-1.2.1-bin]# bin/hive

Logging initialized using configuration in jar:file:/home/tuzq/software/hive/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.98 seconds, Fetched: 1 row(s)
hive> create database db1; # create a database
OK
Time taken: 0.239 seconds
hive> show databases; # list all databases
OK
db1
default
Time taken: 0.015 seconds, Fetched: 2 row(s)
hive>

Then check the result on HDFS at http://hadoop1:50070/

6. Create a table (an internal, i.e. managed, table by default)

use db1;

hive> create table trade_detail(id bigint, account string, income double, expense double, time string)
    > row format delimited fields terminated by '\t';

Check the table directory on HDFS.

select nation, avg(size) from beauties group by nation order by avg(size);

Note: if a query returns NULL values at this point, add lines terminated by '\n' to the create table statement.

If drop table if exists table_name fails to drop a table, replace the mysql-connector-java jar in $HIVE_HOME/lib. For example, I was running mysql-5.7.15-linux-glibc2.5-x86_64.tar.gz and initially used mysql-connector-java-5.1.7.jar; after switching to mysql-connector-java-5.1.38.jar, dropping tables worked.

Listing and dropping databases

The following drops a database with CASCADE, which means that all tables in the database are dropped before the database itself:

hive> show databases;
OK
default
mydb
userdb
userdb2
userdb3
Time taken: 0.962 seconds, Fetched: 5 row(s)
hive> drop database IF EXISTS userdb3 CASCADE;
OK
Time taken: 0.203 seconds

hive> show databases;
OK
default
mydb
userdb
userdb2
Time taken: 0.014 seconds, Fetched: 4 row(s)
hive>

Altering tables

The ALTER TABLE statement is used to modify a table in Hive.

Syntax:

ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])

The DROP [COLUMN] form is listed in some tutorials, but as far as I know Hive itself does not accept it; to remove a column, list the columns you want to keep with REPLACE COLUMNS.

hive> show tables;
OK
testhivedrivertable
Time taken: 0.029 seconds, Fetched: 1 row(s)
hive> ALTER TABLE testhivedrivertable RENAME To testHive;
OK
Time taken: 0.345 seconds
hive> show tables;
OK
testhive
Time taken: 0.031 seconds, Fetched: 1 row(s)

Change the type of a column (CHANGE takes the old column name, the new column name and the new type):

hive> desc testhive;
OK
key                     int
value                   string
Time taken: 0.161 seconds, Fetched: 2 row(s)

hive> ALTER TABLE testhive CHANGE value value Double;
OK
Time taken: 0.251 seconds
hive> desc testhive;
OK
key                     int
value                   double
Time taken: 0.126 seconds, Fetched: 2 row(s)
hive>

Add a column to a table:

hive> ALTER TABLE testhive ADD COLUMNS (dept STRING COMMENT 'Department name');
OK
Time taken: 0.219 seconds
hive> desc testhive;
OK
key                     int
value                   double
dept                    string                  Department name
Time taken: 0.09 seconds, Fetched: 3 row(s)
hive>

Dropping a table

The syntax for dropping a table is:

hive> DROP TABLE IF EXISTS testhive;

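Creating partitions

Hive organizes tables into partitions, which divide a table into related parts based on the values of partition columns such as date, city or department. With partitions it is easy to query just a portion of the data.

Tables or partitions can be further subdivided into buckets, which give the data extra structure that can be used for more efficient queries. Bucketing is based on the hash value of one or more columns of the table.

The syntax for adding partitions is:

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec
[LOCATION 'location1'] PARTITION partition_spec [LOCATION 'location2'] ...;
partition_spec:
: (p_column = p_col_value, p_column = p_col_value, ...)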

As preparation, create a table:

CREATE TABLE IF NOT EXISTS employee (eid int, name String,
    destination String)
    partitioned by (salary String)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

hive> desc employee;

After these steps the table has a partition column, salary.

Load data:

[root@hadoop1 hivedata]# cat /home/tuzq/software/hivedata/sample.txt
1201 pal 45000 Technical manager
1202 Manisha 45000 Proof reader
[root@hadoop1 hivedata]#

Load the data above into a partition:

LOAD DATA LOCAL INPATH '/home/tuzq/software/hivedata/sample.txt' INTO TABLE employee PARTITION(salary = '45000');

Note the PARTITION(salary = '45000') clause: it loads the data into the 45000 partition.

The result on HDFS:

http://hadoop1:50070/explorer.html#/user/hive/warehouse/userdb.db/employee

Now add another partition value to the table:

ALTER TABLE employee ADD PARTITION (salary ='40000') location '/40000/part40000';

Because a location is given, this partition is stored at /40000/part40000 on HDFS rather than under the table directory:

http://hadoop1:50070/explorer.html#/40000/part40000

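Creating a table with two partition columns:

create table table_name (id int, content string) partitioned by (dt string, hour string);

This table is partitioned by day and by hour; dt and hour appear as two extra columns in the table schema.
On disk, data is first split into dt directories, and each dt directory is split into hour subdirectories.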
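The nesting of partition directories can be sketched as follows. This is a simplified illustration, not Hive's own code: PartitionPath is a hypothetical class, the table directory and partition values are example inputs, and the escaping Hive applies to special characters in partition values is ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of how partition columns map to nested directories.
class PartitionPath {

    // partitionValues must be in the order of the PARTITIONED BY clause,
    // which is why an insertion-ordered map is expected here.
    static String of(String tableDir, Map<String, String> partitionValues) {
        StringJoiner path = new StringJoiner("/");
        path.add(tableDir);
        for (Map.Entry<String, String> entry : partitionValues.entrySet()) {
            // Each partition column contributes one "column=value" directory level.
            path.add(entry.getKey() + "=" + entry.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Map<String, String> partition = new LinkedHashMap<>();
        partition.put("dt", "20170611");
        partition.put("hour", "08");
        // prints /user/hive/warehouse/table_name/dt=20170611/hour=08
        System.out.println(of("/user/hive/warehouse/table_name", partition));
    }
}
```

A query that filters on dt alone only needs to read the matching dt directory, which is where the benefit of partitioning comes from.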

Listing the partitions of a table:

hive> show partitions employee;
OK
salary=40000
salary=45000
Time taken: 0.088 seconds, Fetched: 2 row(s)
hive>

Another example:

Create a partitioned table
hive> create table td_part(id bigint, account string, income double, expenses double, time string)
    > partitioned by (logdate string) row format delimited fields terminated by '\t';
OK
Time taken: 0.114 seconds
hive> show tables;
OK
td_part
trade_detail
Time taken: 0.021 seconds, Fetched: 2 row(s)
hive>

Create an external table
create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';

7. Create a partitioned table
Ordinary tables versus partitioned tables: a table that keeps receiving large amounts of new data should be created as a partitioned table.
hive> create table book(id bigint,name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';
OK
Time taken: 0.108 seconds
hive> show tables;
OK
book
td_part
trade_detail
Time taken: 0.02 seconds, Fetched: 3 row(s)
hive>

Load data into a partitioned table
load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');

load data local inpath '/root/data.am' into table beauty partition (nation="USA");

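Creating views and indexes

Views in Hive are used much like views in SQL; they are a standard RDBMS concept. Hive views are read-only: they can be queried, but data cannot be loaded or inserted through them.

The syntax for creating a view is: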

CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT table_comment]
AS SELECT ...

hive> desc employee;
OK
eid                     int
name                    string
destination             string
salary                  string

# Partition Information
# col_name              data_type               comment

salary                  string
Time taken: 0.08 seconds, Fetched: 9 row(s)
hive> create VIEW emp_45000 AS
    > SELECT * FROM employee
    > WHERE salary = 45000;

To drop a view:

hive> DROP VIEW emp_45000;

Creating an index:

The syntax for creating an index is shown below. Note that indexes were removed in Hive 3.0, so this applies to the 1.x and 2.x versions used in this article.

CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
   [ ROW FORMAT ...] STORED AS ...
   | STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]

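Four ways to import data into Hive

Hive has several common ways to import data; four are covered here:

(1) Import data from the local file system into a Hive table.

(2) Import data from HDFS into a Hive table.

(3) Query data from another table and insert it into a Hive table.

(4) Create a table and populate it from a query on another table in the same statement.

I. Import data from the local file system into a Hive table

First create the table in Hive:

hive> create table wyp(id int,name string,age int,tel string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' lines terminated by '\n' STORED AS TEXTFILE;

Pay attention to the delimiter in ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '. If it does not match the data file, the load still succeeds, but select returns NULL.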
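The NULL results caused by a wrong delimiter can be reproduced outside Hive. The sketch below is a deliberately simplified model, not Hive's actual SerDe: DelimitedRowDemo is a made-up class that splits one line on a delimiter and converts the fields to the column types of the wyp table (int, string, int, string), turning anything missing or unparsable into null.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Simplified model of reading one text row for a table (int, string, int, string).
class DelimitedRowDemo {

    static Object[] parse(String line, String delimiter) {
        // Split on the literal delimiter; -1 keeps trailing empty fields.
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        return new Object[] {
            toInt(field(parts, 0)),   // id
            field(parts, 1),          // name
            toInt(field(parts, 2)),   // age
            field(parts, 3)           // tel
        };
    }

    // A column with no corresponding field in the line becomes null.
    private static String field(String[] parts, int index) {
        return index < parts.length ? parts[index] : null;
    }

    // A field that is not a valid integer becomes null instead of failing.
    private static Integer toInt(String text) {
        if (text == null) {
            return null;
        }
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String line = "1 wyp 25 13188888888888";
        // Matching delimiter: prints [1, wyp, 25, 13188888888888]
        System.out.println(Arrays.toString(parse(line, " ")));
        // Table declared with '\t' but data separated by spaces: prints [null, null, null, null]
        System.out.println(Arrays.toString(parse(line, "\t")));
    }
}
```

In the mismatched case the whole line lands in the first field, which is not a valid int, and the remaining columns have nothing to read, so every column comes back empty even though the file was loaded without error.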

The table is simple, with only four columns. The local file system contains the file /home/tuzq/software/hive/apache-hive-1.2.1-bin/wyp.txt with the following content:
[root@hadoop1 apache-hive-1.2.1-bin]# pwd
/home/tuzq/software/hive/apache-hive-1.2.1-bin
[root@hadoop1 apache-hive-1.2.1-bin]# cat wyp.txt
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
[root@hadoop1 apache-hive-1.2.1-bin]#

The columns in wyp.txt are separated by spaces. The following statement loads the file into the wyp table:

hive> load data local inpath '/home/tuzq/software/hive/apache-hive-1.2.1-bin/wyp.txt' into table wyp;
Loading data to table default.wyp
Table default.wyp stats: [numFiles=1, totalSize=67]
OK
Time taken: 0.35 seconds
hive> select * from wyp;
OK
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
Time taken: 0.086 seconds, Fetched: 3 row(s)
hive>

The content of wyp.txt is now in the wyp table. You can verify this in the table's data directory: http://hadoop1:50070/explorer.html#/user/hive/warehouse/db1.db

II. Import data from HDFS into a Hive table

When data is loaded from the local file system into a Hive table, it is first copied to a temporary directory on HDFS (typically a directory belonging to the uploading user, for example under /), and then moved (moved, not copied) from that temporary directory into the table's data directory. Given that, Hive naturally also supports moving data directly from an HDFS directory into a table's data directory. Suppose the file /add.txt exists; the steps are as follows:

[root@hadoop1 apache-hive-1.2.1-bin]# ls
add.txt  bin  book.txt  conf  examples  hcatalog  lib  LICENSE  NOTICE  README.txt  RELEASE_NOTES.txt  scripts  wyp.txt
[root@hadoop1 apache-hive-1.2.1-bin]# vim add.txt
[root@hadoop1 apache-hive-1.2.1-bin]# hdfs dfs -put add.txt /
[root@hadoop1 apache-hive-1.2.1-bin]# hdfs dfs -ls /
Found 4 items
-rw-r--r--   3 root supergroup         67 2017-06-11 11:34 /add.txt
-rw-r--r--   3 root supergroup       3719 2017-06-10 12:11 /kms.sh
drwx-wx-wx   - root supergroup          0 2017-06-10 22:06 /tmp
drwxr-xr-x   - root supergroup          0 2017-06-10 22:27 /user
[root@hadoop1 apache-hive-1.2.1-bin]# hdfs dfs -cat /add.txt
4 wyp 25 13188888888888
5 test 30 13888888888888
6 zs 34 899314121
[root@hadoop1 apache-hive-1.2.1-bin]#

The file above holds the data to be inserted. It is stored on HDFS at /add.txt (unlike in method I, where the file was on the local file system and the load statement used the keyword local). The following commands load its content into the Hive table:

hive> select * from wyp;
OK
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
Time taken: 0.086 seconds, Fetched: 3 row(s)
hive> load data inpath '/add.txt' into table wyp;
Loading data to table default.wyp
Table default.wyp stats: [numFiles=2, totalSize=134]
OK
Time taken: 0.283 seconds
hive> select * from wyp;
OK
4 wyp 25 13188888888888
5 test 30 13888888888888
6 zs 34 899314121
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
Time taken: 0.076 seconds, Fetched: 6 row(s)
hive>

The output shows that the data was indeed loaded into the wyp table. Note that load data inpath '/add.txt' into table wyp; does not contain the word local; that is the difference from method I.

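III. Query data from another table and insert it into a Hive table

Suppose Hive has a table named test, created as follows:

hive> create table test(
    > id int, name string
    > ,tel string)
    > partitioned by
    > (age int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
Time taken: 0.261 seconds

This is largely the same as the create statement for wyp, except that test uses age as its partition column. A few more words on partitions:

Partitions: in Hive, each partition of a table corresponds to a directory under the table's directory, and all data of that partition is stored there. For example, if a table wyp had two partition columns, dt and city, then the partition dt=20131218, city=BJ would map to the directory /user/hive/warehouse/wyp/dt=20131218/city=BJ, and all data belonging to that partition would be stored in it.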

The following statement inserts the result of a query on wyp into test:

hive> insert into table test
    > partition (age='25')
    > select id, name, tel
    > from wyp;
#####################################################################
        (MapReduce job output omitted here)
#####################################################################
Total MapReduce CPU Time Spent: 1 seconds 310 msec
Time taken: 19.125 seconds
hive> select * from test;
5       wyp1    131212121212    25
6       wyp2    134535353535    25
7       wyp3    132453535353    25
8       wyp4    154243434355    25
1       wyp     13188888888888  25
2       test    13888888888888  25
3       zs      899314121       25
Time taken: 0.126 seconds, Fetched: 7 row(s)

A note on syntax: the traditional relational form insert into table values (field1, field2) is not supported by older Hive versions; it became available in Hive 0.14.

The output above shows that the rows queried from wyp were inserted into test. If the target table (test) has no partition column, leave out the partition (age='25') clause. We can also let the select statement supply the partition value, so that the partition is chosen dynamically:

hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert into table test
    > partition (age)
    > select id, name,
    > tel, age
    > from wyp;
#####################################################################
        (MapReduce job output omitted here)
#####################################################################
Total MapReduce CPU Time Spent: 1 seconds 510 msec
Time taken: 17.712 seconds
hive> select * from test;
5       wyp1    131212121212    23
6       wyp2    134535353535    24
7       wyp3    132453535353    25
1       wyp     13188888888888  25
8       wyp4    154243434355    26
2       test    13888888888888  30
3       zs      899314121       34
Time taken: 0.399 seconds, Fetched: 7 row(s)

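This is called a dynamic partition insert. By default Hive runs dynamic partitioning in strict mode, which requires at least one static partition column, so hive.exec.dynamic.partition.mode has to be set to nonstrict before using it this way. Hive also supports insert overwrite. As the name suggests, once the statement has run, the existing data in the affected data directories is replaced, whereas insert into appends. Keep the difference in mind. For example:

hive> insert overwrite table test
    > PARTITION (age)
    > select id, name, tel, age
    > from wyp;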
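The difference between insert into and insert overwrite with dynamic partitions can be modelled in a few lines. The sketch below is an in-memory illustration under stated assumptions, not how Hive is implemented: DynamicPartitionDemo is a hypothetical class, rows are {id, name, tel, age} arrays, and the last column selects the partition, just as the last column of the select does in the statements above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of a table partitioned by age.
class DynamicPartitionDemo {

    // partition value (age) -> rows stored in that partition
    final Map<Integer, List<String>> partitions = new TreeMap<>();

    // overwrite=false models "insert into", overwrite=true models "insert overwrite".
    void insert(List<String[]> rows, boolean overwrite) {
        // Route each row to a partition based on its last column.
        Map<Integer, List<String>> incoming = new TreeMap<>();
        for (String[] row : rows) {
            int age = Integer.parseInt(row[3]);
            incoming.computeIfAbsent(age, k -> new ArrayList<>())
                    .add(row[0] + "\t" + row[1] + "\t" + row[2]);
        }
        for (Map.Entry<Integer, List<String>> entry : incoming.entrySet()) {
            if (overwrite) {
                // Replace the content of every partition that receives data.
                partitions.put(entry.getKey(), entry.getValue());
            } else {
                // Append to whatever the partition already holds.
                partitions.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                        .addAll(entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "wyp", "13188888888888", "25"},
                new String[] {"2", "test", "13888888888888", "30"});
        DynamicPartitionDemo table = new DynamicPartitionDemo();
        table.insert(rows, false);
        table.insert(rows, false);   // appending twice duplicates the rows
        System.out.println(table.partitions.get(25).size());   // prints 2
        table.insert(rows, true);    // overwriting leaves a single copy
        System.out.println(table.partitions.get(25).size());   // prints 1
    }
}
```

In this model, partitions that receive no rows during an overwrite keep their old content; only the partitions that are written to get replaced.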

Better still, Hive supports multi-table inserts. In Hive the insert statement can be turned around so that from comes first; the effect is the same as putting it last:

hive> show create table test3;
CREATE  TABLE test3(
  id int,
  name string)
Time taken: 0.277 seconds, Fetched: 18 row(s)
hive> from wyp
    > insert into table test
    > partition(age)
    > select id, name, tel, age
    > insert into table test3
    > select id, name
    > where age>25;
hive> select * from test3;
8       wyp4
2       test
3       zs
Time taken: 4.308 seconds, Fetched: 3 row(s)

Several insert clauses can be used in the same query. The benefit is that the source table only has to be scanned once to produce multiple disjoint outputs.

IV. Create a table and populate it from a query on another table

In practice the result of a query may be too large to display on the console. In that case it is convenient to store the result directly in a new table. This is called CTAS (create table .. as select):

hive> create table test4
    > as
    > select id, name, tel
    > from wyp;
hive> select * from test4;
5       wyp1    131212121212
6       wyp2    134535353535
7       wyp3    132453535353
8       wyp4    154243434355
1       wyp     13188888888888
2       test    13888888888888
3       zs      899314121
Time taken: 0.089 seconds, Fetched: 7 row(s)

The data is now in the test4 table. A CTAS operation is atomic, so if the select query fails for some reason, the new table is not created.

Calling Hive remotely from Java

To connect to Hive remotely from Java, hiveServer2 must be started first. (Note: org.apache.hive.jdbc.HiveDriver is provided by hive-jdbc-1.2.1.jar.)

package hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveCreateDb {

    // Driver for HiveServer2. The older HiveServer used
    // "org.apache.hadoop.hive.jdbc.HiveDriver" instead.
    private static final String DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws Exception {
        Class.forName(DRIVER_NAME);

        // HiveServer2 URLs start with jdbc:hive2:// (the older HiveServer used jdbc:hive://).
        try (Connection con = DriverManager.getConnection("jdbc:hive2://hadoop1:10000/default", "", "");
             Statement stmt = con.createStatement()) {

            // Uncomment the next line if the userdb database does not exist yet.
            // stmt.execute("CREATE DATABASE userdb");

            // Test setting a session parameter. This one requires Tez;
            // another example: SET tez.runtime.io.sort.mb = 128
            boolean resHivePropertyTest = stmt.execute("set hive.execution.engine=tez");
            System.out.println(resHivePropertyTest);

            stmt.execute("USE userdb");
            String tableName = "testHiveDriverTable";

            // DDL and LOAD return no result set, so execute() is used instead of executeQuery().
            stmt.execute("drop table if exists " + tableName);
            stmt.execute("create table " + tableName + " (key int, value string)");

            // show tables
            String sql = "show tables '" + tableName + "'";
            System.out.println("Running: " + sql);
            try (ResultSet res = stmt.executeQuery(sql)) {
                if (res.next()) {
                    System.out.println(res.getString(1));
                }
            }

            // describe table
            sql = "describe " + tableName;
            System.out.println("Running: " + sql);
            try (ResultSet res = stmt.executeQuery(sql)) {
                while (res.next()) {
                    System.out.println(res.getString(1) + "\t" + res.getString(2));
                }
            }

            // load data into table
            // NOTE: filepath has to be local to the hive server
            // NOTE: /tmp/a.txt is a ctrl-A separated file with two fields per line
            String filepath = "/tmp/a.txt";
            sql = "load data local inpath '" + filepath + "' into table " + tableName;
            System.out.println("Running: " + sql);
            stmt.execute(sql);

            // select * query
            sql = "select * from " + tableName;
            System.out.println("Running: " + sql);
            try (ResultSet res = stmt.executeQuery(sql)) {
                while (res.next()) {
                    System.out.println(res.getInt(1) + "\t" + res.getString(2));
                }
            }

            // regular hive query
            sql = "select count(1) from " + tableName;
            System.out.println("Running: " + sql);
            try (ResultSet res = stmt.executeQuery(sql)) {
                while (res.next()) {
                    System.out.println(res.getString(1));
                }
            }
        }
    }
}
