

Detailed Hadoop Configuration


Contents

Chapter 1  Overview
1.1  What is Hadoop?
1.2  Why choose the CDH release?
1.3  Cluster environment
1.4  Network topology

Chapter 2  Installing the Hadoop environment
2.1  Prepare the packages
2.2  Default user and group: root:root
2.3  Remove the bundled JDK
2.4  Install and configure the JDK
2.5  Configure /etc/hosts
2.6  Set up passwordless SSH login
2.7  Handle the firewall
2.8  Upload hadoop-2.0.0-cdh4.2.0.tar.gz to /opt and unpack it
2.9  Edit core-site.xml
2.10 Edit hdfs-site.xml
2.11 Edit the slaves file
2.12 Edit mapred-site.xml
2.13 Edit yarn-site.xml
2.14 Edit .bashrc
2.15 Copy /opt/hadoop from master01 to the other machines
2.16 Format the NameNode before the first start
2.17 Start HDFS on master01
2.18 Start MapReduce and the history server on master01
2.19 View MapReduce on master01
2.20 View the slave01 and slave02 nodes
2.21 Check the cluster processes on each machine
2.22 Stop the services

Chapter 3  Installing ZooKeeper
3.1  Prepare the package
3.2  Unpack
3.3  Edit zoo.cfg
3.4  Update environment variables
3.5  Create the data directory and the myid file
3.6  Copy the files to the other machines
3.7  Start
3.8  Verify
3.9  Stop the service
3.10 References

Chapter 4  Installing Hive
4.1  Prepare the packages
4.2  Prepare the machine
4.3  MySQL access
4.4  Configure hive-site.xml to keep the metastore in MySQL
4.5  Unpack mysql-connector-java-5.1.18.tar.gz
4.6  MySQL housekeeping
4.7  View the logs
4.8  Loading local data into Hive

Chapter 5  Hive + Thrift + PHP integration
5.1  Prepare the packages
5.2  Write the code
5.3  Start hiveserver
5.4  Check the default port 10000
5.5  Test
5.6  Errors and fixes

Chapter 6  Installing and using Sqoop
6.1  Prepare the package
6.2  Prerequisites
6.3  Install
6.4  Place the MySQL driver jar
6.5  Edit the configure-sqoop script
6.6  Add Sqoop to PATH
6.7  Usage tests
6.8  Errors and fixes
6.9  References

Chapter 1  Overview

1.1  What is Hadoop?

Hadoop is a distributed computing framework developed under the Apache Foundation. It lets users write distributed programs without understanding the low-level details of distribution, harnessing the power of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to run on low-cost hardware; it provides high-throughput access to application data and suits applications with very large data sets. HDFS relaxes some POSIX requirements so that file system data can be accessed as a stream.

1.2  Why choose the CDH release?

- CDH is based on a stable Apache Hadoop release with the latest bug-fix and feature patches applied. Cloudera ships quarterly update versions and annual releases, moving faster than upstream Apache, and in practice CDH has proven very stable without introducing new problems.
- Cloudera's installation and upgrade documentation is detailed, saving time spent searching.
- CDH can be installed four ways: Yum/Apt packages, tarball, RPM, or Cloudera Manager.
- You get the latest features and bug fixes, and installation and maintenance are straightforward, saving operations time.

1.3  Cluster environment

[root@master01 ~]# lsb_release -a
LSB Version:    :base-4.0-ia32:base-4.0-noarch:core-4.0-ia32:core-4.0-noarch:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.4 (Final)
Release:        6.4
Codename:       Final

1.4  Network topology

(Network diagram not reproduced here.)

Chapter 2  Installing the Hadoop environment

2.1  Prepare the packages

jdk-7-linux-i586.rpm   [77.2M]
hadoop-2.0.0-cdh4.2.0  [129M]

Download URLs:

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
http://archive.cloudera.com/cdh4/cdh/4/

2.2  Default user and group: root:root

2.3  Remove the bundled JDK

[root@master01 local]# rpm -qa | grep jdk
java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.i686

yum -y remove java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.i686
yum -y remove java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.i686

2.4  Install and configure the JDK

[root@master01 local]# rpm -ivh jdk-7-linux-i586.rpm
Preparing...                ########################################### [100%]
   1:jdk                    ########################################### [100%]

Note: the JAVA_HOME settings are listed with the ~/.bashrc contents in section 2.14 below.
Also note: production machines are usually 64-bit; download and install the matching 64-bit JDK package.

2.5  Configure /etc/hosts

vi /etc/hosts

192.168.2.18    master01
192.168.2.19    master02
192.168.2.163   slave01
192.168.2.38    slave02
192.168.2.212   slave03

Note: the other machines need the same change, e.g.:

rsync -vzrtopgu --progress /etc/hosts 192.168.2.38:/etc/hosts

2.6  Set up passwordless SSH login

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave01
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave02

Note: master01 itself must also be set up:

cd ~/.ssh
cat id_rsa.pub >> authorized_keys
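A quick sanity check, assuming the host names from /etc/hosts above: each command should print the remote host name without asking for a password.

# run from master01
ssh slave01 hostname
ssh slave02 hostname
ssh master01 hostname   # master01 must also accept its own key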

2.7  Handle the firewall

service iptables stop

Note: if you prefer not to disable the firewall, configure the iptables rules on slave01, slave02, and the other machines so the DataNodes can reach the NameNode and all machines can reach each other:

vi /etc/sysconfig/iptables

Add:

-I INPUT -s 192.168.2.18 -j ACCEPT
-I INPUT -s 192.168.2.38 -j ACCEPT
-I INPUT -s 192.168.2.87 -j ACCEPT

Also open ports 8088 and 50070 on master01 so the NameNode and MapReduce web UIs are reachable from a browser. (The original screenshots, figures 1 and 2, are not reproduced here; a sketch of equivalent rules follows.)
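A sketch of equivalent rules, assuming the stock CentOS 6 iptables service and the default INPUT chain:

# open the NameNode web UI (50070) and the YARN ResourceManager web UI (8088)
iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
iptables -I INPUT -p tcp --dport 8088 -j ACCEPT
service iptables save   # persist the rules across restarts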

2.8  Upload hadoop-2.0.0-cdh4.2.0.tar.gz to /opt and unpack it

tar xzvf hadoop-2.0.0-cdh4.2.0.tar.gz
mv hadoop-2.0.0-cdh4.2.0 hadoop
cd hadoop/etc/hadoop/

Then edit hadoop-env.sh: the file already contains a JAVA_HOME line that is commented out by default; uncomment it and point it at your Java installation directory, as shown below.

2.9  Edit core-site.xml

vi core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master01</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>10080</value>
  </property>
</configuration>
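fs.trash.interval is measured in minutes, so 10080 keeps deleted files recoverable for 7 days. A minimal illustration (the file names here are hypothetical):

hadoop fs -rm /tmp/example.txt          # moved to /user/root/.Trash, not deleted
hadoop fs -rm -skipTrash /tmp/big.log   # bypasses the trash entirely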

2.10  Edit hdfs-site.xml

vi hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/hadoop-${user.name}</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>master01:50070</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>master02:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

2.11  Edit the slaves file

vi slaves

slave01
slave02

2.12  Edit mapred-site.xml

cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master01:19888</value>
  </property>
</configuration>

2.13  Edit yarn-site.xml


vi yarn-site.xml

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master01:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master01:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master01:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master01:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master01:8088</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $YARN_HOME/share/hadoop/yarn/*,$YARN_HOME/share/hadoop/yarn/lib/*,
    $YARN_HOME/share/hadoop/mapreduce/*,$YARN_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/data/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/opt/data/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/opt/data/yarn/logs</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>

2.14  Edit .bashrc

cd ~
vi .bashrc

#export LANG=zh_CN.utf8
export JAVA_HOME=/usr/java/jdk1.7.0
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=./:$JAVA_HOME/lib:$JRE_HOME/lib:$JRE_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export HBASE_HOME=/opt/hbase
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin

source .bashrc
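A quick way to confirm the environment took effect after sourcing the file; note that the PATH above includes $HADOOP_HOME/sbin but not bin, so the hadoop binary is called by full path.

echo $HADOOP_HOME               # should print /opt/hadoop
/opt/hadoop/bin/hadoop version  # should report 2.0.0-cdh4.2.0
which start-dfs.sh              # should resolve under /opt/hadoop/sbin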

2.15  Copy /opt/hadoop from master01 to the other machines

rsync -vzrtopgu --progress hadoop slave01:/opt/
rsync -vzrtopgu --progress hadoop slave02:/opt/

or

rsync -vzrtopgu --progress hadoop 192.168.2.38:/opt/
rsync -vzrtopgu --progress hadoop 192.168.2.163:/opt/

rsync option reference:

-v, --verbose    verbose output
-z, --compress   compress file data during transfer
-r, --recursive  recurse into subdirectories
-t, --times      preserve modification times
-o, --owner      preserve owner
-p, --perms      preserve permissions
-g, --group      preserve group
-u, --update     update only: skip files that are already newer on the destination (do not overwrite newer files)

2.16  Format the NameNode before the first start

/opt/hadoop/bin/hadoop namenode -format

Note: do this only once, when the cluster is first initialized. Reformatting an existing NameNode wipes the HDFS metadata, so do not rerun it after ordinary configuration changes.

2.17  Start HDFS on master01

/opt/hadoop/sbin/start-dfs.sh

2.18  Start MapReduce and the history server on master01

/opt/hadoop/sbin/start-yarn.sh
/opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver

2.19  View MapReduce on master01

http://192.168.2.18:8088/cluster

2.20  View the slave01 and slave02 nodes

http://192.168.2.163:8042/node

2.21  Check the cluster processes on each machine

[root@master01 ~]# jps
5389 NameNode
5980 Jps
5710 ResourceManager
7032 JobHistoryServer

[root@slave01 ~]# jps
3187 Jps
3124 SecondaryNameNode

[root@slave02 ~]# jps
3187 Jps
3124 DataNode
5711 NodeManager
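With all daemons up, the bundled pi example makes a reasonable end-to-end smoke test. The jar path below assumes this guide's layout; if it differs, locate the examples jar first.

# find /opt/hadoop -name 'hadoop-mapreduce-examples*.jar'
/opt/hadoop/bin/hadoop jar \
    /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar pi 2 10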

2.22  Stop the services

/opt/hadoop/sbin/stop-all.sh

Chapter 3  Installing ZooKeeper

3.1  Prepare the package

zookeeper-3.4.5-cdh4.2.0.tar.gz

3.2  Unpack

tar xzvf zookeeper-3.4.5-cdh4.2.0.tar.gz
mv zookeeper-3.4.5-cdh4.2.0 zookeeper

3.3  Edit zoo.cfg

cd conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/zookeeper/data
#dataLogDir=/opt/zookeeper/log
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

server.1=master01:2888:3888
server.2=master02:2888:3888
server.3=slave01:2888:3888
server.4=slave02:2888:3888

3.4  Update environment variables

vi ~/.bashrc

export ZOOKEEPER_HOME=/opt/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin

3.5  Create the data directory and the myid file

mkdir /opt/zookeeper/data
touch myid
vi myid

Write the digit 1 on the first machine, 2 on the second, and so on, as spelled out below.
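Per machine, matching the server.N lines in zoo.cfg above:

echo 1 > /opt/zookeeper/data/myid   # on master01 (server.1)
echo 2 > /opt/zookeeper/data/myid   # on master02 (server.2)
echo 3 > /opt/zookeeper/data/myid   # on slave01  (server.3)
echo 4 > /opt/zookeeper/data/myid   # on slave02  (server.4)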

3.6  Copy the files to the other machines

rsync -vzrtopgu --progress zookeeper master02:/opt/
rsync -vzrtopgu --progress zookeeper slave01:/opt/
rsync -vzrtopgu --progress zookeeper slave02:/opt/

3.7  Start

sh /opt/zookeeper/bin/zkServer.sh start

[root@master01 zookeeper]# jps
3459 JobHistoryServer
6259 Jps
2906 NameNode
3171 ResourceManager
6075 QuorumPeerMain

3.8  Verify

/opt/zookeeper/bin/zkCli.sh -server master01:2181

or

sh /opt/zookeeper/bin/zkServer.sh status

3.9  Stop the service

sh /opt/zookeeper/bin/zkServer.sh stop

3.10  References

http://archive.cloudera.com/cdh4/cdh/4/zookeeper-3.4.5-cdh4.2.0/

Chapter 4  Installing Hive

4.1  Prepare the packages

hive-0.10.0-cdh4.2.0  [43.2M]
mysql-connector-java-5.1.18.tar.gz  [3.65M]

4.2  Prepare the machine

slave03 gets hive + thrift + sqoop and serves as a dedicated data-analysis machine.

4.3  MySQL access

Before integrating with MySQL, make sure every machine can reach the MySQL server:

GRANT select, insert, update, delete ON *.* TO 'hadoop'@'slave01' IDENTIFIED BY 'hadoop';
GRANT select, insert, update, delete ON *.* TO 'hadoop'@'slave02' IDENTIFIED BY 'hadoop';
GRANT select, insert, update, delete ON *.* TO 'hadoop'@'slave03' IDENTIFIED BY 'hadoop';
flush privileges;

Related administration commands:

show grants for 'hive'@'slave03';
revoke all on *.* from 'hadoop'@'slave01';
drop user 'hive'@'slave03';

Note: in this test environment the MySQL server also runs on slave03; in a real production environment, use a dedicated MySQL machine.
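It is worth confirming the grants from a Hadoop node before moving on; this assumes the mysql client is installed there.

mysql -h slave03 -u hadoop -phadoop -e 'SHOW DATABASES;'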

4.4  Configure hive-site.xml to keep the metastore in MySQL

cd /opt/hive/conf
vi hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://slave03:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hadoop</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hadoop</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>master01:8031</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/opt/data/warehouse-${user.name}</value>
  <description>location of default database for the warehouse</description>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/opt/data/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/opt/data/querylog-${user.name}</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>false</value>
</property>
<property>
  <name>hive.hwi.listen.host</name>
  <value>master01</value>
  <description>This is the host address the Hive Web Interface will listen on</description>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>This is the port the Hive Web Interface will listen on</description>
</property>
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.10.0-cdh4.2.0.war</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
</configuration>

4.5  Unpack mysql-connector-java-5.1.18.tar.gz

tar xzvf mysql-connector-java-5.1.18.tar.gz
mv mysql-connector-java-5.1.18/mysql-connector-java-5.1.18-bin.jar /opt/hive/lib

4.6  MySQL housekeeping

create database hive;
alter database hive character set latin1;

Note: without the character-set change you will hit:

Specified key was too long; max key length is 767 bytes
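With the metastore database in place, a short round trip verifies the wiring end to end; the table name below is a throwaway example, not part of the original setup.

/opt/hive/bin/hive -e 'CREATE TABLE smoke_test (id INT); SHOW TABLES;'
# the new table's metadata should now be visible in MySQL on slave03:
mysql -h slave03 -u hadoop -phadoop hive -e 'SELECT TBL_NAME FROM TBLS;'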

4.7  View the logs

tail /tmp/root/hive.log

4.8  Loading local data into Hive

1) CREATE TABLE mytest2 (num INT, name STRING) COMMENT 'only a test' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
2) LOAD DATA LOCAL INPATH '/var/22.txt' INTO TABLE mytest2;
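For reference, a minimal way to produce the tab-separated input file and spot-check the load (the same /var/22.txt reappears in section 5.2):

printf '1\tjj\n2\tkk\n' > /var/22.txt
/opt/hive/bin/hive -e "LOAD DATA LOCAL INPATH '/var/22.txt' INTO TABLE mytest2; SELECT * FROM mytest2;"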

Chapter 5  Hive + Thrift + PHP integration

5.1  Prepare the packages

Thrift.zip  [71.7K]  Download URL: http://download.csdn.net/detail/jiedushi/3409880

PHP installation is not covered here.

5.2  Write the code

vi test.php

<?php
    $GLOBALS['THRIFT_ROOT'] = '/home/wwwroot/Thrift/';
    require_once $GLOBALS['THRIFT_ROOT'] . 'packages/hive_service/ThriftHive.php';
    require_once $GLOBALS['THRIFT_ROOT'] . 'transport/TSocket.php';
    require_once $GLOBALS['THRIFT_ROOT'] . 'protocol/TBinaryProtocol.php';

    $transport = new TSocket('slave03', 10000);
    $protocol = new TBinaryProtocol($transport);
    $client = new ThriftHiveClient($protocol);
    $transport->open();

    #$client->execute('add jar /opt/hive/lib/hive-contrib-0.10.0-cdh4.2.0.jar');
    $client->execute("LOAD DATA LOCAL INPATH '/var/22.txt' INTO TABLE mytest2");
    $client->execute("SELECT COUNT(1) FROM mytest2");
    var_dump($client->fetchAll());
    $transport->close();
?>

Note: /var/22.txt contains (tab-separated):

1	jj
2	kk

This matches the LOAD DATA example in section 4.8.

5.3  Start hiveserver

/opt/hive/bin/hive --service hiveserver >/dev/null 2>/dev/null &

5.4  Check the default port 10000

netstat -lntp | grep 10000

5.5  Test

php test.php

5.6  Errors and fixes

- Warning: stream_set_timeout(): supplied argument is not a valid stream resource in /home/wwwroot/Thrift/transport/TSocket.php on line 213

Fix: adjust disable_functions in php.ini so the functions Thrift needs are not blocked, e.g.:

disable_functions = passthru,exec,system,chroot,scandir,chgrp,chown,shell_exec,proc_get_status,ini_alter,ini_restore,dl,openlog,syslog,readlink,symlink,popen

Chapter 6  Installing and using Sqoop

6.1  Prepare the package

sqoop-1.4.2-cdh4.2.0.tar.gz  [6M]

6.2  Prerequisites

Hadoop is configured as described in Chapter 2, and the HADOOP_HOME environment variable is already set.

6.3  Install

cd /opt/
tar xzvf sqoop-1.4.2-cdh4.2.0.tar.gz
mv sqoop-1.4.2-cdh4.2.0 sqoop

6.4  Place the MySQL driver jar

Copy mysql-connector-java-5.1.18-bin.jar into /opt/sqoop/lib.

6.5  Edit the configure-sqoop script

vi /opt/sqoop/bin/configure-sqoop

HBase is not installed here, so comment out the check:

#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#fi

6.6  Add Sqoop to PATH

vi ~/.bashrc

export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$ANT_HOME/bin:/opt/sqoop/bin

6.7  Usage tests

- List all databases on the MySQL server:

sqoop list-databases --connect jdbc:mysql://slave03:3306/ --username hadoop --password hadoop

- List table names:

sqoop list-tables --connect jdbc:mysql://slave03/ggg --username hadoop --password hadoop

- Copy a relational table's schema into Hive:

sqoop create-hive-table --connect jdbc:mysql://master01:3306/ggg --table hheccc_area --username hadoop --password hadoop --hive-table ggg_hheccc_area

- Import a relational table into Hive:

sqoop import --connect jdbc:mysql://slave03/ggg --username hadoop --password hadoop --table sp_log_fee --hive-import --hive-table hive_log_fee --split-by id -m 4

Reference examples:

A plain import:

import \
    --append \
    --connect $DS_BJ_HOTBACKUP_URL \
    --username $DS_BJ_HOTBACKUP_USER \
    --password $DS_BJ_HOTBACKUP_PWD \
    --table 'seven_book_sync' \
    --where "create_date >= '${par_31days}' and create_date < '${end_date}'" \
    --hive-import \
    --hive-drop-import-delims \
    --hive-table ${hive_table} \
    --m 1

(--hive-table also accepts dotted schema.table names.) Using a timestamp as the incremental condition works best.

A parallel import:

sqoop import --append --connect $CONNECTURL --username $ORACLENAME --password $ORACLEPASSWORD --target-dir $hdfsPath --m 12 --split-by CLIENTIP --table $oralceTableName --columns $columns --fields-terminated-by '\001' --where "data_desc='2011-02-26'"

An incremental import:

sqoop import --connect jdbc:mysql://master01:3306/ggg --username hadoop --password hadoop --table hheccc_area --columns "id,name,reid,disorder" --direct --hive-import --hive-table hheccc_area --incremental append --check-column id --last-value 0

sqoop job --exec area_import

The commands above were collected from the web; in my tests they did not work as-is and are kept only for reference.

- Export a Hive table back to MySQL:

sqoop export --connect jdbc:mysql://master01:3306/ggg --username hadoop --password hadoop --table mytest2 --export-dir /opt/data/warehouse-root/ggg_hheccc_area

Note: partitioned data is stored like /user/hive/warehouse/uv/dt=2011-08-03

6.8  Errors and fixes

- Encountered IOException running import job: org.apache.hadoop.fs.FileAlreadyExistsException: Output directory hdfs://master01/user/root/hheccc_area already exists

Fix: remove the stale output directory:

/opt/hadoop/bin/hadoop fs -rm -r /user/root/hheccc_area

6.9  References

http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html
http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html
