

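hadoop-HA cluster setup, starting DataNodes, checking startup status, running HDFS commands, starting YARN, HDFS permission configuration, C++ client programming, common errors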

Published: 2024/9/27 | c/c++ | 豆豆

This post is compiled from Hadoop-HA setup blog posts found online. The main reference is: http://blog.chinaunix.net/uid-196700-id-5751309.html

3. Deployment

3.1. Machine list

The cluster uses five Hadoop machines plus three separate ZooKeeper machines, deployed as follows:

Role          Machines
NameNode      192.168.106.91, 192.168.106.92
JournalNode   192.168.106.91, 192.168.106.92, 192.168.106.93
DataNode      192.168.106.93, 192.168.106.94, 192.168.106.95
ZooKeeper     192.168.106.101, 192.168.106.102, 192.168.106.103

3.2. Hostnames

Machine IP         Hostname
192.168.106.91     hadoop1
192.168.106.92     hadoop2
192.168.106.93     hadoop3
192.168.106.94     hadoop4
192.168.106.95     hadoop5
192.168.106.101    hadoop11
192.168.106.102    hadoop12
192.168.106.103    hadoop13

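Note: hostnames must not contain underscores; otherwise the SecondaryNameNode reports an error at startup.

3.2.2. Change the hostname permanently (note: this step is mandatory)

The system configuration files involved can differ between Linux distributions. On CentOS 6.7 the hostname itself is set permanently in /etc/sysconfig/network (HOSTNAME=hadoop1), and /etc/hosts maps each IP to its hostname. Check the current hostname: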

[root@hadoop1 hadoop]# hostname

hadoop1

View the IP-to-hostname mapping with cat /etc/hosts:

[root@hadoop1 hadoop]# cat /etc/hosts
192.168.106.91      hadoop1
192.168.106.92      hadoop2
192.168.106.93      hadoop3
192.168.106.94      hadoop4
192.168.106.95      hadoop5
192.168.106.101     hadoop11
192.168.106.102     hadoop12
192.168.106.103     hadoop13
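The /etc/hosts entries above follow directly from the hostname table, so they can be generated instead of typed by hand. Below is a minimal bash sketch; hosts_entries and valid_hostname are hypothetical helper names, and the accepted character set (letters, digits, '-' and '.') is an assumption that, in line with the note above, rejects underscores.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for building /etc/hosts entries for this cluster.

# valid_hostname NAME: succeeds if NAME uses only letters, digits, '-' and '.',
# and starts and ends with a letter or digit (so "hadoop_1" is rejected).
valid_hostname() {
  [[ "$1" =~ ^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?$ ]]
}

# hosts_entries IP=NAME ...: prints one "IP<TAB>NAME" line per argument and
# fails on the first invalid hostname (that entry is not printed).
hosts_entries() {
  local pair ip name
  for pair in "$@"; do
    ip="${pair%%=*}"
    name="${pair#*=}"
    if ! valid_hostname "$name"; then
      echo "invalid hostname: $name" >&2
      return 1
    fi
    printf '%s\t%s\n' "$ip" "$name"
  done
}
```

For example, as root: hosts_entries 192.168.106.91=hadoop1 192.168.106.92=hadoop2 >> /etc/hosts. Check first that the entries are not already present, since >> only appends.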

For cloning virtual machines and changing the hostname and IP, see:

http://blog.csdn.net/tototuzuoquan/article/details/53999173

For VM networking problems on Windows 10, see:

http://blog.csdn.net/tototuzuoquan/article/details/53900836

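3.3. Scope of password-less login

Password-less login must work with both IP addresses and hostnames:

1) The NameNodes can log in to all DataNodes without a password.

2) Each NameNode can log in to itself without a password.

3) The NameNodes can log in to each other without a password.

4) Each DataNode can log in to itself without a password.

5) DataNodes do not need password-less login to the NameNodes or to other DataNodes.

Note: password-less login is not strictly required if you do not use scripts such as hadoop-daemons.sh that rely on ssh and scp (the sshfence fencing method configured in hdfs-site.xml does, however, rely on ssh between the NameNodes).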

4. Password-less SSH login

The following password-less login procedure is for CentOS 6.7.

On hadoop1:

ssh-keygen -t rsa   (press Enter at every prompt until it finishes)

Then run:

ssh-copy-id hadoop1

ssh-copy-id hadoop2

ssh-copy-id hadoop3

ssh-copy-id hadoop4

ssh-copy-id hadoop5

After this, hadoop1 can log in to hadoop1, hadoop2, hadoop3, hadoop4 and hadoop5 without a password.

Repeat exactly the same steps (ssh-keygen -t rsa, then ssh-copy-id hadoop1 through ssh-copy-id hadoop5) on hadoop2, hadoop3, hadoop4 and hadoop5, so that each of them can likewise log in to all five machines without a password.
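Since the same two steps are repeated on every machine, they can be wrapped in a small script. This is a hedged bash sketch rather than part of the original procedure: setup_passwordless_ssh, run, KEY_FILE and DRY_RUN are hypothetical names, and only standard ssh-keygen and ssh-copy-id options (-t, -N, -f, -i) are used.

```bash
#!/usr/bin/env bash
# Sketch of the per-node steps above: create an RSA key pair once, without
# prompts, then push the public key to every host given as an argument.
# DRY_RUN=1 prints the commands instead of executing them.

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_passwordless_ssh() {
  local key="${KEY_FILE:-$HOME/.ssh/id_rsa}" host
  if [[ ! -f "$key" ]]; then
    run mkdir -p -m 700 "${key%/*}"
    # -N "" sets an empty passphrase and -f the output file, which replaces
    # pressing Enter at each ssh-keygen prompt
    run ssh-keygen -t rsa -N "" -f "$key"
  fi
  for host in "$@"; do
    # ssh-copy-id still asks for that host's password once
    run ssh-copy-id -i "$key.pub" "$host" || return 1
  done
}
```

On each of hadoop1 to hadoop5, run setup_passwordless_ssh hadoop1 hadoop2 hadoop3 hadoop4 hadoop5, adding the IP addresses as extra arguments if login by IP must also work without a password. Prefixing DRY_RUN=1 shows what would be executed.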

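Further material on password-less login:

It is recommended to include your own IP in the names of the generated private and public key files; otherwise they are easy to mix up.

Configure all password-less logins as described in section 3.3 (scope of password-less login). For more details on password-less login, see these blog posts:

1) http://blog.chinaunix.net/uid-20682147-id-4212099.html (password-less login between two SSH2 hosts)

2) http://blog.chinaunix.net/uid-20682147-id-4212097.html (password-less login from SSH2 to OpenSSH)

3) http://blog.chinaunix.net/uid-20682147-id-4212094.html (password-less login from OpenSSH to SSH2)

4) http://blog.chinaunix.net/uid-20682147-id-5520240.html (password-less login between two OpenSSH hosts)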

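5. Conventions

5.1. Installation directories

For ease of explanation, this article assumes the following JDK and Hadoop installation directories:

JDK       /usr/local/jdk1.8.0_73

Hadoop    /home/tuzq/software/hadoop-2.8.0

Adjust them to your environment in an actual deployment.

6. Task list

The work required to run Hadoop (HDFS, YARN and MapReduce):

1) JDK installation: Hadoop is written in Java, so a JDK is required.

2) Password-less login: the NameNode's start scripts use ssh and scp to control the SecondaryNameNode and the DataNodes, so these commands must run without a password prompt.

3) Hadoop installation and configuration: HDFS, YARN and MapReduce only; HBase, Hive, etc. are not covered.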

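7. JDK installation

This article installs JDK 1.8.0_73. For JDK installation, see: http://blog.csdn.net/tototuzuoquan/article/details/18188109

7.1. Download the installation package

Omitted here.

7.2. Installation steps

The Java environment variables configured in the end are:

export JAVA_HOME=/usr/local/jdk1.8.0_73

export CLASSPATH=$JAVA_HOME/lib/tools.jar

export PATH=$JAVA_HOME/bin:$PATH

After this, log in again or source the profile file so that the variables take effect; you can also run the export commands by hand for immediate effect. To double-check, run java or javac and confirm that they execute. If they already worked before you installed the JDK, there is no need to install it.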
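The "run java or javac" check can be made explicit. A minimal sketch, assuming a hypothetical check_java helper: it tests for bin/javac as well as bin/java, because a JRE alone lacks javac (and jps, which is used later to check the daemons).

```bash
#!/usr/bin/env bash
# check_java [DIR]: verify that DIR (default: $JAVA_HOME) looks like a JDK
# rather than a JRE by checking for executable bin/java and bin/javac.
check_java() {
  local home="${1:-$JAVA_HOME}" tool
  if [[ -z "$home" ]]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  for tool in java javac; do
    if [[ ! -x "$home/bin/$tool" ]]; then
      echo "missing or not executable: $home/bin/$tool" >&2
      return 1
    fi
  done
  echo "OK: $home"
}
```

After sourcing the profile, check_java with no argument checks the exported JAVA_HOME; check_java /usr/local/jdk1.8.0_73 checks a specific directory.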

8. Hadoop installation and configuration

This part covers only HDFS, MapReduce and YARN, not HBase, Hive, etc.

8.1. Download the installation package

Omitted here: download hadoop-2.8.0.tar.gz directly from the official site. For building from source, see:

Building from source:

http://blog.csdn.net/tototuzuoquan/article/details/72796632

Installing a pseudo-distributed Hadoop cluster:

http://blog.csdn.net/tototuzuoquan/article/details/72798435

8.2. Installation and environment variables

1) Upload the downloaded package hadoop-2.8.0.tar.gz to /home/tuzq/software.

2) Change to /home/tuzq/software.

3) In /home/tuzq/software, extract the package with tar xzf hadoop-2.8.0.tar.gz, then replace the contents of hadoop-2.8.0/lib/native with the lib/native directory of the Hadoop you built yourself on Linux.

4) Edit .profile in the user's home directory (or /etc/profile, or any equivalent file) to set the Hadoop environment variables:

export JAVA_HOME=/usr/local/jdk1.8.0_73

export HADOOP_HOME=/home/tuzq/software/hadoop-2.8.0

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export PATH=$PATH:$HADOOP_HOME/bin

Log in again for this to take effect, or source the edited file in the terminal (for example source ~/.profile or source /etc/profile) so that HADOOP_HOME=/home/tuzq/software/hadoop-2.8.0 takes effect immediately.

8.3. Modify hadoop-env.sh (hadoop1 as the example)

On every node, edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add the following near the top of the file: export JAVA_HOME=/usr/local/jdk1.8.0_73

Note: even though JAVA_HOME is already set in /etc/profile, hadoop-env.sh must still be changed on every node, because daemons started remotely over ssh run in a non-login shell that does not read /etc/profile. Otherwise startup fails with an error like the following (one such line per node):

ip: Error: JAVA_HOME is not set and could not be found.

Besides JAVA_HOME, also add:

export HADOOP_HOME=/home/tuzq/software/hadoop-2.8.0

export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
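Editing hadoop-env.sh by hand on every node is error-prone, and re-running an edit should not add duplicate lines. A minimal sketch with a hypothetical set_env_line helper; it appends instead of inserting near the top, which, assuming the stock 2.x hadoop-env.sh, has the same effect because the file is sourced top to bottom and the last assignment wins.

```bash
#!/usr/bin/env bash
# set_env_line FILE VAR VALUE: append "export VAR=VALUE" to FILE unless that
# exact line is already present, so the edit can be re-run safely.
# A line appended at the end overrides the default
# "export JAVA_HOME=${JAVA_HOME}" near the top of hadoop-env.sh.
set_env_line() {
  local file="$1" line="export $2=$3"
  grep -qxF "$line" "$file" 2>/dev/null && return 0
  printf '%s\n' "$line" >> "$file"
}
```

On each node, for example: set_env_line "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" JAVA_HOME /usr/local/jdk1.8.0_73, then the same for HADOOP_HOME and, in single quotes so the variable stays literal, HADOOP_CONF_DIR '${HADOOP_HOME}/etc/hadoop'.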

It is also recommended to add the following to /etc/profile or ~/.profile:

export JAVA_HOME=/usr/local/jdk1.8.0_73

export HADOOP_HOME=/home/tuzq/software/hadoop-2.8.0

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

8.5. Modify slaves (hadoop1 as the example)

The slaves file lists the HDFS DataNodes. The start-dfs.sh script reads it and logs in to each slave via password-less ssh to start a DataNode there.

On both the active and the standby NameNode, edit $HADOOP_HOME/etc/hadoop/slaves and add the slave nodes' IPs (hostnames also work), one per line:

>cat slaves

hadoop3

hadoop4

hadoop5
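Writing the slaves file can likewise be scripted so that the active and standby NameNode get identical copies. A minimal sketch; write_slaves is a hypothetical helper that overwrites the file.

```bash
#!/usr/bin/env bash
# write_slaves CONF_DIR HOST...: overwrite CONF_DIR/slaves with one host per
# line; start-dfs.sh reads this file to decide where to start DataNodes.
write_slaves() {
  local conf_dir="$1"
  shift
  if (( $# == 0 )); then
    echo "usage: write_slaves CONF_DIR HOST..." >&2
    return 1
  fi
  printf '%s\n' "$@" > "$conf_dir/slaves"
}
```

Usage for this cluster: write_slaves "$HADOOP_HOME/etc/hadoop" hadoop3 hadoop4 hadoop5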

8.6. Prepare the configuration files

The configuration files live in $HADOOP_HOME/etc/hadoop. In Hadoop 2.3.0 and Hadoop 2.7.2, core-site.xml, yarn-site.xml, hdfs-site.xml and mapred-site.xml in that directory are empty. Starting without configuring them, for example by running start-dfs.sh, leads to all kinds of errors.

You can copy the *-default.xml files from $HADOOP_HOME/share/doc/hadoop into $HADOOP_HOME/etc/hadoop and edit them from there (the commands below can be pasted as-is; in 2.3.0 the default.xml files are in different paths than in 2.7.2):

# change to $HADOOP_HOME

cd $HADOOP_HOME

cp ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml ./etc/hadoop/core-site.xml

cp ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml ./etc/hadoop/hdfs-site.xml

cp ./share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml ./etc/hadoop/yarn-site.xml

cp ./share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml ./etc/hadoop/mapred-site.xml

Next, the default core-site.xml, yarn-site.xml, hdfs-site.xml and mapred-site.xml must be adjusted as described in the following sections; otherwise startup still fails.

The QJM configuration follows the official documentation:

http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

8.7. Modify core-site.xml

The changes to core-site.xml involve the following properties (property = value, followed by notes where relevant):

fs.defaultFS = hdfs://mycluster

fs.default.name = hdfs://mycluster
    Should not be needed, since fs.defaultFS supersedes it, but without it startup reported: fs.defaultFS is file:///

hadoop.tmp.dir = /home/tuzq/software/hadoop-2.8.0/tmp

ha.zookeeper.quorum = hadoop11:2181,hadoop12:2181,hadoop13:2181

ha.zookeeper.parent-znode = /mycluster/hadoop-ha

io.seqfile.local.dir (left unset)
    Default: ${hadoop.tmp.dir}/io/local

fs.s3.buffer.dir (left unset)
    Default: ${hadoop.tmp.dir}/s3

fs.s3a.buffer.dir (left unset)
    Default: ${hadoop.tmp.dir}/s3a

A reference configuration file from an actual deployment:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/tuzq/software/hadoop-2.8.0/tmp</value>
    </property>

    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop11:2181,hadoop12:2181,hadoop13:2181</value>
    </property>

    <property>
        <name>ha.zookeeper.parent-znode</name>
        <value>/mycluster/hadoop-ha</value>
    </property>
</configuration>

Note: create the configured directories before starting, for example /home/tuzq/software/hadoop-2.8.0/tmp (the hadoop.tmp.dir above). For details see:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xm
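The directories referenced in core-site.xml and hdfs-site.xml (hadoop.tmp.dir, dfs.journalnode.edits.dir, dfs.datanode.data.dir, dfs.namenode.name.dir) can be created in one go. A minimal sketch, assuming all of them live under the Hadoop installation directory as in this article's configuration; make_hadoop_dirs is a hypothetical name, and creating a directory that a given node does not use is harmless.

```bash
#!/usr/bin/env bash
# make_hadoop_dirs [BASE]: create the local directories that this article's
# core-site.xml and hdfs-site.xml point to. BASE defaults to $HADOOP_HOME.
make_hadoop_dirs() {
  local base="${1:-$HADOOP_HOME}" sub
  if [[ -z "$base" ]]; then
    echo "no base directory given and HADOOP_HOME is not set" >&2
    return 1
  fi
  # hadoop.tmp.dir, dfs.journalnode.edits.dir, dfs.datanode.data.dir,
  # dfs.namenode.name.dir
  for sub in tmp journal data/data data/name; do
    mkdir -p "$base/$sub" || return 1
  done
}
```

Run make_hadoop_dirs on every node after HADOOP_HOME has been exported.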

8.7.1. dfs.namenode.rpc-address

If it is not configured, startup fails with:

Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.

Both host and port must be given. If only an IP such as 192.168.106.91 is specified, startup prints:

Starting namenodes on []

After changing it to "hadoop1:8020", startup prints:

Starting namenodes on [192.168.106.91]
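Since a missing port produces the confusing "Starting namenodes on []", a quick pre-flight check of the configured value can help. A minimal sketch with a hypothetical has_host_port helper; the pattern only checks the host:port shape, not that the host resolves or the port is open.

```bash
#!/usr/bin/env bash
# has_host_port VALUE: succeeds only for "host:port" with a numeric port,
# the form dfs.namenode.rpc-address needs (hadoop1:8020, not hadoop1).
has_host_port() {
  [[ "$1" =~ ^[^:/[:space:]]+:[0-9]{1,5}$ ]]
}
```

For example: has_host_port "$(hdfs getconf -confKey dfs.namenode.rpc-address.mycluster.nn1)" || echo "nn1 RPC address must be host:port"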

8.8. Modify hdfs-site.xml

The changes to hdfs-site.xml involve the following properties (property = value, followed by notes where relevant):

dfs.nameservices = mycluster

dfs.ha.namenodes.mycluster = nn1,nn2
    Within one nameservice only one or two NameNodes can be configured, i.e. there can be no nn3.

dfs.namenode.rpc-address.mycluster.nn1 = hadoop1:8020

dfs.namenode.rpc-address.mycluster.nn2 = hadoop2:8020

dfs.namenode.http-address.mycluster.nn1 = hadoop1:50070

dfs.namenode.http-address.mycluster.nn2 = hadoop2:50070

dfs.namenode.shared.edits.dir = qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster
    Configure at least three Quorum Journal nodes.

dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
    Clients use it to find the active NameNode.

dfs.ha.fencing.methods = sshfence
    With sshfence, when the active NameNode fails, the failover controller logs in to it over ssh and kills it with fuser, so fuser must be available on all NameNodes.
    Fencing ensures that only one NameNode is active at a time, preventing split-brain. A user name and port can be given, in the form sshfence([[username][:port]]). The value can also be a shell script, in the form shell(/path/to/my/script.sh arg1 arg2 ...), for example shell(/bin/true).

dfs.ha.fencing.ssh.private-key-files = /root/.ssh/id_rsa
    The private key used by sshfence; with OpenSSH the value is /root/.ssh/id_rsa.

dfs.ha.fencing.ssh.connect-timeout = 30000
    Optional; the sshfence connection timeout in milliseconds.

dfs.journalnode.edits.dir = /home/tuzq/software/hadoop-2.8.0/journal
    Absolute path on the JournalNode machines where the JournalNodes store their edits and other local state.

dfs.datanode.data.dir = /home/tuzq/software/hadoop-2.8.0/data/data

dfs.namenode.name.dir = /home/tuzq/software/hadoop-2.8.0/data/name
    NameNode metadata directory. The default is file://${hadoop.tmp.dir}/dfs/name, i.e. under the temporary directory; consider moving it under the data directory.

dfs.namenode.checkpoint.dir (left unset)
    Default: file://${hadoop.tmp.dir}/dfs/namesecondary; not needed when no SecondaryNameNode is running.

dfs.ha.automatic-failover.enabled = true
    Enables automatic failover between the active and standby NameNode.

dfs.datanode.max.xcievers = 4096
    Optional. The maximum number of concurrent transfer threads on a DataNode, loosely similar to the Linux limit on open files. In Hadoop 2.x this is the deprecated name of dfs.datanode.max.transfer.threads, whose default is already 4096 (very old releases defaulted to 256). Also make sure the OS open-file limit is large enough (check with ulimit); running out causes HBase to report NotServingRegionException.

dfs.journalnode.rpc-address = 0.0.0.0:8485
    The JournalNode RPC address; 0.0.0.0:8485 is the default and usually needs no change.

For the full list of settings, see:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

A reference configuration file from an actual deployment:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- One nameservice can have only one or two NameNodes (no nn3); here they are nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop1:8020</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop1:50070</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop2:8020</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop2:50070</value>
    </property>

    <!-- Where the NameNode edits are stored on the JournalNodes. The JournalNodes form
         a cluster of their own and need at least 3 Quorum Journal nodes. -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster</value>
    </property>

    <!-- Absolute path on the JournalNode machines where the JournalNodes store their
         edits and other local state -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/tuzq/software/hadoop-2.8.0/journal</value>
    </property>

    <!-- JournalNode RPC address; the default is 0.0.0.0:8485, so this can stay commented out -->
    <!--
    <property>
        <name>dfs.journalnode.rpc-address</name>
        <value>0.0.0.0:8485</value>
    </property>
    -->

    <!-- Enable automatic failover between the active and standby NameNode -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- Failover proxy provider: the class clients use to find the active NameNode.
         Different nameservices can use different providers. -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Fencing methods, one per line.
         With sshfence, when the active NameNode fails, the failover controller logs in to it
         over ssh and kills it with fuser, so fuser must be available on every NameNode.
         Fencing ensures only one NameNode is active at a time, preventing split-brain.
         User name and port can be given as sshfence([[username][:port]]); the port must be
         given when sshd does not listen on the default port 22. A shell script can also be
         used, e.g. shell(/bin/true). -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>

    <!-- sshfence needs password-less ssh; this is the private key (OpenSSH) -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>

    <!-- sshfence connection timeout in milliseconds -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

    <!-- DataNode block storage directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/tuzq/software/hadoop-2.8.0/data/data</value>
    </property>

    <!-- NameNode metadata directory. The default is file://${hadoop.tmp.dir}/dfs/name,
         i.e. under the temporary directory, so it is moved under the data directory here. -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/tuzq/software/hadoop-2.8.0/data/name</value>
    </property>

    <!-- Optional: maximum concurrent DataNode transfer threads, loosely similar to the Linux
         open-file limit (deprecated name of dfs.datanode.max.transfer.threads). Also make
         sure the OS open-file limit is large enough (check with ulimit); running out causes
         HBase to report NotServingRegionException. -->
    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
    </property>
</configuration>

8.9. Modify mapred-site.xml

The change to mapred-site.xml involves the following property:

mapreduce.framework.name = yarn
    Applies to: all MapReduce nodes.

A reference configuration from an actual deployment:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

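For the full list of settings, see:

http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

8.10. Modify yarn-site.xml

The changes to yarn-site.xml involve the following properties (property = value, followed by scope or notes):

yarn.resourcemanager.hostname = 0.0.0.0
    Applies to: ResourceManager and NodeManager. Not required in HA mode, but other properties may reference it, so keeping the value 0.0.0.0 is recommended; if nothing references it, it can be left out.

yarn.nodemanager.hostname = 0.0.0.0

yarn.nodemanager.aux-services = mapreduce_shuffle

HA-related settings, including automatic failover (these can be configured on the ResourceManager nodes only):

yarn.resourcemanager.ha.enabled = true
    Enables HA.

yarn.resourcemanager.cluster-id = yarn-cluster
    May differ from the HDFS nameservice name.

yarn.resourcemanager.ha.rm-ids = rm1,rm2
    Note: the NodeManagers must be configured the same way as the ResourceManagers.

yarn.resourcemanager.hostname.rm1 = hadoop1

yarn.resourcemanager.hostname.rm2 = hadoop2

yarn.resourcemanager.webapp.address.rm1 = hadoop1:8088
    Open http://hadoop1:8088 in a browser to see YARN information.

yarn.resourcemanager.webapp.address.rm2 = hadoop2:8088
    Open http://hadoop2:8088 in a browser to see YARN information.

yarn.resourcemanager.zk-address = hadoop11:2181,hadoop12:2181,hadoop13:2181
    The same ZooKeeper ensemble, on client port 2181, as ha.zookeeper.quorum in core-site.xml.

yarn.resourcemanager.ha.automatic-failover.enabled = true
    Can be omitted: when yarn.resourcemanager.ha.enabled is true, this already defaults to true.

NodeManager settings:

yarn.nodemanager.vmem-pmem-ratio (left unset)
    Maximum virtual memory allowed per 1 MB of physical memory; default 2.1. If spark-sql fails with "Yarn application has already exited with state FINISHED", check the NodeManager log to see whether this value is too small.

yarn.nodemanager.resource.cpu-vcores (left unset)
    Total number of virtual CPUs available to the NodeManager; default 8.

yarn.nodemanager.resource.memory-mb (left unset)
    Total physical memory YARN may use on this node; default 8192 (MB).

yarn.nodemanager.pmem-check-enabled (left unset)
    Whether to run a thread that checks the physical memory used by each task and kills any task that exceeds its allocation; default true.

yarn.nodemanager.vmem-check-enabled (left unset)
    Whether to run a thread that checks the virtual memory used by each task and kills any task that exceeds its allocation; default true.

ResourceManager settings:

yarn.scheduler.minimum-allocation-mb (left unset)
    Minimum memory a single container can request.

yarn.scheduler.maximum-allocation-mb (left unset)
    Maximum memory a single container can request.

A reference configuration from an actual deployment: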

<?xml version="1.0"?>

<configuration>
    <!-- Enable HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- Cluster id of the ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-cluster</value>
    </property>

    <!-- Logical names of the ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- Host of each ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop1:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop2:8088</value>
    </property>
    <!-- ZooKeeper ensemble address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop11:2181,hadoop12:2181,hadoop13:2181</value>
    </property>

    <!-- Auxiliary service the NodeManagers provide (the MapReduce shuffle) -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

If yarn.nodemanager.hostname is set to a specific IP, every NodeManager needs its own configuration. For the full list of settings, see:

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

For YARN HA configuration, see:

https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

After finishing the configuration on hadoop1, run:

scp -r /home/tuzq/software/hadoop-2.8.0/etc/hadoop/* root@hadoop2:/home/tuzq/software/hadoop-2.8.0/etc/hadoop

scp -r /home/tuzq/software/hadoop-2.8.0/etc/hadoop/* root@hadoop3:/home/tuzq/software/hadoop-2.8.0/etc/hadoop

scp -r /home/tuzq/software/hadoop-2.8.0/etc/hadoop/* root@hadoop4:/home/tuzq/software/hadoop-2.8.0/etc/hadoop

scp -r /home/tuzq/software/hadoop-2.8.0/etc/hadoop/* root@hadoop5:/home/tuzq/software/hadoop-2.8.0/etc/hadoop
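The four scp commands differ only in the target host, so a loop does the same job. A minimal sketch: sync_conf and DRY_RUN are hypothetical names, the default path follows this article's installation convention, and it copies to root@HOST exactly as the commands above do.

```bash
#!/usr/bin/env bash
# sync_conf HOST...: copy this node's Hadoop configuration directory to the
# same path on every HOST, like the scp commands above.
# DRY_RUN=1 only prints the commands.
sync_conf() {
  local conf="${HADOOP_HOME:-/home/tuzq/software/hadoop-2.8.0}/etc/hadoop" host
  for host in "$@"; do
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "scp -r $conf/* root@$host:$conf"
    else
      scp -r "$conf"/* "root@$host:$conf" || return 1
    fi
  done
}
```

On hadoop1: sync_conf hadoop2 hadoop3 hadoop4 hadoop5. Re-run it after every configuration change so that all nodes stay identical.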

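9. Startup order

ZooKeeper -> JournalNode -> format NameNode -> initialize JournalNode

-> create namespace (zkfc) -> NameNode -> DataNode -> ResourceManager -> NodeManager.

Note that the NameNode must be formatted before it is started for the first time, and pay attention to how the standby NameNode is started.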

10. Starting HDFS

Before starting HDFS, the NameNode must be formatted.

10.1. Create directories

mkdir -p /home/tuzq/software/hadoop-2.8.0/tmp/dfs/name   (this step can be skipped)

10.2. Start ZooKeeper

./zkServer.sh start

Start ZooKeeper (on each ZooKeeper node, hadoop11 to hadoop13) before anything else.

10.3. Create the namespace

On one of the NameNodes (hadoop1), run:

cd $HADOOP_HOME

bin/hdfs zkfc -formatZK    (not needed the second time)

10.4. Start all JournalNodes (run on hadoop1, hadoop2 and hadoop3)

The NameNodes record metadata edit logs on the JournalNodes; the active and standby NameNodes keep their metadata in sync through these logs.

On every JournalNode, run:

cd $HADOOP_HOME

sbin/hadoop-daemon.sh start journalnode

When it finishes, check with:

[root@hadoop2 hadoop-2.8.0]# jps
3314 Jps
3267 JournalNode
[root@hadoop2 hadoop-2.8.0]#
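Reading jps output by eye works for one node; across several nodes a small filter is handier. A minimal sketch: has_daemons is a hypothetical helper that reads jps output on stdin and checks the second column (the process name) for each expected daemon.

```bash
#!/usr/bin/env bash
# has_daemons NAME... : reads jps output on stdin and succeeds only if every
# NAME appears as a process name (second column); reports missing ones.
has_daemons() {
  local procs name missing=0
  procs="$(awk '{print $2}')"
  for name in "$@"; do
    if ! grep -qxF "$name" <<< "$procs"; then
      echo "not running: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, jps | has_daemons JournalNode on hadoop1 to hadoop3, or jps | has_daemons NameNode DFSZKFailoverController on the NameNodes once they are up.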

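Note: the JournalNodes must be running before you run "hdfs namenode -format", and the format must happen before the NameNode is started.

10.5. Format the NameNode

On hadoop1, run the following (this formats the NameNode; if it has been formatted before, do not format it again):

hdfs namenode -format    (not needed the second time)

10.6. Initialize the JournalNodes

This step is needed only when converting a non-HA cluster to HA. Run it on hadoop1 (a NameNode that is also a JournalNode):

bin/hdfs namenode -initializeSharedEdits    (not needed the second time)

This command is interactive by default; add -force to make it non-interactive.

On every JournalNode, create the following directory (not needed the second time):

mkdir -p /home/tuzq/software/hadoop-2.8.0/journal/mycluster/current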

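10.7. Start the active NameNode

The following is done on hadoop1; starting the NameNode on hadoop2 is described in the next section.

1) Change to $HADOOP_HOME.

2) Start the active NameNode:

sbin/hadoop-daemon.sh start namenode

If startup reports that the NameNode cannot log in to itself without a password, although password-less login via the IP already works, the usual cause is that you have never logged in to the machine via its own hostname. The fix is to ssh to the hostname once.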

10.8. Start the standby NameNode

On hadoop2, run:

bin/hdfs namenode -bootstrapStandby

Answer N to every "Re-format" prompt. Then run:

sbin/hadoop-daemon.sh start namenode

If you skip the bootstrapStandby step and start the NameNode directly, it fails with:

No valid image files found

or the NameNode log shows:

2016-04-08 14:08:39,745 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage

java.io.IOException: NameNode is not formatted.

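10.9. Start the failover controllers

Start the failover controller on every NameNode (run on both hadoop1 and hadoop2):

sbin/hadoop-daemon.sh start zkfc

HDFS fails over automatically only when the DFSZKFailoverController process is running.

Note: zkfc is short for ZooKeeper Failover Controller.

10.10. Start all DataNodes

Run the following on each DataNode (hadoop3, hadoop4 and hadoop5):

sbin/hadoop-daemon.sh start datanode

If a DataNode process does not come up, try deleting the DataNode logs in the logs directory and starting it again.

10.11. Check whether startup succeeded

1) Use the JDK's jps command to check whether the expected processes are running.

2) Check the .log and .out files under $HADOOP_HOME/logs for exceptions.

If both nn1 and nn2 are in standby state after startup, switch nn1 to active (run on hadoop1):

bin/hdfs haadmin -transitionToActive nn1

With dfs.ha.automatic-failover.enabled set to true and zkfc running, ZKFC normally elects the active NameNode by itself, and haadmin refuses manual transitions unless --forcemanual is added.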

10.11.1. DataNode

Run jps (note: jps is a JDK command, not part of the JRE) to see the DataNode process:

$ jps

18669 DataNode

24542 Jps

10.11.2. NameNode

Run jps to see the NameNode process:

$ jps

18669 NameNode

24542 Jps

10.12. Running HDFS commands

Run HDFS commands to verify further that everything is installed and configured correctly. Run hdfs or hdfs dfs without arguments to see their usage.

10.12.1. Check that the DataNodes started properly

hdfs dfsadmin -report

Note: if fs.default.name in core-site.xml is file:///, the command fails with:

report: FileSystem file:/// is not an HDFS file system

Usage: hdfs dfsadmin [-report] [-live] [-dead] [-decommissioning]

To fix this, set fs.default.name to the same value as fs.defaultFS.

10.12.2 Start HDFS and YARN (run on hadoop1 and hadoop2 respectively)

On hadoop1, run:

cd $HADOOP_HOME

sbin/start-dfs.sh

sbin/start-yarn.sh    (note: start it on both hadoop1 and hadoop2)

Open http://hadoop1:50070/ in a browser; the page shows this NameNode in active state.

Then open http://hadoop2:50070/; it shows hadoop2 in standby state.

Open the YARN web UI (the address is set in yarn-site.xml), for example http://hadoop1:8088/cluster.

10.12.2. Check the NameNode active/standby state

For example, check whether NameNode1 and NameNode2 are active or standby:

$ hdfs haadmin -getServiceState nn1

standby

$ hdfs haadmin -getServiceState nn2

active
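The same -getServiceState query works for HDFS (hdfs haadmin) and for YARN (yarn rmadmin, see 11.2.3), so finding the currently active member can be scripted. A minimal sketch; find_active is a hypothetical helper, and the admin command is passed as a string so it can be replaced by a stub function when testing.

```bash
#!/usr/bin/env bash
# find_active ADMIN_CMD ID...: print the first ID whose -getServiceState
# answer is "active"; fail if none is active.
# ADMIN_CMD is e.g. "hdfs haadmin" or "yarn rmadmin".
find_active() {
  local -a admin
  read -r -a admin <<< "$1"
  shift
  local id state
  for id in "$@"; do
    state="$("${admin[@]}" -getServiceState "$id" 2>/dev/null)"
    if [[ "$state" == active ]]; then
      echo "$id"
      return 0
    fi
  done
  return 1
}
```

For example, find_active "hdfs haadmin" nn1 nn2 prints nn2 in the situation shown above, and find_active "yarn rmadmin" rm1 rm2 does the same for the ResourceManagers.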

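10.12.3. hdfs dfs -ls

Note: the commands below require HDFS to be running; YARN is not needed for them.

"hdfs dfs -ls" takes a path argument. If it starts with "hdfs://URI", HDFS at that URI is accessed; otherwise the path is resolved against fs.defaultFS (hdfs://mycluster in this setup), so it behaves like a local ls only when fs.defaultFS is file:///. The URI is the NameNode's IP or hostname, optionally with a port, i.e. the value of "dfs.namenode.rpc-address" in hdfs-site.xml.

"hdfs dfs -ls" assumes the default port 8020. If the NameNode listens on another port such as 9000, the port must be given; otherwise it can be omitted, much like a URL in a browser. Example:

> hdfs dfs -ls hdfs://hadoop1:8020/

The trailing slash after 8020 is required; otherwise the argument is treated as a file. If the port is omitted, the default 8020 is used; "hadoop1:8020" is the value of "dfs.namenode.rpc-address" in hdfs-site.xml.

This shows that "hdfs dfs -ls" can operate on different HDFS clusters simply by specifying different URIs.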

After upload, a file is stored in the DataNode's data directory (set by the "dfs.datanode.data.dir" property in the DataNode's hdfs-site.xml),

for example: $HADOOP_HOME/data/data/current/BP-472842913-192.168.106.91-1497065109036/current/finalized/subdir0/subdir0/blk_1073741825

"blk" stands for block. By default, blk_1073741825 is one complete block of the file; Hadoop does no extra processing on it.
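To see which block files a DataNode actually holds, search its data directory for blk_ files. A minimal sketch: list_blocks is a hypothetical helper, and it skips the blk_<id>_<generation stamp>.meta checksum file that sits next to each block file.

```bash
#!/usr/bin/env bash
# list_blocks DATA_DIR: print the HDFS block files (blk_<id>) stored under a
# DataNode data directory, skipping the matching .meta checksum files.
list_blocks() {
  find "$1" -type f -name 'blk_*' ! -name '*.meta' | sort
}
```

With this article's layout: list_blocks "$HADOOP_HOME/data/data" on any DataNode.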

10.12.4. hdfs dfs -put

Upload a file, for example:

> hdfs dfs -put /etc/SuSE-release hdfs://192.168.106.91/

10.12.5. hdfs dfs -rm

Delete a file, for example:

> hdfs dfs -rm hdfs://192.168.106.91/SuSE-release

Deleted hdfs://192.168.106.91/SuSE-release

10.12.6. How does a new NameNode join?

When a NameNode machine fails, a new NameNode has to replace it. Change the configuration to point to the new NameNode, then start the new NameNode as a standby; it then joins the cluster:

1) bin/hdfs namenode -bootstrapStandby

2) sbin/hadoop-daemon.sh start namenode

10.12.7. HDFS (Hadoop 2.x) allows only two NameNodes, one active and one standby

If you try to configure three NameNodes, for example:

<property>
  <name>dfs.ha.namenodes.test</name>
  <value>nm1,nm2,nm3</value>
  <description>The prefix for a given nameservice, contains a comma-separated
    list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).</description>
</property>

then running "hdfs namenode -bootstrapStandby" fails as follows, meaning that one namespace cannot have more than 2 NameNodes:

16/04/11 09:51:57 ERROR namenode.NameNode: Failed to start namenode.
java.io.IOException: java.lang.IllegalArgumentException: Expected exactly 2 NameNodes in namespace 'test'. Instead, got only 3 (NN ids were 'nm1','nm2','nm3'
        at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run(BootstrapStandby.java:425)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1454)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
Caused by: java.lang.IllegalArgumentException: Expected exactly 2 NameNodes in namespace 'test'. Instead, got only 3 (NN ids were 'nm1','nm2','nm3'
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
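Since a third NameNode ID is only rejected once bootstrapStandby runs, the ID list can be checked up front. A minimal sketch for Hadoop 2.x; check_nn_ids is a hypothetical helper, and the expected count of two comes from the error message above.

```bash
#!/usr/bin/env bash
# check_nn_ids LIST: succeed only if the comma-separated NameNode ID list has
# exactly two entries, as bootstrapStandby requires in Hadoop 2.x.
check_nn_ids() {
  local -a ids
  IFS=',' read -r -a ids <<< "$1"
  if (( ${#ids[@]} != 2 )); then
    echo "expected exactly 2 NameNode IDs, got ${#ids[@]}: $1" >&2
    return 1
  fi
}
```

For example, before running bootstrapStandby: check_nn_ids "$(hdfs getconf -confKey dfs.ha.namenodes.mycluster)"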

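10.12.8. Storage balancing with start-balancer.sh

Example: start-balancer.sh -threshold 10

The threshold 10 means a DataNode counts as balanced when its disk utilization differs from the cluster-wide utilization (total used space / total capacity) by no more than 10 percentage points; otherwise blocks are moved. start-balancer.sh runs the balancer (the "hdfs balancer" command) as a background daemon, and stop-balancer.sh stops it.

Balancing is very slow, but HDFS stays fully usable while it runs, including uploading files.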

[VM2016@hadoop-030 /data4/hadoop/sbin]$ hdfs balancer   # start-balancer.sh can be used instead
16/04/08 14:26:55 INFO balancer.Balancer: namenodes  = [hdfs://test]   // test is the HDFS cluster name
16/04/08 14:26:55 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.231:50010
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.229:50010
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.213:50010
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.208:50010
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.232:50010
16/04/08 14:26:56 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.207:50010
16/04/08 14:26:56 INFO balancer.Balancer: 5 over-utilized: [192.168.1.231:50010:DISK, 192.168.1.229:50010:DISK, 192.168.1.213:50010:DISK, 192.168.1.208:50010:DISK, 192.168.1.232:50010:DISK]
16/04/08 14:26:56 INFO balancer.Balancer: 1 underutilized: [192.168.1.207:50010:DISK]   # data will be moved to this node
16/04/08 14:26:56 INFO balancer.Balancer: Need to move 816.01 GB to make the cluster balanced.   # 816.01 GB must be moved to balance the cluster
16/04/08 14:26:56 INFO balancer.Balancer: Decided to move 10 GB bytes from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK   # move 10 GB from 192.168.1.231 to 192.168.1.207
16/04/08 14:26:56 INFO balancer.Balancer: Will move 10 GB in this iteration

16/04/08 14:32:58 INFO balancer.Dispatcher: Successfully moved blk_1073749366_8542 with size=77829046 from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
16/04/08 14:32:59 INFO balancer.Dispatcher: Successfully moved blk_1073749386_8562 with size=77829046 from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.231:50010
16/04/08 14:33:34 INFO balancer.Dispatcher: Successfully moved blk_1073749378_8554 with size=77829046 from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.231:50010
16/04/08 14:34:38 INFO balancer.Dispatcher: Successfully moved blk_1073749371_8547 with size=134217728 from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
16/04/08 14:34:54 INFO balancer.Dispatcher: Successfully moved blk_1073749395_8571 with size=134217728 from 192.168.1.231:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.231:50010
Apr 8, 2016 2:35:01 PM            0            478.67 MB           816.01 GB              10 GB
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.213:50010
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.229:50010
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.232:50010
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.231:50010
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.208:50010
16/04/08 14:35:10 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.207:50010
16/04/08 14:35:10 INFO balancer.Balancer: 5 over-utilized: [192.168.1.213:50010:DISK, 192.168.1.229:50010:DISK, 192.168.1.232:50010:DISK, 192.168.1.231:50010:DISK, 192.168.1.208:50010:DISK]
16/04/08 14:35:10 INFO balancer.Balancer: 1 underutilized: [192.168.1.207:50010:DISK]
16/04/08 14:35:10 INFO balancer.Balancer: Need to move 815.45 GB to make the cluster balanced.
16/04/08 14:35:10 INFO balancer.Balancer: Decided to move 10 GB bytes from 192.168.1.213:50010:DISK to 192.168.1.207:50010:DISK
16/04/08 14:35:10 INFO balancer.Balancer: Will move 10 GB in this iteration

16/04/08 14:41:18 INFO balancer.Dispatcher: Successfully moved blk_1073760371_19547 with size=77829046 from 192.168.1.213:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
16/04/08 14:41:19 INFO balancer.Dispatcher: Successfully moved blk_1073760385_19561 with size=77829046 from 192.168.1.213:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
16/04/08 14:41:22 INFO balancer.Dispatcher: Successfully moved blk_1073760393_19569 with size=77829046 from 192.168.1.213:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
16/04/08 14:41:23 INFO balancer.Dispatcher: Successfully moved blk_1073760363_19539 with size=77829046 from 192.168.1.213:50010:DISK to 192.168.1.207:50010:DISK through 192.168.1.213:50010
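The over-utilized/underutilized split in the log above comes from comparing each DataNode with the cluster-wide utilization. The awk-based sketch below reproduces that comparison for a given threshold; classify_nodes is a hypothetical helper, and it is a simplification: the real balancer also tracks nodes that are above or below average but within the threshold, and works per storage type (the :DISK suffix in the log).

```bash
#!/usr/bin/env bash
# classify_nodes THRESHOLD: reads "node used_bytes capacity_bytes" lines on
# stdin and prints "node utilization% class", comparing each node with the
# cluster utilization (total used / total capacity) +/- THRESHOLD points.
classify_nodes() {
  awk -v t="$1" '
    { node[NR] = $1; used[NR] = $2; cap[NR] = $3; tu += $2; tc += $3 }
    END {
      if (tc == 0) exit 1
      avg = 100 * tu / tc
      for (i = 1; i <= NR; i++) {
        u = 100 * used[i] / cap[i]
        cls = "balanced"
        if (u > avg + t) cls = "over-utilized"
        else if (u < avg - t) cls = "under-utilized"
        printf "%s %.1f %s\n", node[i], u, cls
      }
    }'
}
```

The per-DataNode used and capacity figures can be taken from the "DFS Used" and "Configured Capacity" lines of hdfs dfsadmin -report.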

10.12.9. Adding JournalNodes

Pick an existing JournalNode and edit its hdfs-site.xml to include the new JournalNodes. For example, starting from

qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster

add two JournalNodes, hadoop6 and hadoop7:

qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485;hadoop6:8485;hadoop7:8485/mycluster
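Extending the qjournal:// list by hand is easy to get wrong (the separators are semicolons and the journal ID goes at the end). A minimal sketch with a hypothetical qjournal_uri helper; it assumes the default JournalNode port 8485 for hosts given without one, and warns on an even count because N JournalNodes tolerate only (N - 1) / 2 failures, so an even N adds no fault tolerance over N - 1.

```bash
#!/usr/bin/env bash
# qjournal_uri JOURNAL_ID HOST...: build a dfs.namenode.shared.edits.dir value.
# Hosts given without ":port" get the default JournalNode RPC port 8485.
qjournal_uri() {
  local jid="$1" host list=""
  shift
  if (( $# % 2 == 0 )); then
    echo "warning: $# JournalNodes; an odd number is recommended" >&2
  fi
  for host in "$@"; do
    [[ "$host" == *:* ]] || host="$host:8485"
    list="${list:+$list;}$host"
  done
  printf 'qjournal://%s/%s\n' "$list" "$jid"
}
```

For the example above: qjournal_uri mycluster hadoop1 hadoop2 hadoop3 hadoop6 hadoop7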

Then copy the installation directory and the data directory (the directory set by dfs.journalnode.edits.dir in hdfs-site.xml) to the new nodes.

If the JournalNode data directory is not copied, the JournalNode on the new node reports "Journal Storage Directory /data/journal/test not formatted"; later versions may synchronize it automatically.

Next, start the JournalNode on each new node (no initialization is needed) and restart the NameNodes. Watch the JournalNode log; an INFO message like the following means it started successfully:

2016-04-26 10:31:11,160 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data/journal/test/current/edits_inprogress_0000000000000194269 -> /data/journal/test/current/edits_0000000000000194269-0000000000000194270

11. Starting YARN

11.1. Starting YARN

1) Go to the $HADOOP_HOME/sbin directory.

2) Run start-yarn.sh on both the active and the standby master to start YARN.

If startup succeeds, running jps on the master node shows ResourceManager:

> jps
24689 NameNode
30156 Jps
28861 ResourceManager

Running jps on the slave nodes shows NodeManager:

$ jps
14019 NodeManager
23257 DataNode
15115 Jps

To start only the ResourceManager on a particular node, run:

sbin/yarn-daemon.sh start resourcemanager

For a NodeManager, run:

sbin/yarn-daemon.sh start nodemanager

11.2. Running YARN commands

11.2.1. yarn node -list

Lists all NodeManagers in the YARN cluster (mind the spaces between arguments; running yarn without arguments prints usage help), for example:

[root@hadoop1 sbin]# yarn node -list

11.2.2. yarn node -status

Shows the status of a given NodeManager, identified by a Node-Id from the output of the command above, for example:

[root@hadoop1 hadoop]# yarn node -status hadoop5:59894
Node Report :
    Node-Id : hadoop5:59894
    Rack : /default-rack
    Node-State : RUNNING
    Node-Http-Address : hadoop5:8042
    Last-Health-Update : Sat 10/Jun/17 12:30:38:20CST
    Health-Report :
    Containers : 0
    Memory-Used : 0MB
    Memory-Capacity : 8192MB
    CPU-Used : 0 vcores
    CPU-Capacity : 8 vcores
    Node-Labels :
    Resource Utilization by Node : PMem:733 MB, VMem:733 MB, VCores:0.0
    Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0

[root@hadoop1 hadoop]#

11.2.3. yarn rmadmin -getServiceState rm1

Shows whether rm1 is currently active or standby.

11.2.4. yarn rmadmin -transitionToStandby rm1

Switches rm1 from active to standby. If automatic failover is enabled, the command refuses to change the state manually unless --forcemanual is also given.

For more yarn commands, see:

https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnCommands.html

12. Running MapReduce programs

The share/hadoop/mapreduce subdirectory of the installation directory contains ready-made example programs:

hadoop@VM-40-171-sles10-64:~/hadoop> ls share/hadoop/mapreduce
hadoop-mapreduce-client-app-2.7.2.jar         hadoop-mapreduce-client-jobclient-2.7.2-tests.jar
hadoop-mapreduce-client-common-2.7.2.jar      hadoop-mapreduce-client-shuffle-2.7.2.jar
hadoop-mapreduce-client-core-2.7.2.jar        hadoop-mapreduce-examples-2.7.2.jar
hadoop-mapreduce-client-hs-2.7.2.jar          lib
hadoop-mapreduce-client-hs-plugins-2.7.2.jar  lib-examples
hadoop-mapreduce-client-jobclient-2.7.2.jar   sources

Try running one of them:

hdfs dfs -put /etc/hosts hdfs://test/in/
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount hdfs://test/in/ hdfs://test/out/

While the job runs, the JDK's jps command shows that YARN has started processes named YarnChild.

When wordcount finishes, the result is saved under the out directory in files named like "part-r-00000". Two points to note when running this example:

1) The input must be text: in can be a directory containing text files or a single text file to count, and it can be on HDFS or on the local file system.

2) The out directory must not exist beforehand. The program creates it and fails with an error if it already exists.
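To make the expected contents of part-r-00000 concrete, here is a single-process C++ sketch of what the wordcount example computes. It is only an illustration, not how Hadoop executes the job: the "map" step splits text on whitespace and counts one per word, the "reduce" step sums those counts, and std::map keeps the words sorted, which for ASCII input matches the key order inside a reducer's output file. The tab-separated "word, count" layout is that of the default TextOutputFormat; the function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Local, single-process illustration of what the wordcount example computes.
// "Map": split the text on whitespace and emit (word, 1).
// "Reduce": sum the ones per word. std::map keeps the words sorted.
std::map<std::string, long> wordCount(const std::string& text) {
    std::map<std::string, long> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word)  // operator>> skips any run of whitespace
        ++counts[word];
    return counts;
}

// Renders the counts as "word<TAB>count" lines, the layout of part-r-00000.
std::string formatLikePartFile(const std::map<std::string, long>& counts) {
    std::ostringstream out;
    for (const auto& wordAndCount : counts)
        out << wordAndCount.first << '\t' << wordAndCount.second << '\n';
    return out.str();
}
```

For a job with a single reducer, running this over a small copy of the input and comparing it with the output of hdfs dfs -cat on the part file is a quick sanity check.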

The hadoop-mapreduce-examples-2.7.2.jar package contains many example programs. Running it without arguments shows the usage:

> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount
Usage: wordcount <in> <out>

> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

To set the log level to DEBUG and print the log to the console:

export HADOOP_ROOT_LOGGER=DEBUG,console

13. HDFS permission configuration

13.1. hdfs-site.xml

dfs.permissions.enabled = true
dfs.permissions.superusergroup = supergroup
dfs.cluster.administrators = ACL-for-admins
dfs.namenode.acls.enabled = true
dfs.web.ugi = webuser,webgroup

13.2. core-site.xml

fs.permissions.umask-mode = 022
hadoop.security.authentication = simple   (authentication method: simple or kerberos)
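With fs.permissions.umask-mode = 022, the effective permissions follow ordinary umask arithmetic: HDFS starts from 0666 for new files and 0777 for new directories and clears the bits that are set in the umask. The sketch below shows that arithmetic for a numeric (octal) umask only; the property also accepts symbolic forms, which this sketch does not parse, and applyUmask is a hypothetical name.

```cpp
#include <cassert>

// Starting modes HDFS uses for newly created files and directories.
constexpr unsigned kFileDefault = 0666;
constexpr unsigned kDirDefault = 0777;

// Effective permission = starting mode with the umask bits cleared.
constexpr unsigned applyUmask(unsigned mode, unsigned umask) {
    return mode & ~umask & 0777u;
}

// Compile-time checks of the 022 case configured above.
static_assert(applyUmask(kFileDefault, 022) == 0644, "files become rw-r--r--");
static_assert(applyUmask(kDirDefault, 022) == 0755, "dirs become rwxr-xr-x");
```

A stricter umask such as 077 would make new directories 0700, that is, accessible only to their owner.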

14. C++ client programming

14.1. Sample code

// Build (paths depend on your JDK/Hadoop layout; libhdfs.a is listed before -ljvm so that linkers defaulting to --as-needed still resolve its JNI symbols):
// g++ -g -o x x.cpp -I$HADOOP_HOME/include $HADOOP_HOME/lib/native/libhdfs.a -L$JAVA_HOME/lib/amd64/jli -ljli -L$JAVA_HOME/jre/lib/amd64/server -ljvm -lpthread -ldl
#include "hdfs.h"   // libhdfs C API (it also pulls in <fcntl.h> for O_WRONLY/O_CREAT)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
#if 0
    // Write example. "default" makes libhdfs use fs.defaultFS from the core-site.xml on the CLASSPATH (HA mode).
    hdfsFS fs = hdfsConnect("default", 0);
    if (NULL == fs)
    {
        fprintf(stderr, "Failed to connect hdfs\n");
        exit(-1);
    }
    const char* writePath = "hdfs://mycluster/tmp/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY | O_CREAT, 0, 0, 0);
    if (!writeFile)
    {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }
    const char* buffer = "Hello, World!\n";
    // strlen(buffer) + 1 also writes the terminating '\0' byte
    tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer) + 1);
    if (num_written_bytes < 0 || hdfsFlush(fs, writeFile))
    {
        fprintf(stderr, "Failed to write or flush %s\n", writePath);
        exit(-1);
    }
    hdfsCloseFile(fs, writeFile);
    hdfsDisconnect(fs);
#else
    // List example using the builder API; "default" again means fs.defaultFS (HA mode).
    struct hdfsBuilder* bld = hdfsNewBuilder();
    hdfsBuilderSetNameNode(bld, "default");
    hdfsFS fs = hdfsBuilderConnect(bld);   // frees bld whether or not the connection succeeds
    if (NULL == fs)
    {
        fprintf(stderr, "Failed to connect hdfs\n");
        exit(-1);
    }
    const char* path = (argc < 2) ? "/" : argv[1];
    int num_entries = 0;
    hdfsFileInfo* entries = hdfsListDirectory(fs, path, &num_entries);
    if (NULL == entries)   // NULL is returned on error or for an empty directory
        num_entries = 0;
    fprintf(stdout, "num_entries: %d\n", num_entries);
    for (int i = 0; i < num_entries; ++i)
    {
        fprintf(stdout, "%s\n", entries[i].mName);
    }
    if (entries)
        hdfsFreeFileInfo(entries, num_entries);
    hdfsDisconnect(fs);
    // Do not call hdfsFreeBuilder(bld) here: hdfsBuilderConnect() has already freed it.
#endif
    return 0;
}

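14.2. Running the example

CLASSPATH must be set correctly before running the program. If it is wrong, you can run into a variety of problems: operations meant for files and directories on HDFS act on the local file system instead, or errors such as "java.net.UnknownHostException" appear.

To avoid these errors, it is strongly recommended to generate the CLASSPATH value with "hadoop classpath --glob", for example: export CLASSPATH=$(hadoop classpath --glob)

You also need to set LD_LIBRARY_PATH so that the two libraries libjli.so and libjvm.so can be found, for example:

export LD_LIBRARY_PATH=$JAVA_HOME/lib/amd64/jli:$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH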
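The --glob form matters because the JVM that libhdfs starts through JNI does not expand wildcard classpath entries such as .../common/lib/*, which a plain "hadoop classpath" prints. A cheap safeguard is to inspect CLASSPATH in the client before connecting. The sketch below is an assumption-level pre-flight check (classpathLooksUsable is a hypothetical name, not part of libhdfs): it only looks at the string, rejecting an empty value or any entry that ends in '*', and does not verify that the jars or the configuration directory actually exist.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical pre-flight check (not part of libhdfs): inspects a CLASSPATH
// string before hdfsBuilderConnect() starts the embedded JVM.
bool classpathLooksUsable(const char* classpath, std::string* reason) {
    if (classpath == nullptr || *classpath == '\0') {
        if (reason) *reason = "CLASSPATH is not set";
        return false;
    }
    std::istringstream entries(classpath);
    std::string entry;
    // Entries are ':'-separated on Linux.
    while (std::getline(entries, entry, ':')) {
        // A JVM created through JNI does not expand "dir/*" into the jars in dir.
        if (!entry.empty() && entry.back() == '*') {
            if (reason) *reason = "unexpanded wildcard entry: " + entry;
            return false;
        }
    }
    return true;
}
```

In the example program it could be called with std::getenv("CLASSPATH") right before hdfsNewBuilder(), printing the reason and exiting when it returns false.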

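15. Common errors

15.1. ConnectException when running "hdfs dfs -ls"

The specified port number 9000 may be wrong. The port is set by the "dfs.namenode.rpc-address" property in hdfs-site.xml (in an HA setup, per NameNode as dfs.namenode.rpc-address.<nameservice>.<namenode-id>); it is the NameNode's RPC service port.

After upload, a file is stored under the DataNode's data directory (set by the "dfs.datanode.data.dir" property in the DataNode's hdfs-site.xml), for example:

$HADOOP_HOME/data/current/BP-139798373-192.168.106.91-1397735615751/current/finalized/blk_1073741825

"blk" in the file name stands for "block". By default, blk_1073741825 is simply one complete block of the file stored as-is; Hadoop does not process it further.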
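The block files follow a simple naming scheme: on disk a block is stored as blk_<blockId> next to a checksum file blk_<blockId>_<generationStamp>.meta, while logs such as the balancer output earlier print blocks as blk_<blockId>_<generationStamp> (for example blk_1073760371_19547). When matching log lines to files under finalized/, a small parser can help. The sketch below is hypothetical helper code, not a Hadoop API, and the .meta example in the test uses a made-up generation stamp.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>

struct BlockName {
    std::int64_t blockId = 0;
    std::int64_t genStamp = -1;  // -1: the name carried no generation stamp
};

// Strict decimal parse: the whole string must be a (possibly negative) integer.
static bool parseInt64(const std::string& s, std::int64_t* value) {
    if (s.empty() || std::isspace(static_cast<unsigned char>(s[0]))) return false;
    try {
        std::size_t used = 0;
        *value = std::stoll(s, &used);
        return used == s.size();
    } catch (const std::exception&) {  // invalid_argument or out_of_range
        return false;
    }
}

// Accepts "blk_<id>", "blk_<id>_<genstamp>" and "blk_<id>_<genstamp>.meta".
bool parseBlockName(std::string name, BlockName* out) {
    const std::string prefix = "blk_", meta = ".meta";
    if (name.compare(0, prefix.size(), prefix) != 0) return false;
    name.erase(0, prefix.size());
    if (name.size() > meta.size() &&
        name.compare(name.size() - meta.size(), meta.size(), meta) == 0)
        name.erase(name.size() - meta.size());

    const std::size_t sep = name.find('_');
    BlockName result;
    if (!parseInt64(name.substr(0, sep), &result.blockId)) return false;
    if (sep != std::string::npos &&
        !parseInt64(name.substr(sep + 1), &result.genStamp)) return false;
    *out = result;
    return true;
}
```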

hdfs dfs -ls hdfs://192.168.106.91:9000
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/tuzq/software/hadoop-2.8.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/04/17 12:04:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/17 12:04:03 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/04/17 12:04:03 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
ls: Call From VM-40-171-sles10-64/192.168.106.91 to VM-40-171-sles10-64:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

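15.2. Initialization failed for Block pool

This may happen when the DataNode's data directory was not cleared before the NameNode was formatted.

15.3. Incompatible clusterIDs

The "Incompatible clusterIDs" error occurs when the DataNodes' data directories were not cleared before running "hdfs namenode -format".

Some online articles and forum posts blame the tmp directory. That is not wrong in itself, but in Hadoop 2.7.2 the relevant directory is the data directory, and the log already says so ("/home/tuzq/software/hadoop-2.8.0/data"). So do not follow online fixes blindly; read the log carefully when a problem occurs.

It follows that the fix is to empty the data directory on every DataNode, taking care not to delete the data directory itself.

The data directory is set by the "dfs.datanode.data.dir" property in hdfs-site.xml.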

2014-04-17 19:30:33,075 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/tuzq/software/hadoop-2.8.0/data/in_use.lock acquired by nodename 28326@localhost
2014-04-17 19:30:33,078 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool <registering> (Datanode Uuid unassigned) service to /192.168.106.91:9001
java.io.IOException: Incompatible clusterIDs in /home/tuzq/software/hadoop-2.8.0/data: namenode clusterID = CID-50401d89-a33e-47bf-9d14-914d8f1c4862; datanode clusterID = CID-153d6fcb-d037-4156-b63a-10d6be224091
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:472)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
        at java.lang.Thread.run(Thread.java:744)
2014-04-17 19:30:33,081 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to /192.168.106.91:9001
2014-04-17 19:30:33,184 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
        at java.lang.Thread.run(Thread.java:744)
2014-04-17 19:30:33,184 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2014-04-17 19:30:33,184 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:861)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
        at java.lang.Thread.run(Thread.java:744)
2014-04-17 19:30:35,185 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2014-04-17 19:30:35,187 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2014-04-17 19:30:35,189 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at localhost/127.0.0.1
************************************************************/
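Before emptying data directories, it can be worth confirming the mismatch directly. Each storage directory has a current/VERSION file in Java-properties format ("key=value" lines plus '#' comment lines) that records the clusterID, so <dfs.namenode.name.dir>/current/VERSION on the NameNode can be compared with <dfs.datanode.data.dir>/current/VERSION on a DataNode. Below is a minimal C++ reader, assuming that simple format; readVersionField is a hypothetical name, and the sketch ignores properties escapes and line continuations, which VERSION files do not normally use.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns the value of `key` from a storage VERSION file (Java-properties
// style "key=value" lines, '#' comment lines), or "" if the key is absent.
std::string readVersionField(std::istream& in, const std::string& key) {
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty() || line[0] == '#') continue;
        const std::size_t eq = line.find('=');
        // compare(0, eq, key) == 0 only when the text before '=' equals key exactly
        if (eq != std::string::npos && line.compare(0, eq, key) == 0)
            return line.substr(eq + 1);
    }
    return "";
}
```

Open both VERSION files with std::ifstream, read "clusterID" from each, and compare; different values reproduce the "Incompatible clusterIDs" condition from the log above.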

15.4. Inconsistent checkpoint fields

The "Inconsistent checkpoint fields" error on the SecondaryNameNode may be caused by "hadoop.tmp.dir" in core-site.xml on the SecondaryNameNode not being set properly.

2014-04-17 11:42:18,189 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Log Size Trigger    :1000000 txns
2014-04-17 11:43:18,365 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -56 namespaceID = 1384221685 cTime = 0 ; clusterId = CID-319b9698-c88d-4fe2-8cb2-c4f440f690d4 ; blockpoolId = BP-1627258458-192.168.106.91-1397735061985.
Expecting respectively: -56; 476845826; 0; CID-50401d89-a33e-47bf-9d14-914d8f1c4862; BP-2131387753-192.168.106.91-1397730036484.
        at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:135)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:518)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:383)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:349)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:345)
        at java.lang.Thread.run(Thread.java:744)

Also make sure "dfs.datanode.data.dir" in hdfs-site.xml on the SecondaryNameNode has a suitable value. The hadoop.tmp.dir setting in core-site.xml looks like this:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/tuzq/software/current/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

15.5. fs.defaultFS is file:///

This error occurs when core-site.xml sets only fs.defaultFS while fs.default.name keeps its default value file:///. The fix is to set both to the same value.

15.6. a shared edits dir must not be specified if HA is not enabled

This error may be caused by dfs.nameservices or dfs.ha.namenodes.mycluster missing from hdfs-site.xml.

15.7. /tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

Create the directory named in the log.

15.8. The auxService:mapreduce_shuffle does not exist

The cause is that "yarn.nodemanager.aux-services" in yarn-site.xml is not configured. Set it to mapreduce_shuffle and restart YARN. Every YARN node must be updated, including the ResourceManager and all NodeManagers; a NodeManager whose configuration is not updated keeps reporting this error.

15.9. org.apache.hadoop.ipc.Client: Retrying connect to server

This may happen when yarn-site.xml on a NodeManager differs from the one on the ResourceManager, for example when the NodeManager lacks yarn.resourcemanager.ha.rm-ids.

15.10. mapreduce.Job: Running job: job_1445931397013_0001

A submitted MapReduce job hangs at "mapreduce.Job: Running job: job_1445931397013_0001".

The YARN NodeManagers may not be running; check with the JDK's jps.

It can also happen when yarn-site.xml on a NodeManager differs from the one on the ResourceManager, for example when the NodeManager lacks yarn.resourcemanager.ha.rm-ids.

15.11. Could not format one or more JournalNodes

Running "./hdfs namenode -format" reports "Could not format one or more JournalNodes".

dfs.namenode.shared.edits.dir in hdfs-site.xml may be misconfigured, for example with a duplicated entry:

qjournal://hadoop-168-254:8485;hadoop-168-254:8485;hadoop-168-253:8485;hadoop-168-252:8485;hadoop-168-251:8485/mycluster

After fixing it, restart the JournalNodes; that may resolve the problem.

15.12. org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Already in standby state

This error may come from a misconfigured yarn.resourcemanager.webapp.address in yarn-site.xml, for example yarn.resourcemanager.webapp.address.rm1 defined twice when the two entries should be yarn.resourcemanager.webapp.address.rm1 and yarn.resourcemanager.webapp.address.rm2.

15.13. No valid image files found

If this is the standby NameNode, run "hdfs namenode -bootstrapStandby" first and then start it.

2015-12-01 15:24:39,535 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.FileNotFoundException: No valid image files found
        at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.getLatestImages(FSImageTransactionalStorageInspector.java:165)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:623)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:644)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
2015-12-01 15:24:39,536 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2015-12-01 15:24:39,539 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

15.14. xceivercount 4097 exceeds the limit of concurrent xcievers 4096

This error means the value 4096 of "dfs.datanode.max.xcievers" in hdfs-site.xml is too small and should be increased (in Hadoop 2.x the property is named dfs.datanode.max.transfer.threads, with dfs.datanode.max.xcievers kept as a deprecated alias). The error can cause HBase to report NotServingRegionException.

16/04/06 14:30:34 ERROR namenode.NameNode: Failed to start namenode.

15.15. java.lang.IllegalArgumentException: Unable to construct journal, qjournal://hadoop-030:8485;hadoop-031:8454;hadoop-032

When "hdfs namenode -format" fails with the error above, dfs.namenode.shared.edits.dir in hdfs-site.xml is misconfigured: the hadoop-032 entry is missing its ":8454" port.

15.16. Bad URI 'qjournal://hadoop-030:8485;hadoop-031:8454;hadoop-032:8454': must identify journal in path component

The path in "dfs.namenode.shared.edits.dir" in hdfs-site.xml is missing the cluster name (the journal ID, such as mycluster).

15.17. 16/04/06 14:48:19 INFO ipc.Client: Retrying connect to server: hadoop-032/10.143.136.211:8454. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Check the value of "dfs.namenode.shared.edits.dir" in hdfs-site.xml for typos: the default JournalNode port is 8485, not 8454. The JournalNode port is set by dfs.journalnode.rpc-address in hdfs-site.xml.
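Errors 15.11 and 15.15 to 15.17 all come down to a malformed dfs.namenode.shared.edits.dir value: a duplicated host, a missing port, a missing journal ID, or a mistyped port. A hypothetical pre-flight check such as the one below (checkQjournalUri is not part of Hadoop) can catch these before "hdfs namenode -format" does. It assumes the qjournal://host:port;host:port/journalId form and treats any port other than the default 8485 as suspicious rather than wrong, since dfs.journalnode.rpc-address may legitimately use another port.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical checker (not a Hadoop API) for dfs.namenode.shared.edits.dir
// values of the form qjournal://host1:port1;host2:port2;.../journalId.
// Returns one message per problem; an empty result means nothing obvious.
std::vector<std::string> checkQjournalUri(const std::string& uri,
                                          const std::string& expectedPort = "8485") {
    std::vector<std::string> problems;
    const std::string scheme = "qjournal://";
    if (uri.compare(0, scheme.size(), scheme) != 0) {
        problems.push_back("value must start with qjournal://");
        return problems;
    }
    // Journal ID (cluster name) follows the first '/' after the host list.
    const std::size_t slash = uri.find('/', scheme.size());
    if (slash == std::string::npos || slash + 1 == uri.size())
        problems.push_back("missing journal ID (cluster name) after '/'");

    const std::string hostList = uri.substr(
        scheme.size(), slash == std::string::npos ? std::string::npos : slash - scheme.size());
    std::set<std::string> seen;
    std::istringstream entries(hostList);
    std::string entry;
    while (std::getline(entries, entry, ';')) {
        if (!seen.insert(entry).second)
            problems.push_back("duplicate JournalNode: " + entry);
        const std::size_t colon = entry.rfind(':');
        if (colon == std::string::npos)
            problems.push_back("missing port: " + entry);
        else if (entry.substr(colon + 1) != expectedPort)
            problems.push_back("unexpected port: " + entry);
    }
    return problems;
}
```

Run against the values quoted in 15.11 and 15.16, it reports the duplicated hadoop-168-254 entry and, respectively, the missing journal ID plus the two 8454 ports.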

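15.18. Exception in thread "main" org.apache.hadoop.HadoopIllegalArgumentException: Could not get the namenode ID of this node. You may run zkfc on the node other than namenode.

Running "hdfs zkfc -formatZK" reported the error above because "hdfs namenode -format" had not been run yet. As the message itself suggests, the error also appears when zkfc runs on a host that is not one of the configured NameNodes: zkfc takes the NameNode IDs from dfs.ha.namenodes.<nameservice> in hdfs-site.xml and uses the one whose dfs.namenode.rpc-address.<nameservice>.<namenode-id> matches a local address.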
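The lookup behind this message can be pictured with a simplified model. The sketch below is an illustration under stated assumptions, not Hadoop's code: it takes the dfs.namenode.rpc-address.<nameservice>.<namenode-id> values as a map from NameNode ID to host:port, compares host names as plain strings (Hadoop resolves addresses and also honours an explicitly set dfs.ha.namenode.id), and returns an empty string in the situation where zkfc fails with "Could not get the namenode ID of this node". findLocalNameNodeId and the sample addresses are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Simplified model, not Hadoop's implementation: given the RPC address of
// each NameNode ID (dfs.namenode.rpc-address.<nameservice>.<id>) and the
// names/addresses of the local machine, return the ID that lives here.
std::string findLocalNameNodeId(const std::map<std::string, std::string>& rpcAddressById,
                                const std::set<std::string>& localHosts) {
    for (const auto& idAndAddress : rpcAddressById) {
        const std::string& address = idAndAddress.second;           // "host:port"
        const std::string host = address.substr(0, address.rfind(':'));
        if (localHosts.count(host) != 0)
            return idAndAddress.first;
    }
    return "";  // the case in which zkfc reports "Could not get the namenode ID"
}
```

On hadoop3, which in this cluster is not a NameNode host, no entry matches and the result is empty, mirroring the zkfc failure.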

15.19. 2016-04-06 17:08:07,690 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory [DISK]file:/data3/datanode/data/ has already been used.

A DataNode started as a non-root user failed to start, and its log file contained the following errors:

2016-04-06 17:08:07,707 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-418073539-10.143.136.207-1459927327462
2016-04-06 17:08:07,707 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze storage directories for block pool BP-418073539-10.143.136.207-1459927327462
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /data3/datanode/data/current/BP-418073539-10.143.136.207-1459927327462

Looking further, the log also contains the following:

Invalid dfs.datanode.data.dir /data3/datanode/data:
EPERM: Operation not permitted

Checking /data3/datanode/data with "ls -l" showed that its owner was root, because the DataNode had earlier been started as root. Changing the owner back to the user that runs the DataNode fixes the problem.

15.20. 2016-04-06 18:00:26,939 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop-031/10.143.136.208:8020

The DataNode log keeps recording entries like the ones below because the DataNode treats 10.143.136.208 as a NameNode, but that NameNode has not actually been started and is not the active one. This does not mean the DataNode failed to start: a DataNode sends heartbeats to both the active and the standby NameNode, so these entries are normal while the standby NameNode is down.

2016-04-06 18:00:32,940 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-031/10.143.136.208:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2016-04-06 17:55:44,555 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode Block pool BP-418073539-10.143.136.207-1459927327462 (Datanode Uuid 2d115d45-fd48-4e86-97b1-e74a1f87e1ca) service to hadoop-030/10.143.136.207:8020 trying to claim ACTIVE state with txid=1

"trying to claim ACTIVE state" is logged by updateActorStatesFromHeartbeat() in hadoop/hdfs/server/datanode/BPOfferService.java.

2016-04-06 17:55:49,893 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop-031/10.143.136.208:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

"Retrying connect to server" is logged by handleConnectionTimeout() and handleConnectionFailure() in hadoop/ipc/Client.java.

15.21. ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!

If you hit this error, check the NodeManager log. If it contains a message like the following:

WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=26665,containerID=container_1461657380500_0020_02_000001] is running beyond virtual memory limits. Current usage: 345.0 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.

then increase yarn.nodemanager.vmem-pmem-ratio in yarn-site.xml; its default value is 2.1.

16/10/13 10:23:19 ERROR client.TransportClient: Failed to send RPC 7614640087981520382 to /10.143.136.231:34800: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
16/10/13 10:23:19 ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 7614640087981520382 to /10.143.136.231:34800: java.nio.channels.ClosedChannelException
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:249)
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:233)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
        at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
        at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)
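The numbers in the NodeManager message show how the virtual memory check works: the limit is the container's physical memory allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, so a 1 GB container gets 1 GB * 2.1 = 2.1 GB of virtual memory, and 2.2 GB exceeds it. The small C++ sketch below (hypothetical helper names) reproduces that arithmetic and computes the ratio the container would have needed; because the log rounds its figures, it is prudent to leave some headroom above that value.

```cpp
#include <cassert>

// Virtual memory limit the NodeManager enforces for one container.
double vmemLimitMB(double pmemAllocMB, double vmemPmemRatio) {
    return pmemAllocMB * vmemPmemRatio;
}

// True when the container would be killed for exceeding the limit.
bool exceedsVmemLimit(double vmemUsedMB, double pmemAllocMB, double vmemPmemRatio) {
    return vmemUsedMB > vmemLimitMB(pmemAllocMB, vmemPmemRatio);
}

// Smallest ratio at which the observed usage would just fit.
double minRatioFor(double vmemUsedMB, double pmemAllocMB) {
    return vmemUsedMB / pmemAllocMB;
}
```

With the figures from the log, exceedsVmemLimit(2.2 * 1024, 1024, 2.1) is true, and minRatioFor(2.2 * 1024, 1024) is about 2.2, so a ratio such as 2.5 would have left room.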

