Hadoop 2.4.1 Cluster Setup
Prepare the Linux Environment
Change the hostname:
$ vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop001

Set a static IP:
# vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=?????????????
TYPE=Ethernet
UUID=????????????????
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=172.17.30.111
NETMASK=255.255.254.0
GATEWAY=172.17.30.1
DNS1=223.5.5.5
DNS2=223.6.6.6

Disable the firewall:
Check the firewall status:
# service iptables status
Stop the firewall:
# service iptables stop
Check whether the firewall starts on boot:
# chkconfig iptables --list
Disable the firewall on boot:
# chkconfig iptables off

Map hostnames to IPs:
$ vim /etc/hosts
172.17.30.111   hadoop001
172.17.30.112   hadoop002
172.17.30.113   hadoop003
172.17.30.114   hadoop004
172.17.30.115   hadoop005
172.17.30.116   hadoop006
172.17.30.117   hadoop007
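A quick way to confirm the mapping is complete: the sketch below (a hypothetical helper, not part of any tool used here) lists every cluster hostname that has no entry in a hosts-format file.

```shell
# List any expected cluster hostname that has no entry in the given
# hosts-format file (e.g. /etc/hosts).
hosts_missing() {
  # $1 = path to a hosts-format file
  for h in hadoop001 hadoop002 hadoop003 hadoop004 hadoop005 hadoop006 hadoop007; do
    grep -qw "$h" "$1" || echo "$h"
  done
}
```

Running `hosts_missing /etc/hosts` on each node should print nothing once the file above is in place.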

Reboot the machine:
# reboot

Install the JDK
Extract the JDK:
# tar -zxvf jdk-7u79-linux-x64.tar.gz -C /opt/modules/

Add environment variables:
# vim /etc/profile
##JAVA
JAVA_HOME=/opt/modules/jdk1.7.0_79
JRE_HOME=/opt/modules/jdk1.7.0_79/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH

Reload the configuration:
# source /etc/profile

Install Hadoop 2.4.1
Extract hadoop-2.4.1:
# tar -zxvf hadoop-2.4.1.tar.gz -C /opt/modules/

Add environment variables:
# vim /etc/profile
##HADOOP
export HADOOP_HOME=/opt/modules/hadoop-2.4.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload the configuration:
# source /etc/profile

Cluster plan:
Hostname     IP              Installed software        Running processes
hadoop001    172.17.30.111   jdk, hadoop               NameNode, DFSZKFailoverController (zkfc)
hadoop002    172.17.30.112   jdk, hadoop               NameNode, DFSZKFailoverController (zkfc)
hadoop003    172.17.30.113   jdk, hadoop               ResourceManager
hadoop004    172.17.30.114   jdk, hadoop               ResourceManager
hadoop005    172.17.30.115   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
hadoop006    172.17.30.116   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
hadoop007    172.17.30.117   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
Notes:
1. In Hadoop 2.x an HA HDFS normally has two NameNodes, one active and one standby. The active NameNode serves client requests; the standby serves none, it only keeps its state synchronized with the active so that it can take over quickly if the active fails.
Hadoop 2.x ships two official HDFS HA solutions, NFS and QJM; we use the simpler QJM here. In this scheme the active and standby NameNodes share edit-log metadata through a group of JournalNodes, and a write is considered successful once it has been written to a majority of JournalNodes. An odd number of JournalNodes is usually configured.
A ZooKeeper cluster is also set up for ZKFC (DFSZKFailoverController) failover: when the active NameNode goes down, the standby NameNode is automatically promoted to active.
2. hadoop-2.2.0 still had a single ResourceManager, a single point of failure. hadoop-2.4.1 fixes this: there are two ResourceManagers, one active and one standby, with their state coordinated through ZooKeeper.
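The "majority of JournalNodes" rule above is plain integer arithmetic; this small sketch (hypothetical helper functions, purely illustrative) makes it concrete for the 3-JournalNode layout used here.

```shell
# Quorum arithmetic behind QJM: an edit is durable once a strict majority
# of JournalNodes have acknowledged it. With n JournalNodes the quorum is
# floor(n/2) + 1, which is why an odd count is the usual choice.
quorum_size() {
  echo $(( $1 / 2 + 1 ))   # majority of $1 nodes
}

write_committed() {
  # $1 = acks received, $2 = total JournalNodes; prints yes/no
  if [ "$1" -ge "$(quorum_size "$2")" ]; then echo yes; else echo no; fi
}
```

With the three JournalNodes on hadoop005 through hadoop007, a write survives one JournalNode failure (2 of 3 acks) but not two.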


Configure HDFS:
Edit hadoop-env.sh:
# vim hadoop-env.sh
export JAVA_HOME=/opt/modules/jdk1.7.0_79

Edit core-site.xml:
# vim core-site.xml
<configuration>
    <!-- Set the HDFS nameservice to ns1 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <!-- Hadoop temp directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/data/tmp</value>
    </property>
    <!-- ZooKeeper quorum address -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop005:2181,hadoop006:2181,hadoop007:2181</value>
    </property>
</configuration>
Edit hdfs-site.xml:
# vim hdfs-site.xml
<configuration>
    <!-- HDFS nameservice ns1; must match core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <!-- ns1 has two NameNodes: nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>hadoop001:9000</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>hadoop001:50070</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>hadoop002:9000</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>hadoop002:50070</value>
    </property>
    <!-- Where NameNode metadata is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop005:8485;hadoop006:8485;hadoop007:8485/ns1</value>
    </property>
    <!-- Where each JournalNode stores its data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/data/journaldata</value>
    </property>
    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Failover proxy provider used by clients -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing methods, separated by newlines, one per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <!-- sshfence requires passwordless SSH -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <!-- sshfence connect timeout (ms) -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>
Edit mapred-site.xml:
# cp mapred-site.xml.template mapred-site.xml
# vim mapred-site.xml
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Edit yarn-site.xml:
# vim yarn-site.xml
<configuration>
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Cluster id of the RM pair -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <!-- Logical names of the RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- Host of each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop003</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop004</value>
    </property>
    <!-- ZooKeeper cluster address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop005:2181,hadoop006:2181,hadoop007:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Edit slaves (the slaves file lists the worker nodes; since HDFS is started on hadoop001 and YARN on hadoop003, the slaves file on hadoop001 names the DataNodes, while the slaves file on hadoop003 names the NodeManagers):
# vim slaves
hadoop005
hadoop006
hadoop007


Set up passwordless SSH login:
Generate a key pair on hadoop001:
# ssh-keygen -t rsa
Configure passwordless login from hadoop001 to hadoop002, hadoop003, hadoop004, hadoop005, hadoop006, and hadoop007.
Copy the public key to every node, including hadoop001 itself:
# ssh-copy-id hadoop001
# ssh-copy-id hadoop002
# ssh-copy-id hadoop003
# ssh-copy-id hadoop004
# ssh-copy-id hadoop005
# ssh-copy-id hadoop006
# ssh-copy-id hadoop007
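The seven ssh-copy-id calls above can be collapsed into one loop. This is only a sketch (the function name is made up): it assumes the key pair from ssh-keygen already exists, and with DRY_RUN set it just prints the commands instead of running them.

```shell
# Copy hadoop001's public key to every node in the cluster, including itself.
# Set DRY_RUN=1 to print the commands rather than execute them.
copy_key_to_all() {
  for h in hadoop001 hadoop002 hadoop003 hadoop004 hadoop005 hadoop006 hadoop007; do
    if [ -n "$DRY_RUN" ]; then
      echo "ssh-copy-id $h"
    else
      ssh-copy-id "$h"
    fi
  done
}
```

The same pattern works for the scp distribution steps later in this guide.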

Generate a key pair on hadoop003:
# ssh-keygen -t rsa
Configure passwordless login from hadoop003 to hadoop004, hadoop005, hadoop006, and hadoop007:
# ssh-copy-id hadoop004
# ssh-copy-id hadoop005
# ssh-copy-id hadoop006
# ssh-copy-id hadoop007

Note: the two NameNodes must also be able to SSH to each other without a password.
Generate a key pair on hadoop002:
# ssh-keygen -t rsa
Configure passwordless login from hadoop002 to hadoop001:
# ssh-copy-id hadoop001


Copy the configured hadoop-2.4.1 to the other nodes:
# scp -r hadoop-2.4.1/ hadoop002:/opt/modules/
# scp -r hadoop-2.4.1/ hadoop003:/opt/modules/
# scp -r hadoop-2.4.1/ hadoop004:/opt/modules/
# scp -r hadoop-2.4.1/ hadoop005:/opt/modules/
# scp -r hadoop-2.4.1/ hadoop006:/opt/modules/
# scp -r hadoop-2.4.1/ hadoop007:/opt/modules/

Install and configure the ZooKeeper cluster (on hadoop005)
Extract zookeeper:
# tar -zxvf zookeeper-3.4.5.tar.gz -C /opt/modules/

Add environment variables:
# vim /etc/profile
##ZOOKEEPER
export ZOOKEEPER_HOME=/opt/modules/zookeeper-3.4.5
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Edit the configuration:
# pwd
/opt/modules/zookeeper-3.4.5/conf
# cp zoo_sample.cfg zoo.cfg
# vim zoo.cfg
Change: dataDir=/opt/modules/zookeeper-3.4.5/tmp
Append to the end of the file:
server.1=hadoop005:2888:3888
server.2=hadoop006:2888:3888
server.3=hadoop007:2888:3888
Create the tmp directory:
# mkdir tmp
In the tmp directory, create a myid file containing 1:
# echo 1 > myid
Check:
# cat myid
1

Copy the configured zookeeper to the other nodes:
# scp -r zookeeper-3.4.5/ hadoop006:/opt/modules/
# scp -r zookeeper-3.4.5/ hadoop007:/opt/modules/
Note: update the contents of /opt/modules/zookeeper-3.4.5/tmp/myid on hadoop006 and hadoop007 accordingly:
hadoop006:
# echo 2 > myid
hadoop007:
# echo 3 > myid
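Each node's myid must stay consistent with the server.N lines in zoo.cfg. Instead of hard-coding 1, 2, 3 per host, the id can be derived from the config itself; the sketch below is a hypothetical helper, not part of ZooKeeper.

```shell
# Print the myid a given host should use, read from zoo.cfg's
# server.N=host:2888:3888 lines.
myid_for() {
  # $1 = hostname, $2 = path to zoo.cfg
  grep "^server\..*=$1:" "$2" | cut -d. -f2 | cut -d= -f1
}
# Usage on each node (sketch):
#   myid_for "$(hostname)" /opt/modules/zookeeper-3.4.5/conf/zoo.cfg \
#     > /opt/modules/zookeeper-3.4.5/tmp/myid
```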


Note: for the FIRST startup of the cluster, follow the steps below exactly.
Start the zookeeper cluster (on hadoop005, hadoop006, and hadoop007):
# zkServer.sh start
Check the status (expect one leader and two followers):
# zkServer.sh status

Start the JournalNodes (on hadoop005, hadoop006, and hadoop007):
# hadoop-daemon.sh start journalnode
Run jps; if a JournalNode process is listed, the JournalNode started successfully:
# jps
Example output:
2308 QuorumPeerMain
2439 JournalNode
2486 Jps
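The jps check can be scripted: the sketch below (a made-up helper, not a Hadoop tool) takes jps output plus a list of required daemon names and reports the first one that is missing.

```shell
# Verify that the required daemons appear in jps output.
check_procs() {
  # $1 = jps output; remaining args = required process names
  _out=$1; shift
  for p in "$@"; do
    printf '%s\n' "$_out" | grep -qw "$p" || { echo "missing: $p"; return 1; }
  done
  echo ok
}
```

On a worker node this would be invoked as `check_procs "$(jps)" QuorumPeerMain JournalNode`.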

Format HDFS:
# hdfs namenode -format
Formatting generates files under the hadoop.tmp.dir configured in core-site.xml (here /opt/data/tmp). Then copy /opt/data/tmp to /opt/data/ on hadoop002:
# scp -r tmp/ hadoop002:/opt/data/

Format ZKFC:
# hdfs zkfc -formatZK

Start HDFS (on hadoop001):
# start-dfs.sh

Start YARN (note: on hadoop003. The NameNode and ResourceManager are kept apart for performance: both consume a lot of resources, so they are run, and therefore started, on different machines):
# start-yarn.sh
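The first-start sequence can be collected into a single dry-run checklist. This sketch only prints which command belongs on which host (hostnames from the cluster plan above); it executes nothing.

```shell
# Print the first-start order: which command runs on which host.
first_start_plan() {
  for h in hadoop005 hadoop006 hadoop007; do echo "$h: zkServer.sh start"; done
  for h in hadoop005 hadoop006 hadoop007; do echo "$h: hadoop-daemon.sh start journalnode"; done
  echo "hadoop001: hdfs namenode -format"
  echo "hadoop001: scp -r /opt/data/tmp hadoop002:/opt/data/"
  echo "hadoop001: hdfs zkfc -formatZK"
  echo "hadoop001: start-dfs.sh"
  echo "hadoop003: start-yarn.sh"
}
```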


Hadoop 2.4.1 is now configured. Check it in a browser:
http://hadoop001:50070
NameNode 'hadoop001:9000' (active)
http://hadoop002:50070
NameNode 'hadoop002:9000' (standby)

Some commands for checking cluster state:
# hdfs dfsadmin -report                      show status of each HDFS node
# hdfs haadmin -getServiceState nn1          get the HA state of a NameNode
# hadoop-daemon.sh start namenode            start a single NameNode process
# hadoop-daemon.sh start zkfc                start a single zkfc process
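The two getServiceState calls can be wrapped so both NameNodes are queried in one go. A sketch only (the function name is made up); with DRY_RUN set it prints the commands, which is all that can be verified without a live cluster.

```shell
# Query the HA state of both NameNodes (nn1 and nn2 from hdfs-site.xml).
# Set DRY_RUN=1 to print the commands instead of executing them.
ha_states() {
  for nn in nn1 nn2; do
    if [ -n "$DRY_RUN" ]; then
      echo "hdfs haadmin -getServiceState $nn"
    else
      hdfs haadmin -getServiceState "$nn"
    fi
  done
}
```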


If only 3 hosts are available, deploy as follows:
hadoop001    zookeeper   journalnode   namenode   zkfc   resourcemanager   datanode
hadoop002    zookeeper   journalnode   namenode   zkfc   resourcemanager   datanode
hadoop003    zookeeper   journalnode   datanode
Reposted from: https://www.cnblogs.com/goodcheap/p/6113098.html