[Hadoop] - Submitting a job to the cluster from Win7

Published: 2023/12/10

We typically develop on Windows against a Hadoop cluster running on Linux, using the plugin hadoop-***-eclipse-plugin.

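To run a program, we usually choose either "run as application" or "run as hadoop". Taken literally, the first should run locally and the second on the Hadoop cluster. In practice, unless extra configuration is done, both run locally. To submit a job to the cluster, a few settings and a small amount of extra code are required.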

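1. Copy the mapred-site.xml and yarn-site.xml files and edit them as needed so that YARN points at the cluster.

2. Add the parameter mapreduce.app-submission.cross-platform to mapred-site.xml with the value true.

3. Package the local code and submit it to the cluster. If this step is skipped, a ClassNotFoundException is thrown. The packaging code is as follows: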

EJob, the packaging utility class:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.jar.JarEntry;
    import java.util.jar.JarOutputStream;

    public class EJob {

        // Packs everything under root (for example target/classes) into a
        // temporary jar. Returns null if root does not exist.
        public static File createTempJar(String root) throws IOException {
            if (!new File(root).exists()) {
                return null;
            }

            final File jarFile = File.createTempFile("EJob-", ".jar",
                    new File(System.getProperty("java.io.tmpdir")));

            // Delete the temporary jar when the JVM exits.
            Runtime.getRuntime().addShutdownHook(new Thread() {
                @Override
                public void run() {
                    jarFile.delete();
                }
            });

            // try-with-resources closes the stream even if packing fails.
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jarFile))) {
                createTempJarInner(out, new File(root), "");
            }
            return jarFile;
        }

        // Recursively adds f to the jar; base is the entry path built so far.
        private static void createTempJarInner(JarOutputStream out, File f,
                String base) throws IOException {
            if (f.isDirectory()) {
                File[] fl = f.listFiles();
                if (fl == null) {
                    return; // directory could not be read
                }
                if (base.length() > 0) {
                    base = base + "/";
                }
                for (int i = 0; i < fl.length; i++) {
                    createTempJarInner(out, fl[i], base + fl[i].getName());
                }
            } else {
                out.putNextEntry(new JarEntry(base));
                try (FileInputStream in = new FileInputStream(f)) {
                    byte[] buffer = new byte[1024];
                    int n = in.read(buffer);
                    while (n != -1) {
                        out.write(buffer, 0, n);
                        n = in.read(buffer);
                    }
                }
                out.closeEntry();
            }
        }
    }

Usage in the job driver:

    File jarFile = EJob.createTempJar("target/classes");
    ((JobConf) job.getConfiguration()).setJar(jarFile.toString());
    // The rest of the job-creation code stays unchanged
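Because a missing class in the submitted jar shows up only later as a ClassNotFoundException on the cluster, it can help to inspect the temporary jar before submitting. The following is a minimal sketch using only the JDK; the class name JarInspector and its methods are hypothetical helpers introduced here for illustration and are not part of Hadoop or of the original article.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Hadoop): inspects a jar before submission.
final class JarInspector {

    /** Returns the names of all entries in the given jar. */
    static List<String> listEntries(File jar) {
        List<String> names = new ArrayList<>();
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** True if the jar holds the .class file for a fully qualified class name. */
    static boolean containsClass(File jar, String className) {
        String entryName = className.replace('.', '/') + ".class";
        return listEntries(jar).contains(entryName);
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: JarInspector <jar> <fully.qualified.ClassName>");
            return;
        }
        System.out.println(containsClass(new File(args[0]), args[1]));
    }
}
```

For example, calling JarInspector.containsClass(jarFile, ...) with the fully qualified name of your own mapper class right after EJob.createTempJar gives an early local check that the compiled classes were actually packed.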

With that, the job can be submitted to the cluster.

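For any MR job submitted from a machine outside the Hadoop cluster, note the following points:

1. The parameter mapreduce.app-submission.cross-platform must be set to true, which marks the job as a cross-platform submission (here, a Windows client submitting to a Linux cluster).

2. If the parameter mapreduce.framework.name is set to yarn, the class YarnClientProtocolProvider must be on the project's classpath. The Maven dependency is: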

    <!-- The other usual hadoop-mapreduce-client dependencies are still needed;
         this one is mandatory for cross-platform submission -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>${hadoop.version}</version>
    </dependency>

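3. If the cluster is configured for HA, you must either supply the HA configuration or explicitly point at the active node. Locator parameters such as yarn.resourcemanager.address and fs.defaultFS must be provided.

When both HDFS and YARN use HA, the minimum configuration for submitting from outside the cluster is shown in the four files that follow. The exact set depends on how the cluster was built: for example, if the YARN classpath was already set up while building the cluster, the parameter yarn.application.classpath in yarn-site.xml can be omitted. All other parameters are required and must be present.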
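For point 2 above, a quick way to confirm that the dependency really reached the client classpath is to probe for the class by name. This is a minimal JDK-only sketch; ClasspathProbe is a hypothetical helper, and the package prefix org.apache.hadoop.mapred for YarnClientProtocolProvider is an assumption not stated in the article, so verify it against your Hadoop version.

```java
// Hypothetical helper (not part of Hadoop): checks whether a class can be loaded.
final class ClasspathProbe {

    // Assumed fully qualified name; the article gives only the simple class name.
    static final String YARN_PROVIDER =
            "org.apache.hadoop.mapred.YarnClientProtocolProvider";

    /** True if the named class is loadable from this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : YARN_PROVIDER;
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Running this with the same classpath as the job driver shows whether the provider class can be found before a submission is attempted.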

core-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hdfs-cluster</value>
        </property>
    </configuration>

hdfs-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>dfs.nameservices</name>
            <value>hdfs-cluster</value>
        </property>
        <property>
            <name>dfs.ha.namenodes.hdfs-cluster</name>
            <value>hdfs-cluster-1,hdfs-cluster-2</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.hdfs-cluster.hdfs-cluster-1</name>
            <value>hdfs-cluster-1:8020</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.hdfs-cluster.hdfs-cluster-2</name>
            <value>hdfs-cluster-2:8020</value>
        </property>
        <property>
            <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
    </configuration>

mapred-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.app-submission.cross-platform</name>
            <value>true</value>
        </property>
    </configuration>

yarn-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>yarn.resourcemanager.ha.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.resourcemanager.cluster-id</name>
            <value>yarn-cluster</value>
        </property>
        <property>
            <name>yarn.resourcemanager.ha.rm-ids</name>
            <value>yarn-cluster-1,yarn-cluster-2</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address.yarn-cluster-1</name>
            <value>yarn-cluster-1:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address.yarn-cluster-2</name>
            <value>yarn-cluster-2:8032</value>
        </property>
        <property>
            <name>yarn.application.classpath</name>
            <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
        </property>
    </configuration>
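Since a single missing property is enough to make the submission fail, a small pre-flight check over the client-side site files may be useful. The sketch below parses a Hadoop-style configuration file with the JDK's built-in XML parser and reports which required keys are absent. SiteXmlCheck is a hypothetical helper, not a Hadoop API, and the list of required keys is supplied by the caller rather than taken from Hadoop.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Hadoop): checks a *-site.xml for required keys.
final class SiteXmlCheck {

    /** Reads every <property> name/value pair from a Hadoop-style site file. */
    static Map<String, String> readProperties(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = childText(p, "name");
                if (name != null) {
                    props.put(name, childText(p, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse site file", e);
        }
    }

    // Trimmed text of the first descendant element with the given tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    /** Returns the required keys that are absent or have an empty value. */
    static List<String> missing(Map<String, String> props, String... required) {
        List<String> result = new ArrayList<>();
        for (String key : required) {
            String value = props.get(key);
            if (value == null || value.isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: SiteXmlCheck <site-file.xml> <required-key>...");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String[] required = Arrays.copyOfRange(args, 1, args.length);
            System.out.println("missing: " + missing(readProperties(in), required));
        }
    }
}
```

For instance, running it against mapred-site.xml with the keys mapreduce.framework.name and mapreduce.app-submission.cross-platform prints an empty list when both are set.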

Reposted from: https://www.cnblogs.com/liuming1992/p/4843562.html
