Hadoop Series (3): MapReduce Job Submission and Execution Modes


A job can be executed either locally or on a cluster. In this article the Hadoop cluster is installed on a remote CentOS system, and the classic WordCount program serves as the example.

1. Local execution mode (the local environment here is macOS). The remote Hadoop cluster does not need to be started; the job is submitted to the local executor, LocalJobRunner.

1) Input and output data stored under a local path:

First, the MapReduce code is as follows:

  • Mapper
package com.nasuf.hadoop.mr;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split each input line into words and emit a (word, 1) pair per word
        String line = value.toString();
        String[] words = StringUtils.split(line, " ");
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
  • Reducer
package com.nasuf.hadoop.mr;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the counts collected for each word and emit (word, total)
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        context.write(key, new LongWritable(count));
    }
}
  • Runner
package com.nasuf.hadoop.mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Specify which jar contains the classes used by this job
        job.setJarByClass(WCRunner.class);

        // The Mapper and Reducer classes used by this job
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);

        // Output key/value types of the reducer (if the mapper output types
        // below are not set, these apply to both mapper and reducer output)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Output key/value types of the mapper
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Location of the input data
        FileInputFormat.setInputPaths(job, new Path("/Users/nasuf/Desktop/wc/srcdata"));

        // Output path for the results
        FileOutputFormat.setOutputPath(job, new Path("/Users/nasuf/Desktop/wc/output"));

        // Submit the job and wait for it to complete
        job.waitForCompletion(true);
    }
}

In local mode, the test data can be placed under "/Users/nasuf/Desktop/wc/srcdata". Note that the output path must not already exist; otherwise the job throws an exception.
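To make reruns convenient, the output directory can be deleted programmatically before the job is submitted. Below is a minimal sketch (an illustration added here, not part of the original Runner) using Hadoop's FileSystem API:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Inside WCRunner.main(), before calling FileOutputFormat.setOutputPath(...):
Path output = new Path("/Users/nasuf/Desktop/wc/output");
FileSystem fs = FileSystem.get(conf);   // resolves to the local FS for a local path
if (fs.exists(output)) {
    fs.delete(output, true);            // true = recursive delete
}
FileOutputFormat.setOutputPath(job, output);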
2) Input and output data stored in HDFS; the remote HDFS must be started (YARN is not needed)
Modify the Runner code as follows:

// Location of the input data
FileInputFormat.setInputPaths(job, new Path("hdfs://hdcluster01:9000/wc/srcdata"));
// Output path for the results
FileOutputFormat.setOutputPath(job, new Path("hdfs://hdcluster01:9000/wc/output1"));

If the following error occurs:

org.apache.hadoop.security.AccessControlException: Permission denied: user=nasuf, access=WRITE, inode="/wc":parallels:supergroup:drwxr-xr-x

This is clearly a permissions problem. The /wc directory in HDFS is owned by user parallels (permissions rwxr-xr-x), while the local job runs as user nasuf, who has no write access to it. The fix: add the VM argument -DHADOOP_USER_NAME=parallels.
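If editing the VM arguments is inconvenient, the same override can be applied in code. A small sketch of this alternative (shown for illustration):

// Must run before any HDFS client code executes, e.g. first thing in main();
// equivalent to passing -DHADOOP_USER_NAME=parallels to the JVM.
System.setProperty("HADOOP_USER_NAME", "parallels");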

2. Cluster execution mode. YARN must be started first; the job is submitted to the YARN framework for execution. Job status can be checked at http://hdcluster01:8088.

1) Run the jar directly from the command line

hadoop jar wc.jar com.nasuf.hadoop.mr.WCRunner

Then check the job's execution status at http://hdcluster01:8088.

2) Submit the job to the YARN cluster directly from a local main method
Copy $HADOOP_HOME/etc/hadoop/mapred-site.xml and yarn-site.xml into the project's classpath, then run the code above; the job will be submitted to the YARN cluster.
Alternatively, set the following parameters directly in code, which has the same effect as copying those two configuration files:

conf.set("mapreduce.framework.name", "yarn"); conf.set("yarn.resourcemanager.hostname", "hdcluster01"); conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");

If the following error message appears:

2018-08-26 10:25:37,544 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1375)) - Job job_1535213323614_0010 failed with state FAILED due to: Application application_1535213323614_0010 failed 2 times due to AM Container for appattempt_1535213323614_0010_000002 exited with exitCode: -1000 due to: File file:/tmp/hadoop-yarn/staging/nasuf/.staging/job_1535213323614_0010/job.jar does not exist .Failing this attempt.. Failing the application.

copy the core-site.xml configuration file into the classpath as well, or likewise set the following parameter:

conf.set("hadoop.tmp.dir", "/home/parallels/app/hadoop-2.4.1/data/");

This resolves the problem.
