Learning Hadoop: A First Look at Map/Reduce with a Small Demo
I. Concepts
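Hadoop MapReduce is a distributed computing framework for processing massive amounts of data. Together with HDFS, it takes care of hard problems such as distributed data storage, job scheduling, fault tolerance, and communication between machines, so that even engineers with no experience in parallel or distributed computing can easily write structurally simple parallel programs that process large-scale data on hundreds or thousands of machines.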
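Hadoop MapReduce follows a "divide and conquer" approach: it abstracts a computation into two phases, map and reduce, which can be understood simply as "spread the computation out, then merge the results." A MapReduce program first splits the input data into a number of independent sets of key/value pairs (key1/value1). These pairs are processed in parallel by multiple map tasks. MapReduce then sorts the map output (a set of intermediate key/value pairs, key2/value2) by key2 in ascending order; for byte-oriented key types such as Text, this is a memcmp-style comparison of the keys' serialized byte arrays in memory. All value2s that belong to the same key2 are grouped together and passed as input to a reduce task, which computes the final result and outputs key3/value3. As an optimization, the key2/value2 pairs produced on the same compute node can be merged locally by a combiner, if one is configured. The basic flow is as follows: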
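To make the sort-and-group step concrete, here is a minimal single-JVM sketch, assuming keys are plain UTF-8 strings and values are integers (Java 9 or later). It is not Hadoop code: in Hadoop the comparison is performed by the key type's RawComparator, and the names ShuffleSketch, compareBytes and sortAndGroup are hypothetical. compareBytes mimics memcmp by comparing bytes as unsigned values from left to right.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the shuffle step: sort (key2, value2) pairs by the
// raw bytes of the key, then collect all values that share a key.
class ShuffleSketch {

    // memcmp-style comparison: bytes compared as unsigned, left to right;
    // if one array is a prefix of the other, the shorter one sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Sorts pairs by the UTF-8 bytes of their keys (ascending) and groups the
    // values per key, keeping the sorted key order.
    static Map<String, List<Integer>> sortAndGroup(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        Comparator<Map.Entry<String, Integer>> byKeyBytes = (x, y) -> compareBytes(
                x.getKey().getBytes(StandardCharsets.UTF_8),
                y.getKey().getBytes(StandardCharsets.UTF_8));
        sorted.sort(byKeyBytes); // List.sort is stable: values keep input order within a key
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : sorted) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("/b", 1), Map.entry("/a", 1), Map.entry("/b", 1), Map.entry("/B", 1));
        // Prints {/B=[1], /a=[1], /b=[1, 1]}: 'B' (0x42) sorts before 'a' (0x61).
        System.out.println(sortAndGroup(mapOutput));
    }
}
```

Each entry of the resulting map corresponds to one reduce call: a key2 together with all of its value2s.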
Comparison of the computation flow in Hadoop and in a single-machine program:
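The inputs and outputs of a computation are usually stored in files, and these files reside in HDFS (Hadoop Distributed File System). The system tries to schedule computation tasks onto the nodes where the data already resides, rather than moving the data to the compute nodes. This reduces the amount of data transferred over the network and saves bandwidth.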
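The locality preference can be illustrated with a toy placement function. This is a simplified sketch, not Hadoop's scheduler (which also weighs rack locality and other factors); the class LocalityScheduler and the block and node names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy sketch: prefer a free node that already holds a replica of the task's
// input block; otherwise fall back to any free node.
class LocalityScheduler {

    // Returns the chosen node, or null when no node is free.
    static String pickNode(String blockId, Map<String, List<String>> replicas, Set<String> freeNodes) {
        // 1. Data-local: a free node that already stores a replica of the block.
        for (String node : replicas.getOrDefault(blockId, List.of())) {
            if (freeNodes.contains(node)) {
                return node;
            }
        }
        // 2. Fallback: any free node; the block must then be read over the network.
        return freeNodes.isEmpty() ? null : freeNodes.iterator().next();
    }

    public static void main(String[] args) {
        Map<String, List<String>> replicas = Map.of("blk_1", List.of("node2", "node3"));
        Set<String> free = new LinkedHashSet<>(List.of("node1", "node3"));
        // node3 is free and holds blk_1, so the task is placed there.
        System.out.println(pickNode("blk_1", replicas, free));
    }
}
```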
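Application developers generally only need to care about the gray parts of the figure: a single-machine program must handle reading and writing data as well as processing it, whereas a Hadoop program only needs to implement map and reduce. Reading and writing data, transferring data between map and reduce, fault tolerance, and so on are handled automatically by Hadoop MapReduce and HDFS.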
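For contrast, a single-machine program has to read the input, process it, and write the output itself. The sketch below uses word counting purely as an illustration; the class name SingleMachineCount is hypothetical. In the Hadoop version, the developer would write only the processing step, split into map and reduce, while the framework handles the reading and writing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Single-machine counterpart: reading, processing, and writing all happen
// inside one program on one machine.
class SingleMachineCount {

    static void run(Reader input, Writer output) throws IOException {
        Map<String, Integer> counts = new TreeMap<>(); // keeps words sorted
        // 1. Read the input line by line.
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // 2. Process: count every whitespace-separated word.
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        }
        // 3. Write one "word<TAB>count" line per word.
        PrintWriter writer = new PrintWriter(output);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            writer.println(e.getKey() + "\t" + e.getValue());
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        run(new StringReader("a b a\nc a"), out);
        System.out.print(out); // a 3, b 1, c 1, one tab-separated pair per line
    }
}
```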
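II. Setting Up the Development Environment
Map/Reduce programs depend on a Hadoop cluster; in addition, Eclipse needs the Hadoop plugin and its dependency jars installed.
Setting up the Hadoop cluster: see "Hadoop 2.2.0 Cluster Setup".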
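1. Install and configure Eclipse
Download a suitable version of Eclipse from the official website, and copy the plugin jar required for Hadoop development into the plugins directory of the Eclipse installation. For the download, see "Hadoop 2.2.0 development dependency jars"; you can also build the plugin yourself.
Start Eclipse and choose Window → Preferences. If a Hadoop Map/Reduce entry appears as shown below, the plugin was installed successfully.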
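2. Configure DFS, which is mainly used to manage input and output data files.
Choose Window → Open Perspective → Other → Map/Reduce to open the Map/Reduce perspective. Click the elephant icon in the Map/Reduce Locations view, create a new Hadoop Location, and enter the following:
A DFS Locations node then appears in the Project Explorer; it is used to manage input and output data files.
You also need to configure the Hadoop installation directory: when creating a new Map/Reduce project, click Configure Hadoop install directory and enter the Hadoop installation path.
Right-click an empty directory under DFS Locations, upload a text file, and refresh. If the file appears, the environment is configured correctly.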
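III. Programming Model
The MapReduce programming model works as follows: the computation takes a set of input key/value pairs and produces a set of output key/value pairs.
Users of the MapReduce library express this computation as two functions: Map and Reduce.
The user-defined Map function takes one input key/value pair and produces a set of intermediate key/value pairs.
The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
The user-defined Reduce function accepts an intermediate key I and a set of values for that key. It merges these values to form a possibly smaller set of values. Typically, each Reduce invocation produces either zero or one output value.
The intermediate values are usually supplied to the Reduce function through an iterator, which makes it possible to handle lists of values too large to fit in memory.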
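The two user-supplied functions can be written down as Java types. This is a minimal sketch; MapFn, ReduceFn and ModelDemo are hypothetical names chosen for illustration, not Hadoop's Mapper/Reducer API (Hadoop's own reducer receives an Iterable of values). The ones(...) iterator shows why the iterator matters: the reducer pulls values one at a time, so the full value list never has to exist in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Map: one input pair -> zero or more intermediate pairs, handed to emit.
interface MapFn<K1, V1, K2, V2> {
    void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
}

// Reduce: one intermediate key plus an iterator over its values -> usually
// zero or one output value.
interface ReduceFn<K2, V2, V3> {
    void reduce(K2 key, Iterator<V2> values, BiConsumer<K2, V3> emit);
}

class ModelDemo {
    // Emits (word, 1) for every whitespace-separated word in a line.
    static final MapFn<Long, String, String, Integer> WORDS = (offset, line, emit) -> {
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                emit.accept(w, 1);
            }
        }
    };

    // Sums the values for one key, pulling them from the iterator one by one.
    static final ReduceFn<String, Integer, Long> SUM = (key, values, emit) -> {
        long total = 0;
        while (values.hasNext()) {
            total += values.next();
        }
        emit.accept(key, total);
    };

    // Produces n ones on demand without ever storing them in a collection.
    static Iterator<Integer> ones(long n) {
        return new Iterator<Integer>() {
            private long produced = 0;

            @Override
            public boolean hasNext() {
                return produced < n;
            }

            @Override
            public Integer next() {
                produced++;
                return 1;
            }
        };
    }

    public static void main(String[] args) {
        List<String> intermediate = new ArrayList<>();
        WORDS.map(0L, "to be or not to be", (k, v) -> intermediate.add(k + "=" + v));
        System.out.println(intermediate); // [to=1, be=1, or=1, not=1, to=1, be=1]

        // Ten million values are summed without being held in memory at once.
        SUM.reduce("/", ones(10_000_000L), (k, total) -> System.out.println(k + "\t" + total));
    }
}
```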
IV. A Small Example
1. Data preparation
Take a Tomcat access log as an example. The log format is as follows:
127.0.0.1,-,-,[08/May/2014:13:42:40 +0800],GET / HTTP/1.1,200,11444
127.0.0.1,-,-,[08/May/2014:13:42:42 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO HTTP/1.1,204,-
127.0.0.1,-,-,[08/May/2014:13:42:42 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO HTTP/1.1,204,-
127.0.0.1,-,-,[08/May/2014:13:42:47 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
127.0.0.1,-,-,[08/May/2014:13:42:47 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
127.0.0.1,-,-,[08/May/2014:13:42:48 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
127.0.0.1,-,-,[08/May/2014:13:42:48 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:48 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:48 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
127.0.0.1,-,-,[08/May/2014:13:43:25 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getGraduateBatchByConditions?graduateBatchName=&pageSize=10&pageNo=1 HTTP/1.1,200,597
127.0.0.1,-,-,[08/May/2014:13:43:25 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getTotalGraduateBatchByCondition?graduateBatchName= HTTP/1.1,200,21
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:26 +0800],GET /jyglFront/graduate_initGraduateBatch HTTP/1.1,200,8766
127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:28 +0800],GET /jyglFront/graduate_initGraduateQulifyCheck HTTP/1.1,200,26397
127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:29 +0800],GET /jyglFront/graduate_initLeaveSchoolInfo HTTP/1.1,200,20125
127.0.0.1,-,-,[08/May/2014:13:43:30 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getAllGraduateBatch HTTP/1.1,200,597
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:31 +0800],GET /jyglFront/graduate_initGraduateInfo HTTP/1.1,200,28464
127.0.0.1,-,-,[08/May/2014:14:27:10 +0800],GET / HTTP/1.1,200,11444
127.0.0.1,-,-,[08/May/2014:14:27:12 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO HTTP/1.1,204,-
127.0.0.1,-,-,[08/May/2014:14:27:12 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO HTTP/1.1,204,-
127.0.0.1,-,-,[08/May/2014:14:27:34 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest HTTP/1.1,200,43
127.0.0.1,-,-,[08/May/2014:14:27:34 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd HTTP/1.1,200,16
127.0.0.1,-,-,[08/May/2014:14:27:35 +0800],GET /jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403 HTTP/1.1,200,653
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:35 +0800],GET /jyglFront/exam_initgroupsubscribestatistic HTTP/1.1,200,13551
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:37 +0800],GET /jyglFront/exam_initsubstudentsubscribe HTTP/1.1,500,3900
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:41 +0800],GET /jyglFront/supervisor/intoInitAssignmentDetail HTTP/1.1,200,1808
127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:43 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
127.0.0.1,-,-,[08/May/2014:14:27:44 +0800],GET /jygl/jaxrs/nationInfo/getAllNationInPage?pageSize=10&pageNo=1 HTTP/1.1,200,374
127.0.0.1,-,-,[08/May/2014:14:27:44 +0800],GET /jygl/jaxrs/nationInfo/getTotalNations HTTP/1.1,200,22
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/baseInfo_nationInfoList HTTP/1.1,200,7471
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/common/css/menuStyle2.css HTTP/1.1,404,1060
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/common/css/basic.css HTTP/1.1,200,1476
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:45 +0800],GET /jyglFront/common/css/_images/botton2.gif HTTP/1.1,404,1075
127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/allGradeInfos HTTP/1.1,200,3785
127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/getSpeicalListByTwo?gradeID=&educationLevelID= HTTP/1.1,200,12061
127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/studyCenterService/allStudyCentersByUtilObject HTTP/1.1,200,6006
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:48 +0800],GET /jyglFront/teaching/openReplaceChooseCourse HTTP/1.1,200,26455
127.0.0.1,-,-,[08/May/2014:14:27:49 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 HTTP/1.1,204,-
127.0.0.1,-,-,[08/May/2014:14:27:49 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 HTTP/1.1,204,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:49 +0800],GET /jyglFront/teaching/openChooseCourse HTTP/1.1,200,1611
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/currentGradeInfo HTTP/1.1,200,473
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/allGradeInfos HTTP/1.1,200,3785
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a042437c2c0801437ed1cdea0017 HTTP/1.1,200,20
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a0423f41d66d013f5a1f766c00ce HTTP/1.1,200,20
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/teachingPlanListByEducationLevelAndGradeId?grade=4af2a042437c2c0801437ed1cdea0017&educationLevel= HTTP/1.1,200,4849
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:52 +0800],GET /jyglFront/teaching/teachingPlanList HTTP/1.1,200,22794
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:52 +0800],GET /jyglFront/js/jquery.form.js HTTP/1.1,200,30330
127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest HTTP/1.1,200,43
127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd HTTP/1.1,200,16
127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403 HTTP/1.1,200,653
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:28:02 +0800],GET /jyglFront/exam_initgroupsubscribestatistic HTTP/1.1,200,13551
127.0.0.1,-,-,[08/May/2014:14:28:19 +0800],POST /jygl/jaxrs/right/addUserLog HTTP/1.1,200,-
127.0.0.1,-,-,[08/May/2014:14:31:42 +0800],GET /jygl/jaxrs/exam/examSubscribeService/groupSubscribe/201403/0/0/201309/1 HTTP/1.1,200,-
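Each entry has seven comma-separated fields: client IP, two "-" placeholders, a bracketed timestamp, the request line, the HTTP status, and the response size ("-" when there is no body). A minimal parsing sketch, assuming no field contains a comma and the request line always has exactly three space-separated parts; the class AccessLogLine and its field names are hypothetical, chosen for illustration.

```java
// Hypothetical holder for the seven comma-separated fields of one log entry.
class AccessLogLine {
    final String clientIp;
    final String timestamp; // brackets removed, e.g. "08/May/2014:13:42:40 +0800"
    final String method;
    final String url;       // includes the query string, if any
    final String protocol;
    final int status;
    final long bytes;       // "-" (no response body) is mapped to 0 in this sketch

    AccessLogLine(String clientIp, String timestamp, String method, String url,
                  String protocol, int status, long bytes) {
        this.clientIp = clientIp;
        this.timestamp = timestamp;
        this.method = method;
        this.url = url;
        this.protocol = protocol;
        this.status = status;
        this.bytes = bytes;
    }

    // Throws IllegalArgumentException (or NumberFormatException) on lines that
    // do not match the assumed layout.
    static AccessLogLine parse(String line) {
        String[] f = line.split(",");
        if (f.length != 7) {
            throw new IllegalArgumentException("expected 7 fields: " + line);
        }
        String ts = f[3];
        if (!ts.startsWith("[") || !ts.endsWith("]")) {
            throw new IllegalArgumentException("unexpected timestamp: " + ts);
        }
        String[] request = f[4].split(" "); // e.g. "GET /path HTTP/1.1"
        if (request.length != 3) {
            throw new IllegalArgumentException("unexpected request line: " + f[4]);
        }
        return new AccessLogLine(f[0], ts.substring(1, ts.length() - 1),
                request[0], request[1], request[2],
                Integer.parseInt(f[5]),
                "-".equals(f[6]) ? 0L : Long.parseLong(f[6]));
    }

    public static void main(String[] args) {
        AccessLogLine r = parse("0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],"
                + "GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-");
        // Prints: 0:0:0:0:0:0:0:1 GET /jyglFront/mainView/navigate/js/tree.js 304 0
        System.out.println(r.clientIp + " " + r.method + " " + r.url + " " + r.status + " " + r.bytes);
    }
}
```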
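2. Problem to solve: count how many times each resource (URL) was accessed.
3. Implementation
Approach: parse the Tomcat log. The map step extracts the URL from each log line and emits it as the key with the value 1, meaning one access; the reduce step combines all records with the same key and sums their values, so the output value is the total number of accesses.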
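Before running on a cluster, the approach can be checked in a plain JVM. This is a minimal sketch with no Hadoop involved; the class UrlCountSimulation is hypothetical, and it collapses shuffle and reduce into a single sorted-map merge. The extraction rule is the one the mapper uses (the text between the first and last space of field 4), and the skipped field stands in for the LINESKIP counter. It catches IndexOutOfBoundsException, which covers both a line with too few fields and a request line without a URL.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the URL-count job: "map" extracts the URL, and
// "reduce" sums the 1s per URL. TreeMap keeps keys ascending, which for ASCII
// URLs matches the byte order Hadoop sorts by.
class UrlCountSimulation {
    long skipped = 0; // plays the role of the LINESKIP counter

    // Map step: field 4 is the request line, e.g. "GET /path HTTP/1.1";
    // the URL is the text between its first and last space.
    String extractUrl(String line) {
        try {
            String request = line.split(",")[4];
            return request.substring(request.indexOf(' ') + 1, request.lastIndexOf(' '));
        } catch (IndexOutOfBoundsException e) { // too few fields, or no URL in the request
            skipped++;
            return null;
        }
    }

    // Emit (url, 1) for each parsable line and sum the values per URL.
    Map<String, Integer> run(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String url = extractUrl(line);
            if (url != null) {
                totals.merge(url, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        UrlCountSimulation sim = new UrlCountSimulation();
        Map<String, Integer> totals = sim.run(Arrays.asList(
                "127.0.0.1,-,-,[08/May/2014:13:42:40 +0800],GET / HTTP/1.1,200,11444",
                "127.0.0.1,-,-,[08/May/2014:14:27:10 +0800],GET / HTTP/1.1,200,11444",
                "127.0.0.1,-,-,[08/May/2014:14:28:19 +0800],POST /jygl/jaxrs/right/addUserLog HTTP/1.1,200,-",
                "not a log line"));
        // Prints: {/=2, /jygl/jaxrs/right/addUserLog=1}, skipped=1
        System.out.println(totals + ", skipped=" + sim.skipped);
    }
}
```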
The code is as follows:
package org.ly.ccnu;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SecondTest extends Configured implements Tool {

    // Counts log lines that could not be parsed
    enum Counter { LINESKIP }

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text url = new Text(); // reused across calls; context.write serializes it immediately

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            try {
                // Field 4 is the request line, e.g. "GET /path HTTP/1.1"
                String request = line.split(",")[4];
                // Keep only the URL between the method and the protocol
                url.set(request.substring(request.indexOf(' ') + 1, request.lastIndexOf(' ')));
                context.write(url, ONE);
            } catch (IndexOutOfBoundsException e) {
                // Too few fields (ArrayIndexOutOfBoundsException) or no URL in the
                // request line (StringIndexOutOfBoundsException): skip and count it
                context.getCounter(Counter.LINESKIP).increment(1);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable v : values) {
                count += v.get(); // sum the values (not just count them) so a combiner stays correct
            }
            context.write(key, new IntWritable(count));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "logAnalysis");
        job.setJarByClass(SecondTest.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class); // optional: pre-sum map output locally on each node
        job.setReducerClass(Reduce.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // keep the same format with the output of Map and Reduce
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new SecondTest(), args);
        System.exit(res);
    }
}

4. Results
/	2
/jygl/jaxrs/article/getArticleList/10-1	3
/jygl/jaxrs/article/getTotalArticleRecords	3
/jygl/jaxrs/enroll/educationLevelService/allEducationLevels	5
/jygl/jaxrs/enroll/gradeInfoService/allGradeInfos	2
/jygl/jaxrs/enroll/gradeInfoService/currentGradeInfo	1
/jygl/jaxrs/enroll/studyCenterService/allStudyCentersByUtilObject	1
/jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest	2
/jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd	2
/jygl/jaxrs/exam/examParameterService/getAllGradeInfo	3
/jygl/jaxrs/exam/examParameterService/getAllStudyCenters	3
/jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403	2
/jygl/jaxrs/exam/examSubscribeService/groupSubscribe/201403/0/0/201309/1	1
/jygl/jaxrs/graduate/graduateBatchService/getAllGraduateBatch	1
/jygl/jaxrs/graduate/graduateBatchService/getGraduateBatchByConditions?graduateBatchName=&pageSize=10&pageNo=1	1
/jygl/jaxrs/graduate/graduateBatchService/getTotalGraduateBatchByCondition?graduateBatchName=	1
/jygl/jaxrs/nationInfo/getAllNationInPage?pageSize=10&pageNo=1	1
/jygl/jaxrs/nationInfo/getTotalNations	1
/jygl/jaxrs/right/addUserLog	1
/jygl/jaxrs/right/getUserByLoginName/admin	3
/jygl/jaxrs/right/isValidUserByType/1-admin-superadmin	3
/jygl/jaxrs/teaching/teachingPlanService/getSpeicalListByTwo?gradeID=&educationLevelID=	1
/jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a0423f41d66d013f5a1f766c00ce	1
/jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a042437c2c0801437ed1cdea0017	1
/jygl/jaxrs/teaching/teachingPlanService/teachingPlanListByEducationLevelAndGradeId?grade=4af2a042437c2c0801437ed1cdea0017&educationLevel=	1
/jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1	2
/jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO	2
/jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO	2
/jyglFront/baseInfo_articleList?flag=1	3
/jyglFront/baseInfo_nationInfoList	1
/jyglFront/common/css/_images/botton2.gif	1
/jyglFront/common/css/basic.css	1
/jyglFront/common/css/menuStyle2.css	1
/jyglFront/exam_initgroupsubscribestatistic	2
/jyglFront/exam_initsubstudentsubscribe	1
/jyglFront/graduate_initGraduateBatch	1
/jyglFront/graduate_initGraduateInfo	1
/jyglFront/graduate_initGraduateQulifyCheck	1
/jyglFront/graduate_initLeaveSchoolInfo	1
/jyglFront/js/jquery.form.js	1
/jyglFront/mainView/navigate/images/allmenu.gif	3
/jyglFront/mainView/navigate/images/leftmenu_bg.gif	3
/jyglFront/mainView/navigate/images/logo.png	3
/jyglFront/mainView/navigate/images/toggle_menu.gif	3
/jyglFront/mainView/navigate/js/frame.js	3
/jyglFront/mainView/navigate/js/jquery.js	3
/jyglFront/mainView/navigate/js/tree.js	3
/jyglFront/mainView/navigate/menuList.jsp	3
/jyglFront/mainView/navigate/style/images/header_bg.jpg	3
/jyglFront/mainView/navigate/style/style.css	3
/jyglFront/mainView/studentView/style/images/nav_10.png	3
/jyglFront/right_login2home?loginName=admin&password=superadmin&type=1	3
/jyglFront/supervisor/intoInitAssignmentDetail	1
/jyglFront/teaching/openChooseCourse	1
/jyglFront/teaching/openReplaceChooseCourse	1
/jyglFront/teaching/teachingPlanList	1