Mahout Distributed Program Development: Item-Based Collaborative Filtering (ItemCF)

Published: 2025/3/21

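This article is part of the Hadoop family series, which introduces the products of the Hadoop family. Commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, Hcatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, and Hue.

Since 2011, China has entered a turbulent era of big data, and the family of software represented by Hadoop has taken over a large share of big data processing. Data software across the open-source community and vendors alike is converging on Hadoop. Hadoop has gone from a niche, elite field to the standard for big data development. On top of the original Hadoop technology, a family of Hadoop products has emerged, continually innovating around the "big data" concept and pushing technology forward.

As developers in the IT industry, we should keep up with the pace, seize the opportunity, and rise together with Hadoop!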

About the author:

  • Zhang Dan (Conan), programmer: Java, R, PHP, Javascript
  • weibo: @Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

Please credit the source when reposting:
http://blog.fens.me/hadoop-mahout-mapreduce-itemcf/

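Foreword

Mahout is a member of the Hadoop family and inherits the characteristics of Hadoop programs by lineage: it supports HDFS access and distributed MapReduce algorithms. As Mahout evolved, version 0.7 brought a major upgrade: the single-machine in-memory implementations of some algorithms were removed, and those algorithms are now supported only as Hadoop-based MapReduce parallel computation.

From this we can see Mahout's determination to move toward big data and commit to parallelization. I believe that within the larger Hadoop framework, Mahout can eventually become a star product in big data!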

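Contents

  • Mahout development environment
  • Mahout's Hadoop-based distributed environment
  • Implementing collaborative filtering (ItemCF) with Mahout
  • Uploading the template project to github

    1. Mahout development environment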

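    In the article "Building a Mahout Project with Maven", we set up a Maven-based Mahout development environment. Here we continue with distributed program development in Mahout.

    This article uses Mahout 0.8.

    Development environment: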

    • Win7 64bit
    • Java 1.6.0_45
    • Maven 3
    • Eclipse Juno Service Release 2
    • Mahout 0.8
    • Hadoop 1.1.2

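    Open pom.xml and change the Mahout version to 0.8:

    <mahout.version>0.8</mahout.version>

    Then download the dependencies:

    ~ mvn clean install

    The class org.conan.mymahout.cluster06.Kmeans.java was written against mahout-0.6, so it will fail to compile. We can comment out this file for now.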

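    2. Mahout's Hadoop-based distributed environment

    As the figure above shows, we can develop on Win7 or on Linux, and we can debug in the local environment during development. The standard tools are Maven and Eclipse in either case.

    While running, Mahout automatically deploys the MapReduce program package to the Hadoop cluster. This development and run mode is close to a real production environment.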

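    3. Implementing collaborative filtering (ItemCF) with Mahout

    Steps:

    • 1. Prepare the data file: item.csv
    • 2. Java program: HdfsDAO.java
    • 3. Java program: ItemCFHadoop.java
    • 4. Run the program
    • 5. Interpret the recommendation results

    1). Prepare the data file: item.csv
    Upload the test data to HDFS. For the single-machine in-memory experiment, see the article: Building a Mahout Project with Maven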

    ~ hadoop fs -mkdir /user/hdfs/userCF
    ~ hadoop fs -copyFromLocal /home/conan/datafiles/item.csv /user/hdfs/userCF
    ~ hadoop fs -cat /user/hdfs/userCF/item.csv
    1,101,5.0
    1,102,3.0
    1,103,2.5
    2,101,2.0
    2,102,2.5
    2,103,5.0
    2,104,2.0
    3,101,2.5
    3,104,4.0
    3,105,4.5
    3,107,5.0
    4,101,5.0
    4,103,3.0
    4,104,4.5
    4,106,4.0
    5,101,4.0
    5,102,3.0
    5,103,2.0
    5,104,4.0
    5,105,3.5
    5,106,4.0
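    Each row of item.csv has the form userID,itemID,preference. As a quick sanity check before uploading, the rows can be parsed locally. The following is a minimal sketch in plain Java with no Hadoop dependency; the class name ItemCsvStats and its helper methods are hypothetical and are not part of the original project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not from the original project): parses rows of the
// form "userID,itemID,preference" and counts distinct users and items.
class ItemCsvStats {

    // One parsed row of the rating file.
    static final class Rating {
        final long userId;
        final long itemId;
        final double preference;

        Rating(long userId, long itemId, double preference) {
            this.userId = userId;
            this.itemId = itemId;
            this.preference = preference;
        }
    }

    // Splits one CSV row into its three fields and converts them.
    static Rating parse(String line) {
        String[] f = line.trim().split(",");
        if (f.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new Rating(Long.parseLong(f[0]), Long.parseLong(f[1]),
                Double.parseDouble(f[2]));
    }

    static Set<Long> distinctUsers(List<String> lines) {
        Set<Long> users = new TreeSet<Long>();
        for (String line : lines) {
            users.add(parse(line).userId);
        }
        return users;
    }

    static Set<Long> distinctItems(List<String> lines) {
        Set<Long> items = new TreeSet<Long>();
        for (String line : lines) {
            items.add(parse(line).itemId);
        }
        return items;
    }

    // The 21 rows of item.csv shown above.
    static final List<String> SAMPLE = Arrays.asList(
            "1,101,5.0", "1,102,3.0", "1,103,2.5",
            "2,101,2.0", "2,102,2.5", "2,103,5.0", "2,104,2.0",
            "3,101,2.5", "3,104,4.0", "3,105,4.5", "3,107,5.0",
            "4,101,5.0", "4,103,3.0", "4,104,4.5", "4,106,4.0",
            "5,101,4.0", "5,102,3.0", "5,103,2.0", "5,104,4.0",
            "5,105,3.5", "5,106,4.0");

    public static void main(String[] args) {
        System.out.println("users=" + distinctUsers(SAMPLE).size()
                + ", items=" + distinctItems(SAMPLE).size());
    }
}
```

    For the sample data this reports 5 users and 7 items, which agrees with the USERS=5 counter and the 7 item records that appear in the console output later in this article.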

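    2). Java program: HdfsDAO.java
    HdfsDAO.java is a utility for HDFS operations that implements the various Hadoop HDFS commands through the API. See the article: Calling HDFS from Hadoop Programs

    We will use some of the methods of the HdfsDAO.java class here:

    HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
    hdfs.rmr(inPath);
    hdfs.mkdirs(inPath);
    hdfs.copyFile(localFile, inPath);
    hdfs.ls(inPath);
    hdfs.cat(inFile);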

    3). Java program: ItemCFHadoop.java
    To implement the distributed algorithm with Mahout, we refer to the explanation in Mahout in Action.

    Implementation:

    package org.conan.mymahout.recommendation;

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;
    import org.conan.mymahout.hdfs.HdfsDAO;

    public class ItemCFHadoop {

        private static final String HDFS = "hdfs://192.168.1.210:9000";

        public static void main(String[] args) throws Exception {
            String localFile = "datafile/item.csv";
            String inPath = HDFS + "/user/hdfs/userCF";
            String inFile = inPath + "/item.csv";
            String outPath = HDFS + "/user/hdfs/userCF/result/";
            String outFile = outPath + "/part-r-00000";
            String tmpPath = HDFS + "/tmp/" + System.currentTimeMillis();

            // Recreate the input directory on HDFS and upload the data file.
            JobConf conf = config();
            HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
            hdfs.rmr(inPath);
            hdfs.mkdirs(inPath);
            hdfs.copyFile(localFile, inPath);
            hdfs.ls(inPath);
            hdfs.cat(inFile);

            // Build the command-line arguments for RecommenderJob.
            StringBuilder sb = new StringBuilder();
            sb.append("--input ").append(inPath);
            sb.append(" --output ").append(outPath);
            sb.append(" --booleanData true");
            sb.append(" --similarityClassname org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity");
            sb.append(" --tempDir ").append(tmpPath);
            args = sb.toString().split(" ");

            // Run the job, then print the recommendation result.
            RecommenderJob job = new RecommenderJob();
            job.setConf(conf);
            job.run(args);

            hdfs.cat(outFile);
        }

        // Load the Hadoop configuration files from the classpath.
        public static JobConf config() {
            JobConf conf = new JobConf(ItemCFHadoop.class);
            conf.setJobName("ItemCFHadoop");
            conf.addResource("classpath:/hadoop/core-site.xml");
            conf.addResource("classpath:/hadoop/hdfs-site.xml");
            conf.addResource("classpath:/hadoop/mapred-site.xml");
            return conf;
        }
    }
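    The program above selects EuclideanDistanceSimilarity through --similarityClassname and sets --booleanData to true, so only the presence of a rating matters and not its value. To give a feel for what such a measure computes, here is a minimal sketch in plain Java. It assumes the common conversion similarity = 1 / (1 + distance); whether Mahout 0.8 uses exactly this formula should be checked against its source. The class name EuclideanSimilaritySketch is hypothetical.

```java
// Sketch only, under the assumption similarity = 1 / (1 + Euclidean distance).
// This is an illustration, not Mahout's implementation.
class EuclideanSimilaritySketch {

    // Both vectors hold one entry per user, in the same user order.
    static double similarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff;
        }
        return 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }

    public static void main(String[] args) {
        // Columns are users 1..5 from item.csv; 1.0 means the user rated the item.
        double[] item101 = {1, 1, 1, 1, 1}; // rated by users 1, 2, 3, 4, 5
        double[] item102 = {1, 1, 0, 0, 1}; // rated by users 1, 2, 5
        System.out.println(similarity(item101, item102));
    }
}
```

    In this sketch, items 101 and 102 differ for two users (3 and 4), so the distance is the square root of 2 and the similarity is 1 / (1 + sqrt(2)). Identical vectors give a similarity of 1.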

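    RecommenderJob.java in fact encapsulates the execution of the whole distributed parallel algorithm shown in the figure above. Without this layer of encapsulation, we would have to implement the 8 MapReduce steps in the figure ourselves.

    For an in-depth analysis of the algorithm above, see the article: Implementing the MapReduce Collaborative Filtering Algorithm in R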
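    To make the idea behind the job chain more concrete, here is a minimal single-JVM sketch of item-based recommendation in plain Java. It is a deliberate simplification and not what RecommenderJob executes: it keeps everything in memory, treats preferences as boolean, and uses raw co-occurrence counts in place of the Euclidean similarity configured above. The class name ItemCfSketch and its methods are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory illustration of item-based CF with co-occurrence counts.
// A simplification for explanation only, not Mahout's MapReduce pipeline.
class ItemCfSketch {

    // Builds user -> set of items the user has a preference for.
    static Map<Integer, Set<Integer>> userItems(int[][] pairs) {
        Map<Integer, Set<Integer>> users = new TreeMap<Integer, Set<Integer>>();
        for (int[] pair : pairs) {
            Set<Integer> items = users.get(pair[0]);
            if (items == null) {
                items = new TreeSet<Integer>();
                users.put(pair[0], items);
            }
            items.add(pair[1]);
        }
        return users;
    }

    // Number of users who have a preference for both items.
    static int cooccurrence(Map<Integer, Set<Integer>> users, int itemA, int itemB) {
        int count = 0;
        for (Set<Integer> items : users.values()) {
            if (items.contains(itemA) && items.contains(itemB)) {
                count++;
            }
        }
        return count;
    }

    // Scores every item the user has not seen by summing its co-occurrence
    // with each item the user already has.
    static Map<Integer, Integer> scores(Map<Integer, Set<Integer>> users, int userId) {
        Set<Integer> seen = users.get(userId);
        if (seen == null) {
            throw new IllegalArgumentException("unknown user " + userId);
        }
        Set<Integer> allItems = new TreeSet<Integer>();
        for (Set<Integer> items : users.values()) {
            allItems.addAll(items);
        }
        Map<Integer, Integer> result = new TreeMap<Integer, Integer>();
        for (int candidate : allItems) {
            if (seen.contains(candidate)) {
                continue;
            }
            int score = 0;
            for (int owned : seen) {
                score += cooccurrence(users, candidate, owned);
            }
            result.put(candidate, score);
        }
        return result;
    }

    // (userID, itemID) pairs taken from item.csv, ratings dropped.
    static final int[][] SAMPLE = {
        {1, 101}, {1, 102}, {1, 103},
        {2, 101}, {2, 102}, {2, 103}, {2, 104},
        {3, 101}, {3, 104}, {3, 105}, {3, 107},
        {4, 101}, {4, 103}, {4, 104}, {4, 106},
        {5, 101}, {5, 102}, {5, 103}, {5, 104}, {5, 105}, {5, 106}
    };

    public static void main(String[] args) {
        System.out.println(scores(userItems(SAMPLE), 3));
    }
}
```

    For user 3 this sketch scores item 103 at 8, item 102 at 6, and item 106 at 5, so 103 would be recommended first. These numbers come from the simplified co-occurrence scoring and should not be expected to match the values RecommenderJob writes to part-r-00000.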

    4). Run the program
    Console output:

    Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
    Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
    copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
    ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
    ==========================================================
    name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
    ==========================================================
    cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv
    1,101,5.0
    1,102,3.0
    1,103,2.5
    2,101,2.0
    2,102,2.5
    2,103,5.0
    2,104,2.0
    3,101,2.5
    3,104,4.0
    3,105,4.5
    3,107,5.0
    4,101,5.0
    4,103,3.0
    4,104,4.5
    4,106,4.0
    5,101,4.0
    5,102,3.0
    5,103,2.0
    5,104,4.0
    5,105,3.5
    5,106,4.0
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    2013-10-14 10:26:35 org.apache.hadoop.util.NativeCodeLoader 警告: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable 2013-10-14 10:26:35 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:35 org.apache.hadoop.io.compress.snappy.LoadSnappy 警告: Snappy native library not loaded 2013-10-14 10:26:36 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0001 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getCompressor 信息: Got brand-new compressor 2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0001_m_000000_0' done. 
2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:36 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getDecompressor 信息: Got brand-new decompressor 2013-10-14 10:26:36 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 42 bytes 2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0001_r_000000_0 is allowed to commit now 2013-10-14 10:26:36 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/itemIDIndex 2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:36 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0001_r_000000_0' done. 
2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0001 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Counters: 19 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=187 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=3287330 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=916 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=3443292 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=645 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=229 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=46 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Map input records=21 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=14 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=84 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=376569856 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=116 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Combine input records=21 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=7 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=7 
2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Combine output records=7 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=7 2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log 信息: Map output records=21 2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0002 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0002_m_000000_0' done. 
2013-10-14 10:26:37 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:37 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:37 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 68 bytes 2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0002_r_000000_0 is allowed to commit now 2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0002_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/userVectors 2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:37 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0002_r_000000_0' done. 
2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0002 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Counters: 20 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: USERS=5 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=288 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=6574274 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=1374 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=6887592 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=1120 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=229 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=72 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Map input records=21 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=42 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=63 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=575930368 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=116 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 
信息: Combine input records=0 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=21 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=5 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Combine output records=0 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=5 2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log 信息: Map output records=21 2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0003 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0003_m_000000_0' done. 
2013-10-14 10:26:38 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:38 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:38 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 89 bytes 2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0003_r_000000_0 is allowed to commit now 2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0003_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/ratingMatrix 2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:38 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0003_r_000000_0' done. 
2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0003 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Counters: 21 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=335 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: USER_RATINGS_NEGLECTED=0 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: USER_RATINGS_USED=21 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=9861349 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=1950 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=10331958 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=1751 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=288 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=93 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Map input records=5 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=14 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=336 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=775290880 2013-10-14 10:26:39 
org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=157 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Combine input records=21 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=7 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=7 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Combine output records=7 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=7 2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log 信息: Map output records=21 2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0004 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0004_m_000000_0' done. 
2013-10-14 10:26:39 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:39 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:39 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 118 bytes 2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0004_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0004_r_000000_0 is allowed to commit now 2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0004_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/weights 2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:39 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0004_r_000000_0' done. 
2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0004 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Counters: 20 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=381 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=13148476 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=2628 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=13780408 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=2551 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=335 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: ROWS=7 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=122 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Map input records=7 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=16 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=516 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=974651392 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=158 2013-10-14 10:26:40 
org.apache.hadoop.mapred.Counters log 信息: Combine input records=24 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=8 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=8 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Combine output records=8 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=5 2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log 信息: Map output records=24 2013-10-14 10:26:40 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0005 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0005_m_000000_0' done. 
2013-10-14 10:26:40 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:40 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 121 bytes 2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0005_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0005_r_000000_0 is allowed to commit now 2013-10-14 10:26:40 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0005_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/pairwiseSimilarity 2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:40 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0005_r_000000_0' done. 
2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0005 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Counters: 21 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=392 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=16435577 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=3488 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=17230010 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=3408 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=381 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: PRUNED_COOCCURRENCES=0 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: COOCCURRENCES=57 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=125 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Map input records=5 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=14 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=744 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=1174011904 2013-10-14 10:26:41 
org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=129 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Combine input records=21 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=7 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=7 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Combine output records=7 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=7 2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log 信息: Map output records=21 2013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0006 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0006_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0006_m_000000_0' done. 
2013-10-14 10:26:41 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:41 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 158 bytes 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0006_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0006_r_000000_0 is allowed to commit now 2013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0006_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/similarityMatrix 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0006_r_000000_0' done. 
2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0006 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Counters: 19 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=554 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=19722740 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=4342 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=20674772 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=4354 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=392 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=162 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Map input records=7 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=14 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=599 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=1373372416 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=140 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Combine input records=25 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=7 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=7 
2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Combine output records=7 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=7 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log 信息: Map output records=25 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0007 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0007_m_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0007_m_000000_0' done. 
2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0007_m_000001_0 is done. And is in the process of commiting 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0007_m_000001_0' done. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 2 sorted segments 2013-10-14 10:26:42 org.apache.hadoop.io.compress.CodecPool getDecompressor 信息: Got brand-new decompressor 2013-10-14 10:26:42 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 2 segments left of total size: 233 bytes 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0007_r_000000_0 is done. 
And is in the process of commiting 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0007_r_000000_0 is allowed to commit now 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0007_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/partialMultiply 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0007_r_000000_0' done. 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0007 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Counters: 19 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=572 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=34517913 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=8751 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=36182630 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=7934 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=0 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=241 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Map input records=12 2013-10-14 10:26:43 
org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=56 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=453 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=2558459904 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=665 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Combine input records=0 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=28 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=7 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Combine output records=0 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=7 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log 信息: Map output records=28 2013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus 信息: Total input paths to process : 1 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Running job: job_local_0008 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 100 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/99614720 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/327680 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush 信息: Starting flush of map output 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill 信息: Finished spill 0 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0008_m_000000_0 is done. 
And is in the process of commiting 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0008_m_000000_0' done. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize 信息: Using ResourceCalculatorPlugin : null 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Merging 1 sorted segments 2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge 信息: Down to the last merge-pass, with 1 segments left of total size: 206 bytes 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task done 信息: Task:attempt_local_0008_r_000000_0 is done. And is in the process of commiting 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task commit 信息: Task attempt_local_0008_r_000000_0 is allowed to commit now 2013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask 信息: Saved output of task 'attempt_local_0008_r_000000_0' to hdfs://192.168.1.210:9000/user/hdfs/userCF/result 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate 信息: reduce > reduce 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone 信息: Task 'attempt_local_0008_r_000000_0' done. 
2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: map 100% reduce 100% 2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob 信息: Job complete: job_local_0008 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Counters: 19 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: File Output Format Counters 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Bytes Written=217 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: FileSystemCounters 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_READ=26299802 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_READ=7357 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: FILE_BYTES_WRITTEN=27566408 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: HDFS_BYTES_WRITTEN=6269 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: File Input Format Counters 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Bytes Read=572 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Map-Reduce Framework 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Map output materialized bytes=210 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Map input records=7 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Reduce shuffle bytes=0 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Spilled Records=42 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Map output bytes=927 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Total committed heap usage (bytes)=1971453952 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: SPLIT_RAW_BYTES=137 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Combine input records=0 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Reduce input records=21 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Reduce input groups=5 
2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Combine output records=0
2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Reduce output records=5
2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log 信息: Map output records=21
cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-00000
1 [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]
2 [106:1.560478,105:1.4795978,107:0.69935876]
3 [103:1.2475469,106:1.1944525,102:1.1462644]
4 [102:1.6462644,105:1.5277859,107:0.69935876]
5 [107:1.1993587]

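    5). Interpreting the recommendation results
    The log above can be read in three parts:

    • a. Initializing the environment
    • b. Running the algorithm
    • c. Printing the recommendation results

    a. Initializing the environment
    Initialize the HDFS data directory and working directory, then upload the data file.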

    Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
    Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
    copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
    ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
    ==========================================================
    name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
    ==========================================================
    cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv

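    b. Running the algorithm
    The 8 MapReduce jobs shown in the figure above are executed one after another. Each job reports its completion in the log: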
    Job complete: job_local_0001
    Job complete: job_local_0002
    Job complete: job_local_0003
    Job complete: job_local_0004
    Job complete: job_local_0005
    Job complete: job_local_0006
    Job complete: job_local_0007
    Job complete: job_local_0008
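To check quickly that all jobs finished, the log text can be scanned for the "Job complete" marker. The following is only a minimal sketch using the Java standard library. The class name `JobLogScanner` and its method are hypothetical helpers for illustration; they are not part of Mahout, Hadoop, or the template project, and the sample string in `main` is a shortened excerpt rather than the full log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Mahout or Hadoop.
final class JobLogScanner {

    // Matches the marker that JobClient prints when a job ends,
    // and captures the job ID that follows it.
    private static final Pattern JOB_COMPLETE =
            Pattern.compile("Job complete: (job_\\w+)");

    // Returns the IDs of all completed jobs, in the order they appear in the log.
    static List<String> completedJobs(String log) {
        List<String> jobs = new ArrayList<String>();
        Matcher matcher = JOB_COMPLETE.matcher(log);
        while (matcher.find()) {
            jobs.add(matcher.group(1));
        }
        return jobs;
    }

    public static void main(String[] args) {
        // Shortened excerpt of the console log (assumption: real input would be the whole log).
        String log = "monitorAndPrintJob Job complete: job_local_0007 Counters: 19 "
                + "monitorAndPrintJob Job complete: job_local_0008 Counters: 19";
        List<String> jobs = completedJobs(log);
        System.out.println(jobs.size() + " job(s) completed: " + jobs);
    }
}
```

For the run described here, the full log should yield eight IDs, `job_local_0001` through `job_local_0008`.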

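    c. Printing the recommendation results

    Finally, the result file is printed so that we can inspect the computed recommendations.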

    cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-00000
    1 [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]
    2 [106:1.560478,105:1.4795978,107:0.69935876]
    3 [103:1.2475469,106:1.1944525,102:1.1462644]
    4 [102:1.6462644,105:1.5277859,107:0.69935876]
    5 [107:1.1993587]
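Each result line has the form `userID [itemID:score,itemID:score,...]`, with the items listed from the highest score to the lowest. To use the results in a program, a line can be parsed as in the sketch below. It assumes the format is exactly as printed above (user ID, whitespace, then a bracketed list). `RecommendationParser` and its methods are hypothetical names for illustration and are not part of Mahout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Mahout.
final class RecommendationParser {

    // Extracts the user ID that precedes the bracketed item list.
    static long parseUserId(String line) {
        int open = line.indexOf('[');
        if (open < 0) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        return Long.parseLong(line.substring(0, open).trim());
    }

    // Parses the bracketed list into itemID -> score,
    // preserving the order in which the items appear in the line.
    static Map<Long, Double> parseItems(String line) {
        int open = line.indexOf('[');
        int close = line.lastIndexOf(']');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        Map<Long, Double> items = new LinkedHashMap<Long, Double>();
        String body = line.substring(open + 1, close).trim();
        if (body.isEmpty()) {
            return items; // the user has no recommendations
        }
        for (String pair : body.split(",")) {
            String[] parts = pair.trim().split(":");
            items.put(Long.parseLong(parts[0]), Double.parseDouble(parts[1]));
        }
        return items;
    }

    public static void main(String[] args) {
        // Two lines copied from the result file printed above.
        String[] lines = {
            "1 [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]",
            "5 [107:1.1993587]"
        };
        for (String line : lines) {
            System.out.println("user " + parseUserId(line) + " -> " + parseItems(line));
        }
    }
}
```

A `LinkedHashMap` is used so that the ranking order of the output is kept; the first entry is the top recommendation for that user.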

    4. Template project on GitHub

    https://github.com/bsspirit/maven_mahout_template/tree/mahout-0.8

    You can download this project and use it as a starting point for your own development.

    ~ git clone https://github.com/bsspirit/maven_mahout_template
    ~ cd maven_mahout_template
    ~ git checkout mahout-0.8

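    We have now implemented the distributed item-based collaborative filtering algorithm. The next article covers the distributed implementation of Kmeans in Mahout; see: Mahout distributed program development: Kmeans clustering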

