

python + hadoop (example)

Published: 2023/12/31

How do you connect Python to Hadoop and use Hadoop's resources? This article walks through a simple example.

1. Python map/reduce code

Assuming you already have a working knowledge of Hadoop, we need to create a mapper and a reducer. The code for each is as follows:

1. mapper.py

#!/usr/bin/env python
import sys

# Read lines from standard input and emit "word<TAB>1" for every word.
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print('%s\t%s' % (word, 1))

2. reducer.py

#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# Hadoop Streaming sorts mapper output by key, so identical words
# arrive on consecutive lines and can be summed with a running count.
for line in sys.stdin:
    word, count = line.strip().split('\t')
    try:
        count = int(count)
    except ValueError:
        # Skip lines whose count is not a number.
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Flush the final word.
if current_word == word:
    print('%s\t%s' % (current_word, current_count))
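The reducer only works because its input is sorted by key, so equal words arrive consecutively. The same grouping logic can be written more compactly with `itertools.groupby`; this is a minimal alternative sketch, not the article's code:

```python
from itertools import groupby

def reduce_counts(lines):
    """Sum counts for consecutive "word<TAB>count" lines sharing a word.

    Assumes the lines are already sorted by word, as the shuffle/sort
    phase (or `sort` in a local test) guarantees.
    """
    pairs = (line.strip().split('\t') for line in lines if line.strip())
    results = []
    for word, group in groupby(pairs, key=lambda p: p[0]):
        results.append((word, sum(int(count) for _, count in group)))
    return results
```

Wrapping the loop in a function also makes the grouping logic easy to unit-test without piping through the shell.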

With both scripts in place, test them locally:

[qiu.li@l-tdata5.tkt.cn6 /export/python]$ echo "I like python hadoop , hadoop very good" | ./mapper.py | sort -k 1,1 | ./reducer.py
,	1
good	1
hadoop	2
I	1
like	1
python	1
very	1
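The shell pipeline above mirrors exactly what Hadoop Streaming does: map, sort by key, reduce. As a sanity check (not part of the original article), the whole pipeline can be simulated in-process:

```python
from collections import Counter

def local_wordcount(text):
    """Simulate the mapper | sort | reducer pipeline in pure Python."""
    # Map: emit (word, 1) for every word on every line.
    pairs = [(word, 1) for line in text.splitlines() for word in line.split()]
    # Shuffle/sort: bring identical keys together.
    pairs.sort()
    # Reduce: sum the counts per word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)
```

Running it on the same sentence should reproduce the counts shown above, e.g. `hadoop` appearing twice.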

2. Uploading files

No problems so far, so we are halfway there. Next, upload a few files to Hadoop for further testing. I fetched some files online with the following commands:

wget http://www.gutenberg.org/ebooks/20417.txt.utf-8
wget http://www.gutenberg.org/files/5000/5000-8.txt
wget http://www.gutenberg.org/ebooks/4300.txt.utf-8

List the downloaded files:

[qiu.li@l-tdata5.tkt.cn6 /export/python]$ ls
20417.txt.utf-8  4300.txt.utf-8  5000-8.txt  mapper.py  reducer.py  run.sh

Upload the files to Hadoop with: hadoop dfs -put ./*.txt /user/ticketdev/tmp (Hadoop is already configured and the target directory already exists.)

Create run.sh:

hadoop jar $STREAM \
    -files ./mapper.py,./reducer.py \
    -mapper ./mapper.py \
    -reducer ./reducer.py \
    -input /user/ticketdev/tmp/*.txt \
    -output /user/ticketdev/tmp/output
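The script assumes `$STREAM` holds the path to the Hadoop Streaming jar, which the article never defines. A typical definition might look like the following; the exact path and jar version depend on your installation, so treat this as an assumption:

```shell
# Assumption: HADOOP_HOME points at your Hadoop installation; the
# streaming jar's location and version string vary by release.
STREAM=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar
```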

Check the results:

[qiu.li@l-tdata5.tkt.cn6 /export/python]$ hadoop dfs -cat /user/ticketdev/tmp/output/part-00000 | sort -nk 2 | tail
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
it	2387
which	2387
that	2668
a	3797
is	4097
to	5079
in	5226
and	7611
of	10388
the	20583

3. References:

http://www.cnblogs.com/wing1995/p/hadoop.html?utm_source=tuicool&utm_medium=referral

