Hadoop Shell Commands and WordCount


Preface

In the previous two chapters we covered the three ways to install Hadoop (Standalone mode / Pseudo-Distributed mode / Cluster mode). In this chapter we look at how to perform basic operations with the HDFS shell. The official reference is the Hadoop Shell command documentation.


Main Content

Prerequisites

A Hadoop cluster has been installed and started. From the NameNode web UI we can browse the directory tree of our HDFS file system.

Basic Operations

The operations used most often on any file system are create, delete, read, and update, together with the permission commands. Their basic usage is shown below:

  • List files and directories: ls
# default filesystem root (no scheme)
localhost:current Sean$ hadoop fs -ls /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 16:15:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   1 Sean supergroup          2 2019-03-25 11:55 /1.log
drwx------   - Sean supergroup          0 2019-03-25 12:11 /tmp
drwxr-xr-x   - Sean supergroup          0 2019-03-25 13:16 /user

# full URI
localhost:current Sean$ hadoop fs -ls hdfs://localhost:9000/
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 16:16:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   1 Sean supergroup          2 2019-03-25 11:55 hdfs://localhost:9000/1.log
drwx------   - Sean supergroup          0 2019-03-25 12:11 hdfs://localhost:9000/tmp
drwxr-xr-x   - Sean supergroup          0 2019-03-25 13:16 hdfs://localhost:9000/user
  • Upload a file: put
# upload a file
localhost:current Sean$ hadoop fs -put hello2019.sh /
# list to confirm the upload
localhost:current Sean$ hadoop fs -ls /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 4 items
-rw-r--r--   1 Sean supergroup          2 2019-03-25 11:55 /1.log
-rw-r--r--   1 Sean supergroup         10 2019-03-30 16:19 /hello2019.sh
drwx------   - Sean supergroup          0 2019-03-25 12:11 /tmp
drwxr-xr-x   - Sean supergroup          0 2019-03-25 13:16 /user

# manually merge the block files on the DataNode (the original file can be reconstructed)
cat blk_1073741983 >> tmp.file
cat blk_1073741984 >> tmp.file

The default block size is 128 MB; a file larger than that is split into two or more blocks.
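
To check a file's length, block size, and replication factor without digging into the DataNode directories, hadoop fs -stat can print them. A small sketch; the format specifiers %b, %o and %r are assumed to be supported by your Hadoop version:

# sketch: print length, block size and replication for the uploaded file
hadoop fs -stat "%n: %b bytes, block size %o, replication %r" /hello2019.sh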

  • View file contents: cat
# view through hadoop
localhost:current Sean$ hadoop fs -cat /hello2019.sh
hello2019

# view the block directly on the local filesystem of the DataNode
localhost:current Sean$ cat finalized/subdir0/subdir0/blk_1073741983
hello2019
localhost:current Sean$ pwd
/Users/Sean/Software/hadoop/current/tmp/dfs/data/current/BP-586017156-127.0.0.1-1553485799471/current
  • Download a file: get
localhost:current Sean$ hadoop fs -get /hello2019.sh
localhost:current Sean$ ls
VERSION    dfsUsed    finalized    hello2019.sh    rbw
localhost:current Sean$ cat hello2019.sh
hello2019
  • Create a directory: mkdir
localhost:current Sean$ hadoop fs -mkdir -p /wordcount/input
localhost:current Sean$ hadoop fs -ls /wordcount
Found 1 items
drwxr-xr-x   - Sean supergroup          0 2019-03-30 16:40 /wordcount/input

Others

All of the HDFS shell commands can be listed by running hadoop fs with no arguments.

localhost:mapreduce Sean$ hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-d] [-h] [-R] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
  • help
    Prints the usage manual for the commands.
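
    Help for a single command can also be requested by name, for example:

hadoop fs -help cat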

  • mkdir
    Creates a directory. hadoop fs -mkdir -p /abc/acc

  • moveFromLocal / moveToLocal
    Moves a file from the local filesystem into HDFS (the local source is deleted). hadoop fs -moveFromLocal abc.txt /
    Moves a file from HDFS to the local filesystem (the HDFS source is deleted). hadoop fs -moveToLocal /abc.txt ./

  • appendToFile
    Appends a local file to a file already in HDFS. hadoop fs -appendToFile abc.txt /hello2019.txt

localhost:mapreduce Sean$ echo xxoo >> hello.txt
localhost:mapreduce Sean$ hadoop fs -appendToFile hello.txt /hello2019.sh
localhost:mapreduce Sean$ hadoop fs -cat /hello2019.sh
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
hello2019
xxoo
  • cat
    Displays a file's contents. hadoop fs -cat /hello2019.sh. For long files, pipe through more (hadoop fs -cat /hello2019.sh | more) or use hadoop fs -tail /hello2019.sh.

  • tail
    Displays the end of a file. hadoop fs -tail /hello2019.sh

  • text
    Prints a file's contents as text. hadoop fs -text /hello2019.sh.

  • chgrp / chmod / chown
    chgrp changes the group of a file; chmod changes its permissions; chown changes its owner and group.

hadoop fs -chmod 666 /hello2019.txt
hadoop fs -chown someuser:somegrp /hello2019.txt

localhost:mapreduce Sean$ hadoop fs -chmod 777 /hello2019.sh
localhost:mapreduce Sean$ hadoop fs -ls /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 17:09:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 5 items
-rw-r--r--   1 Sean supergroup          2 2019-03-25 11:55 /1.log
-rwxrwxrwx   1 Sean supergroup         15 2019-03-30 16:55 /hello2019.sh
drwx------   - Sean supergroup          0 2019-03-25 12:11 /tmp
drwxr-xr-x   - Sean supergroup          0 2019-03-25 13:16 /user
drwxr-xr-x   - Sean supergroup          0 2019-03-30 16:43 /wordcount

# HDFS has no built-in notion of OS users, so you can chown to a user/group that was never created.
localhost:mapreduce Sean$ hadoop fs -chown hellokitty:hello /hello2019.sh
localhost:mapreduce Sean$ hadoop fs -ls /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 17:10:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 5 items
-rw-r--r--   1 Sean supergroup          2 2019-03-25 11:55 /1.log
-rwxrwxrwx   1 hellokitty hello        15 2019-03-30 16:55 /hello2019.sh
drwx------   - Sean supergroup          0 2019-03-25 12:11 /tmp
drwxr-xr-x   - Sean supergroup          0 2019-03-25 13:16 /user
drwxr-xr-x   - Sean supergroup          0 2019-03-30 16:43 /wordcount
  • copyFromLocal / copyToLocal
    Copies from the local filesystem into HDFS; copies from HDFS to the local filesystem.
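
    A minimal sketch of both directions, reusing the hello2019.sh file from above:

hadoop fs -copyFromLocal hello2019.sh /          # local -> HDFS, the local source is kept
hadoop fs -copyToLocal /hello2019.sh ./copy.sh   # HDFS -> local, the HDFS source is kept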

  • cp
    Copies within HDFS. hadoop fs -cp /hello2019.sh /a/hello2019.sh

  • mv
    Moves within HDFS. hadoop fs -mv /hello2019.sh /a/

  • get
    Downloads a file to the local filesystem. Similar to copyToLocal. hadoop fs -get /hello.sh

  • getmerge
    Merges and downloads multiple files as one. hadoop fs -getmerge /wordcount/output/* hellomerge.sh

localhost:mapreduce Sean$ hadoop fs -getmerge /wordcount/output/* hellomerge.sh
localhost:mapreduce Sean$ cat hellomerge.sh
2019	1
able	1
cat	2
hello	1
kitty	1
pitty	2
  • put
    Uploads a local file to HDFS. Similar to copyFromLocal. hadoop fs -put hello2019.sh /

  • rm
    Deletes a file or directory. hadoop fs -rm -r /hello2019.sh

# -r means recursive
localhost:mapreduce Sean$ hadoop fs -rm -r /1.log
Deleted /1.log
localhost:mapreduce Sean$ hadoop fs -ls /
Found 4 items
-rwxrwxrwx   1 hellokitty hello        15 2019-03-30 16:55 /hello2019.sh
drwx------   - Sean supergroup          0 2019-03-25 12:11 /tmp
drwxr-xr-x   - Sean supergroup          0 2019-03-25 13:16 /user
drwxr-xr-x   - Sean supergroup          0 2019-03-30 16:43 /wordcount
  • rmdir
    Deletes an empty directory. hadoop fs -rmdir /abbc
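
    The usage listing above also shows an --ignore-fail-on-non-empty flag; a small sketch of both cases:

hadoop fs -mkdir /abbc
hadoop fs -rmdir /abbc                                   # succeeds because the directory is empty
hadoop fs -rmdir --ignore-fail-on-non-empty /wordcount   # suppresses the error when the directory is not empty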

  • df
    Reports the free and used space of the file system. hadoop fs -df -h /

localhost:mapreduce Sean$ hadoop fs -df -h /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Filesystem               Size     Used  Available  Use%
hdfs://localhost:9000  465.7 G   5.9 M    169.1 G    0%
  • du
    Reports the size of a directory. hadoop fs -du -s -h /abc/d
# -s summarize; -h human-readable units
localhost:mapreduce Sean$ hadoop fs -du -s -h /wordcount/
86  /wordcount
localhost:mapreduce Sean$ hadoop fs -du -s -h hdfs://localhost:9000/*
15       hdfs://localhost:9000/hello2019.sh
4.7 M    hdfs://localhost:9000/tmp
266.0 K  hdfs://localhost:9000/user
86       hdfs://localhost:9000/wordcount
  • count
    Counts the directories, files, and bytes under a path. hadoop fs -count /aaa/
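
    The output has four columns: directory count, file count, content size in bytes, and the path. A sketch with made-up numbers:

hadoop fs -count /wordcount/
#        3            2                 86 /wordcount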

  • setrep
    Sets the replication factor of a file (replication).

localhost:mapreduce Sean$ hadoop fs -setrep 3 /wordcount/input/hello2019.sh
Replication 3 set: /wordcount/input/hello2019.sh


If there are only 3 DataNodes but you set the replication factor to 10, you will not actually get 10 replicas. The value is only the replication factor recorded in the NameNode's metadata; the real number of replicas depends on how many DataNodes are available.
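
To compare the requested factor with reality, the ls listing shows the target replication (the number after the permission bits), and fsck reports blocks with fewer replicas than requested; a sketch:

hadoop fs -ls /wordcount/input/hello2019.sh   # second column is the target replication factor
hdfs fsck /wordcount/input/hello2019.sh       # flags under-replicated blocks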


WordCount
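
Before submitting the job, the input directory must exist in HDFS and contain at least one text file. A minimal preparation sketch (the file name and its contents here are assumed, not the exact data used in the run below):

echo "hello kitty pitty cat 2019 able" > hello2019.sh
hadoop fs -mkdir -p /wordcount/input
hadoop fs -put hello2019.sh /wordcount/input/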

localhost:mapreduce Sean$ hadoop jar hadoop-mapreduce-examples-2.7.5.jar wordcount /wordcount/input/ /wordcount/output
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 16:43:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/30 16:43:31 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/03/30 16:43:32 INFO input.FileInputFormat: Total input paths to process : 1
19/03/30 16:43:32 INFO mapreduce.JobSubmitter: number of splits:1
19/03/30 16:43:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553933297569_0001
19/03/30 16:43:33 INFO impl.YarnClientImpl: Submitted application application_1553933297569_0001
19/03/30 16:43:33 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1553933297569_0001/
19/03/30 16:43:33 INFO mapreduce.Job: Running job: job_1553933297569_0001
19/03/30 16:43:43 INFO mapreduce.Job: Job job_1553933297569_0001 running in uber mode : false
19/03/30 16:43:43 INFO mapreduce.Job:  map 0% reduce 0%
19/03/30 16:43:48 INFO mapreduce.Job:  map 100% reduce 0%
19/03/30 16:43:54 INFO mapreduce.Job:  map 100% reduce 100%
19/03/30 16:43:54 INFO mapreduce.Job: Job job_1553933297569_0001 completed successfully
19/03/30 16:43:54 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=74
		FILE: Number of bytes written=243693
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=157
		HDFS: Number of bytes written=44
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3271
		Total time spent by all reduces in occupied slots (ms)=3441
		Total time spent by all map tasks (ms)=3271
		Total time spent by all reduce tasks (ms)=3441
		Total vcore-milliseconds taken by all map tasks=3271
		Total vcore-milliseconds taken by all reduce tasks=3441
		Total megabyte-milliseconds taken by all map tasks=3349504
		Total megabyte-milliseconds taken by all reduce tasks=3523584
	Map-Reduce Framework
		Map input records=7
		Map output records=8
		Map output bytes=74
		Map output materialized bytes=74
		Input split bytes=115
		Combine input records=8
		Combine output records=6
		Reduce input groups=6
		Reduce shuffle bytes=74
		Reduce input records=6
		Reduce output records=6
		Spilled Records=12
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=123
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=306184192
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=42
	File Output Format Counters
		Bytes Written=44


The output directory must not already contain results; the job refuses to run if it exists. After this run the output directory contains two files: _SUCCESS and part-r-00000. The first is a marker indicating the job finished successfully; the second holds the actual results.

localhost:mapreduce Sean$ hadoop fs -cat /wordcount/output/part-r-00000
2019	1
able	1
cat	2
hello	1
kitty	1
pitty	2

localhost:mapreduce Sean$ ls
hadoop-mapreduce-client-app-2.7.5.jar       hadoop-mapreduce-client-hs-plugins-2.7.5.jar       hadoop-mapreduce-examples-2.7.5.jar
hadoop-mapreduce-client-common-2.7.5.jar    hadoop-mapreduce-client-jobclient-2.7.5-tests.jar  lib
hadoop-mapreduce-client-core-2.7.5.jar      hadoop-mapreduce-client-jobclient-2.7.5.jar        lib-examples
hadoop-mapreduce-client-hs-2.7.5.jar        hadoop-mapreduce-client-shuffle-2.7.5.jar          sources

This directory contains many more example jobs that you can explore on your own.
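
Running the examples jar without arguments prints the list of bundled programs, and each can be invoked by name; for instance the pi example estimates π. A sketch:

hadoop jar hadoop-mapreduce-examples-2.7.5.jar          # lists the available example programs
hadoop jar hadoop-mapreduce-examples-2.7.5.jar pi 2 5   # run the pi estimator with 2 maps, 5 samples each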


Basic Principles (Overview)

Files stored in HDFS are ultimately stored on the local disks of the nodes; HDFS is simply a distributed file system layered on top of them. Let's look at the files we stored earlier.

Note: on my machine the DataNode and NameNode are installed on the same host, so the directory tree looks like this:

localhost:tmp Sean$ tree
.
├── dfs
│   ├── data
│   │   ├── current
│   │   │   ├── BP-586017156-127.0.0.1-1553485799471
│   │   │   │   ├── current
│   │   │   │   │   ├── VERSION
│   │   │   │   │   ├── dfsUsed
│   │   │   │   │   ├── finalized
│   │   │   │   │   │   └── subdir0
│   │   │   │   │   │       └── subdir0
│   │   │   │   │   │           ├── blk_1073741825
│   │   │   │   │   │           ├── blk_1073741825_1001.meta
│   │   │   │   │   └── rbw
│   │   │   │   ├── scanner.cursor
│   │   │   │   └── tmp
│   │   │   └── VERSION
│   │   └── in_use.lock
│   ├── name
│   │   ├── current
│   │   │   ├── VERSION
│   │   │   ├── edits_0000000000000000001-0000000000000000118
│   │   │   ├── edits_inprogress_0000000000000001233
│   │   │   ├── fsimage_0000000000000001230
│   │   │   ├── fsimage_0000000000000001230.md5
│   │   │   ├── fsimage_0000000000000001232
│   │   │   ├── fsimage_0000000000000001232.md5
│   │   │   └── seen_txid
│   │   └── in_use.lock
│   └── namesecondary
│       ├── current
│       │   ├── VERSION
│       │   ├── edits_0000000000000000001-0000000000000000118
│       │   ├── edits_0000000000000000119-0000000000000000943
│       │   ├── edits_0000000000000001231-0000000000000001232
│       │   ├── fsimage_0000000000000001230
│       │   ├── fsimage_0000000000000001230.md5
│       │   ├── fsimage_0000000000000001232
│       │   └── fsimage_0000000000000001232.md5
│       └── in_use.lock
└── nm-local-dir
    ├── filecache
    ├── nmPrivate
    └── usercache
  • NameNode: manages the file system namespace and metadata
  • DataNode: stores the file blocks and serves them over the network (Socket / Netty)
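
To map an HDFS path to the blk_* files seen in the tree above, fsck can print the block IDs and the DataNodes holding them (a sketch; output not shown here):

hdfs fsck /hello2019.sh -files -blocks -locations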

HDFS Commands

  • hdfs dfsadmin -report: view the cluster status
localhost:Desktop Sean$ hdfs dfsadmin -report
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/04/03 15:51:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 500068036608 (465.72 GB)
Present Capacity: 182055092460 (169.55 GB)
DFS Remaining: 182048903168 (169.55 GB)
DFS Used: 6189292 (5.90 MB)
DFS Used%: 0.00%
Under replicated blocks: 27
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (1):

Name: 127.0.0.1:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 500068036608 (465.72 GB)
DFS Used: 6189292 (5.90 MB)
Non DFS Used: 313001643796 (291.51 GB)
DFS Remaining: 182048903168 (169.55 GB)
DFS Used%: 0.00%
DFS Remaining%: 36.40%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Apr 03 15:51:21 CST 2019
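
A couple of other dfsadmin subcommands that are useful alongside -report (a sketch):

hdfs dfsadmin -safemode get     # show whether the NameNode is in safe mode
hdfs dfsadmin -safemode leave   # leave safe mode manually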

Reference

[1]. Hadoop Shell Commands (official documentation)
[2]. Introduction to the hadoop and hdfs commands in Hadoop
[3]. Hadoop Study Notes 4: Common HDFS Commands
