1. Word counting with the bundled example program
(1) The wordcount program
The wordcount program ships under Hadoop's share directory:
[root@leaf mapreduce]# pwd
/usr/local/hadoop/share/hadoop/mapreduce
[root@leaf mapreduce]# ls
hadoop-mapreduce-client-app-2.6.5.jar         hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
hadoop-mapreduce-client-common-2.6.5.jar      hadoop-mapreduce-client-shuffle-2.6.5.jar
hadoop-mapreduce-client-core-2.6.5.jar        hadoop-mapreduce-examples-2.6.5.jar
hadoop-mapreduce-client-hs-2.6.5.jar          lib
hadoop-mapreduce-client-hs-plugins-2.6.5.jar  lib-examples
hadoop-mapreduce-client-jobclient-2.6.5.jar   sources
It is packaged in hadoop-mapreduce-examples-2.6.5.jar.
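For context, wordcount is the classic MapReduce example: the mapper tokenizes each input line and emits (word, 1) pairs, and the reducer (also usable as a combiner) sums the counts for each word. The sketch below illustrates that logic against the org.apache.hadoop.mapreduce API; it is an illustration, not the exact source shipped in the jar.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase: emit (word, 1) for every whitespace-separated token in a line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase (also usable as the combiner): sum the 1s emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}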
(2) Create the HDFS data directories
Create a directory to hold the input files for the MapReduce job:
[root@leaf ~]# hadoop fs -mkdir -p /data/wordcount
Create a directory to hold the output files of the MapReduce job:
[root@leaf ~]# hadoop fs -mkdir /output
Check the two directories just created:
[root@leaf ~]# hadoop fs -ls /
drwxr-xr-x   - root supergroup          0 2017-09-01 20:34 /data
drwxr-xr-x   - root supergroup          0 2017-09-01 20:35 /output
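The two mkdir commands above also have a Java equivalent. A minimal sketch using the HDFS FileSystem API (assuming fs.defaultFS points at this cluster and the Hadoop client jars are on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MakeWordcountDirs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        fs.mkdirs(new Path("/data/wordcount"));     // input directory, like "hadoop fs -mkdir -p"
        fs.mkdirs(new Path("/output"));             // parent of the job's output directory
        fs.close();
    }
}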
(3) Create a word file and upload it to HDFS
The word file looks like this:
[root@leaf ~]# cat myword.txt
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
Upload the file to HDFS:
[root@leaf ~]# hadoop fs -put myword.txt /data/wordcount
Check the uploaded file and its contents in HDFS:
[root@leaf ~]# hadoop fs -ls /data/wordcount
-rw-r--r--   1 root supergroup         57 2017-09-01 20:40 /data/wordcount/myword.txt
[root@leaf ~]# hadoop fs -cat /data/wordcount/myword.txt
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
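Likewise, the upload can be done programmatically. A sketch with the FileSystem API (the local path /root/myword.txt is an assumption based on the shell session above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadWordFile {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Equivalent of "hadoop fs -put myword.txt /data/wordcount".
        fs.copyFromLocalFile(new Path("/root/myword.txt"),
                             new Path("/data/wordcount/myword.txt"));
        fs.close();
    }
}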
(4) Run the wordcount program
Run the following command:
[root@leaf ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wordcount /output/wordcount
...
17/09/01 20:48:14 INFO mapreduce.Job: Job job_local1719603087_0001 completed successfully
17/09/01 20:48:14 INFO mapreduce.Job: Counters: 38
        File System Counters
                FILE: Number of bytes read=585940
                FILE: Number of bytes written=1099502
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=114
                HDFS: Number of bytes written=48
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Map-Reduce Framework
                Map input records=5
                Map output records=10
                Map output bytes=97
                Map output materialized bytes=78
                Input split bytes=112
                Combine input records=10
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=78
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=92
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=241049600
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=57
        File Output Format Counters
                Bytes Written=48
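The counters line up with the input: the 5 map input records are the 5 lines of myword.txt, the 10 map output records are its 10 words, and the combiner collapses them to 6 unique keys, which is also the number of reduce output records. For completeness, a driver that wires up such a job might look like the sketch below; TokenizerMapper and IntSumReducer are the illustrative classes from the sketch in section (1), and the driver class name is likewise made up for illustration, not the class inside the examples jar.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);   // combiner explains the Combine input/output counters
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /data/wordcount
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /output/wordcount (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}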
(5) View the word counts
The results are as follows; they match the six unique words in myword.txt:
[root@leaf ~]# hadoop fs -cat /output/wordcount/part-r-00000
katy    2
leaf    2
ling    1
xpleaf  2
yeyonghao       1
yyh     2
This article was reposted from xpleaf's 51CTO blog; original link: http://blog.51cto.com/xpleaf/1962271. Please contact the original author before reproducing it.