Previous article: Apache Hadoop Fully Distributed Cluster Setup, a Pitfall-Free Guide
In the previous article we built a complete Hadoop cluster. In this one we run a quick smoke test: uploading and downloading files through the cluster, then running the distributed wordcount example. Later articles will look at distributed computation and distributed storage in more depth.
Upload and Download Test
Upload and download a file between the local Linux filesystem and HDFS to verify that the cluster is working properly.
# Create a directory on HDFS
hdfs dfs -mkdir -p /test/input

# Create a file in the local home directory and write some content into it
cd /root
vim test.txt

# Upload the local Linux file to HDFS
hdfs dfs -put /root/test.txt /test/input

# Download the file from HDFS back to the local Linux filesystem (try this from a different node)
hdfs dfs -get /test/input/test.txt
Distributed Computation Test
Create a wcinput folder under the HDFS root directory:
[root@hadoop01 hadoop-2.9.2]# hdfs dfs -mkdir /wcinput
Create a wc.txt file with the following content:
hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning
Upload wc.txt to the HDFS directory /wcinput:
hdfs dfs -put wc.txt /wcinput
Run the MapReduce job:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /wcinput/ /wcoutput
The output looks like this:
24/07/03 20:44:26 INFO client.RMProxy: Connecting to ResourceManager at hadoop03/192.168.43.103:8032
24/07/03 20:44:28 INFO input.FileInputFormat: Total input files to process : 1
24/07/03 20:44:28 INFO mapreduce.JobSubmitter: number of splits:1
24/07/03 20:44:28 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
24/07/03 20:44:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1720006717389_0001
24/07/03 20:44:29 INFO impl.YarnClientImpl: Submitted application application_1720006717389_0001
24/07/03 20:44:29 INFO mapreduce.Job: The url to track the job: http://hadoop03:8088/proxy/application_1720006717389_0001/
24/07/03 20:44:29 INFO mapreduce.Job: Running job: job_1720006717389_0001
24/07/03 20:44:45 INFO mapreduce.Job: Job job_1720006717389_0001 running in uber mode : false
24/07/03 20:44:45 INFO mapreduce.Job: map 0% reduce 0%
24/07/03 20:44:57 INFO mapreduce.Job: map 100% reduce 0%
24/07/03 20:45:13 INFO mapreduce.Job: map 100% reduce 100%
24/07/03 20:45:14 INFO mapreduce.Job: Job job_1720006717389_0001 completed successfully
24/07/03 20:45:14 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=70
		FILE: Number of bytes written=396911
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=180
		HDFS: Number of bytes written=44
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=9440
		Total time spent by all reduces in occupied slots (ms)=11870
		Total time spent by all map tasks (ms)=9440
		Total time spent by all reduce tasks (ms)=11870
		Total vcore-milliseconds taken by all map tasks=9440
		Total vcore-milliseconds taken by all reduce tasks=11870
		Total megabyte-milliseconds taken by all map tasks=9666560
		Total megabyte-milliseconds taken by all reduce tasks=12154880
	Map-Reduce Framework
		Map input records=5
		Map output records=11
		Map output bytes=124
		Map output materialized bytes=70
		Input split bytes=100
		Combine input records=11
		Combine output records=5
		Reduce input groups=5
		Reduce shuffle bytes=70
		Reduce input records=5
		Reduce output records=5
		Spilled Records=10
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=498
		CPU time spent (ms)=3050
		Physical memory (bytes) snapshot=374968320
		Virtual memory (bytes) snapshot=4262629376
		Total committed heap usage (bytes)=219676672
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=80
	File Output Format Counters
		Bytes Written=44
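A few of the framework counters can be sanity-checked locally against the wc.txt contents above. As a rough sketch (counting lines and words in plain Python, not anything Hadoop computes for us):

```python
# The exact contents of wc.txt from earlier in this article.
wc_txt = """hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning"""

lines = wc_txt.splitlines()
words = wc_txt.split()

print(len(lines))        # lines in the file -> "Map input records=5"
print(len(words))        # words emitted by the mapper -> "Map output records=11"
print(len(set(words)))   # distinct words -> "Combine output records=5" / "Reduce output records=5"
```

The combiner collapses the 11 mapped (word, 1) pairs down to 5 distinct keys on the map side, which is why the reduce input and output record counts are both 5.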
View the result:
[root@hadoop01 hadoop-2.9.2]# hdfs dfs -cat /wcoutput/part-r-00000
hadoop	2
hdfs	1
kmning	3
mapreduce	3
yarn	2
As you can see, the program counted the occurrences of each word via distributed MapReduce computation.
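To make the map/shuffle/reduce phases concrete, here is a toy single-process sketch of what the WordCount job does, run against the same wc.txt content (this is an illustration of the model, not the actual Hadoop implementation):

```python
from collections import defaultdict

# Same input as the wc.txt file uploaded to /wcinput.
wc_txt = """hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning"""

# Map phase: emit one (word, 1) pair per word in each input line.
mapped = [(word, 1) for line in wc_txt.splitlines() for word in line.split()]

# Shuffle phase: group all values by key (word).
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: sum the grouped counts for each word.
result = {word: sum(counts) for word, counts in groups.items()}
for word in sorted(result):
    print(word, result[word])
```

Printed in sorted order, the counts match the part-r-00000 output above exactly: hadoop 2, hdfs 1, kmning 3, mapreduce 3, yarn 2.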