[HiBench] Performance Testing Spark on HDP

🍁 Blogger "開著拖拉機回家" takes you to a New World. 🍁

Home page: 開著拖拉機回家_Linux, Java basics, big data operations - CSDN blog

I hope this article is of some help. It is only a rough overview, so corrections are welcome!

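1. HiBench overview
2. Versions and dependencies
3. Download and build
3.1 Download the package
3.2 Build HiBench
3.3 HiBench directory layout
4. Edit the configuration files
4.1 hibench.conf
4.2 hadoop.conf
4.3 spark.conf
5. Run the tests
5.1 Prepare the data
5.2 Run the benchmark
5.3 View the report
6. Problems encountered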

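1. HiBench overview

HiBench is a big data benchmark suite from Intel. It helps evaluate different big data frameworks in terms of speed, throughput, and system resource utilization. It contains a set of Hadoop, Spark, and streaming workloads, including Sort, WordCount, TeraSort, Repartition, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight, and enhanced DFSIO. It also contains several streaming workloads for Spark Streaming, Flink, Storm, and Gearpump.

Project on GitHub: https://github.com/Intel-bigdata/HiBench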

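2. Versions and dependencies

Software	Version
hadoop	2.10 (officially supported: Apache Hadoop 3.0.x, 3.1.x, 3.2.x, 2.x, CDH5, HDP)
maven	3.8.5
java	8
python	2.7.5

[Screenshot: HDP cluster version information]

[Screenshot: Java and Maven environment configuration]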

3. Download and build

3.1 Download the package

cd /opt
# Download and extract the release
wget https://github.com/Intel-bigdata/HiBench/archive/v7.1.1.tar.gz
tar -zxvf v7.1.1.tar.gz
cd HiBench-7.1.1/

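3.2 Build HiBench

HiBench can be built in any of the following ways:

Build All
Build a specific framework benchmark
Build a single module
Build Structured Streaming

When building HiBench you can also specify the Spark and Scala versions. The exact parameters are described in the official build guide: https://github.com/Intel-bigdata/HiBench/blob/master/docs/build-hibench.md

# Build everything: all frameworks and modules
./bin/build_all.sh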
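
If only the Spark benchmarks are needed, a narrower build may be enough. The sketch below only prints a Maven command so it can be reviewed before running. The `-Psparkbench` profile and the `-Dspark` / `-Dscala` properties are taken from HiBench's build guide; the helper name `hibench_build_cmd` is made up for this example, and the versions 2.4 and 2.11 are placeholders that should be checked against the build guide and the cluster's Spark.

```shell
# Hypothetical helper: print the Maven command for a Spark-only HiBench build.
# -Psparkbench, -Dspark and -Dscala follow HiBench's build guide; the helper
# itself is not part of HiBench.
hibench_build_cmd() {
  spark_version="$1"   # Spark line to build against (placeholder: 2.4)
  scala_version="$2"   # matching Scala line (placeholder: 2.11)
  printf 'mvn -Psparkbench -Dspark=%s -Dscala=%s clean package\n' \
    "$spark_version" "$scala_version"
}

# Print the command for review; run it from the HiBench root once it looks right.
hibench_build_cmd 2.4 2.11
```

Printing first is a deliberate choice here: a full build takes several minutes, so it is cheaper to confirm the versions before starting one.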

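3.3 HiBench directory layout

autogen: source code for generating test data
bin: test scripts
common: source code for shared dependencies
conf: configuration files (HiBench, Hadoop, Spark, and so on)
docker: Docker-based deployment
flinkbench: source code for the Flink benchmarks
gearpumpbench: source code for the Gearpump benchmarks
hadoopbench: source code for the Hadoop benchmarks
sparkbench: source code for the Spark benchmarks
stormbench: source code for the Storm benchmarks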

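4. Edit the configuration files

4.1 hibench.conf

hibench.conf sets the data set size and the parallelism:

hibench.scale.profile                tiny
# Mapper number in hadoop, partition number in Spark
hibench.default.map.parallelism         8
# Reducer number in hadoop, shuffle partition number in Spark
hibench.default.shuffle.parallelism     8

hibench.scale.profile: the data scale used by HiBench; custom values are possible.
hibench.default.map.parallelism: the number of mappers in MapReduce (the partition number in Spark).
hibench.default.shuffle.parallelism: the number of reducers in MapReduce (the shuffle partition number in Spark).

HiBench's built-in data scales are tiny, small, large, huge, gigantic, and bigdata. You can also define your own data size outside these profiles.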
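
Before a run it can help to confirm which values are actually in effect. The following is a minimal sketch, assuming the conf files keep the simple "key, whitespace, value" layout shown above; `hibench_conf_get` is a hypothetical helper, not a HiBench command, and the sample file stands in for conf/hibench.conf.

```shell
# Hypothetical helper: print the value of one key from a HiBench .conf file.
# Assumes "key<whitespace>value" lines; commented lines never match because
# their first field starts with '#'.
hibench_conf_get() {
  conf_file="$1"
  key="$2"
  awk -v k="$key" '$1 == k { print $2 }' "$conf_file"
}

# Example against a throwaway file; point it at conf/hibench.conf on a real install.
sample_conf="$(mktemp)"
printf '%s\n' \
  '# Mapper number in hadoop, partition number in Spark' \
  'hibench.scale.profile                tiny' \
  'hibench.default.map.parallelism         8' > "$sample_conf"
hibench_conf_get "$sample_conf" hibench.scale.profile
hibench_conf_get "$sample_conf" hibench.default.map.parallelism
rm -f "$sample_conf"
```

If a key appears more than once, every matching value is printed, which makes duplicate settings easy to spot.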

4.2 hadoop.conf

hadoop.conf holds the Hadoop cluster settings (the values below are for the HDP cluster):

cp   conf/hadoop.conf.template conf/hadoop.conf
vim conf/hadoop.conf
# Hadoop home
hibench.hadoop.home     /usr/hdp/3.1.4.0-315/hadoop
# The path of hadoop executable
hibench.hadoop.executable     ${hibench.hadoop.home}/bin/hadoop
# Hadoop configuration directory
hibench.hadoop.configure.dir  ${hibench.hadoop.home}/etc/hadoop
# The root HDFS path to store HiBench data
hibench.hdfs.master       hdfs://winner
# Hadoop release provider. Supported value: apache, cdh5, hdp
hibench.hadoop.release    hdp

hibench.hdfs.master takes the value of fs.defaultFS in core-site.xml. NameNode high availability is enabled on this cluster, so the value is the nameservice URI (hdfs://winner) rather than a single NameNode host and port.
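
Instead of opening core-site.xml by hand, the value can also be read through the HDFS client. The sketch below wraps the standard `hdfs getconf -confKey` command; the function name `cluster_default_fs` is hypothetical, and the snippet assumes it is run on a node where the `hdfs` client is configured for the cluster.

```shell
# Hypothetical helper: print the cluster's default filesystem URI.
# "hdfs getconf -confKey <key>" reads the value from the client configuration.
cluster_default_fs() {
  hdfs getconf -confKey fs.defaultFS
}

# Only query when the hdfs client is on PATH, so the snippet is safe to paste anywhere.
if command -v hdfs >/dev/null 2>&1; then
  cluster_default_fs
fi
```

The printed URI is the value to put in hibench.hdfs.master.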

4.3 spark.conf

spark.conf holds the Spark settings:

cp   conf/spark.conf.template  conf/spark.conf
vim  conf/spark.conf
# Spark home
hibench.spark.home      /usr/hdp/3.1.4.0-315/spark2

The data size can be customized per workload, for example in:

conf/workloads/micro/terasort.conf
#datagen
hibench.terasort.tiny.datasize			32000
hibench.terasort.small.datasize			3200000
hibench.terasort.large.datasize			32000000
hibench.terasort.huge.datasize			320000000
hibench.terasort.gigantic.datasize		3200000000
hibench.terasort.bigdata.datasize		6000000000
hibench.workload.datasize		${hibench.terasort.${hibench.scale.profile}.datasize}
## Add a custom data size
#hibench.terasort.myscale.datasize 5242880
#hibench.workload.datasize               ${hibench.terasort.${hibench.scale.profile}.datasize}
# export for shell script
hibench.workload.input			${hibench.hdfs.data.dir}/Terasort/Input
hibench.workload.output			${hibench.hdfs.data.dir}/Terasort/Output

To use the custom size, uncomment the hibench.terasort.myscale.datasize line and set hibench.scale.profile to myscale in hibench.conf (the default is tiny).
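
Switching profiles between runs can be scripted. The following is a minimal sketch, assuming hibench.conf contains exactly the "hibench.scale.profile value" line shown earlier; `set_scale_profile` is a hypothetical helper and is not shipped with HiBench.

```shell
# Hypothetical helper: rewrite the hibench.scale.profile line in a conf file.
set_scale_profile() {
  conf_file="$1"
  profile="$2"
  tmp_file="$(mktemp)" || return 1
  # Replace the value on the matching line; copy every other line unchanged.
  awk -v p="$profile" '
    $1 == "hibench.scale.profile" { printf "%s\t%s\n", $1, p; next }
    { print }
  ' "$conf_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp_file" > "$conf_file"
  rm -f "$tmp_file"
}
```

From the HiBench root, `set_scale_profile conf/hibench.conf myscale` would select the custom profile, and the same call with `tiny` would restore the default.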

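5. Run the tests

5.1 Prepare the data

Kerberos is enabled on the HDP cluster, so the scripts were run as a Kerberos user. The following command generates a WordCount test data set:

bin/workloads/micro/wordcount/prepare/prepare.sh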
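
On a Kerberos-enabled cluster the prepare script fails early if the user has no valid ticket. The sketch below shows one way to check first, using the standard `klist -s` and `kinit -kt` commands; the function name is hypothetical, and the keytab path and principal are placeholders that depend entirely on the cluster.

```shell
# Hypothetical helper: make sure a Kerberos ticket exists before touching HDFS.
# "klist -s" is silent and exits non-zero when there is no valid ticket;
# "kinit -kt <keytab> <principal>" obtains one from a keytab.
ensure_kerberos_ticket() {
  keytab="$1"      # path to the keytab (placeholder, supplied by the caller)
  principal="$2"   # Kerberos principal (placeholder, supplied by the caller)
  if klist -s; then
    echo "ticket already valid"
  else
    kinit -kt "$keytab" "$principal"
  fi
}
```

Calling it with the user's own keytab and principal just before prepare.sh keeps the authentication step and the data generation in one place.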

5.2 Run the benchmark

Once the WordCount data set has been generated, the benchmark itself can be run. To run WordCount on Spark, use the following command:

bin/workloads/micro/wordcount/spark/run.sh
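
Because the prepare and run steps must target the same workload, it can be convenient to chain them. The following is a minimal sketch built on the bin/workloads/<category>/<name>/ layout used by the commands above; `run_spark_workload` is a hypothetical wrapper, not a HiBench script.

```shell
# Hypothetical wrapper: run prepare.sh and then the Spark run.sh for one workload.
# Paths follow the layout used above: bin/workloads/<category>/<name>/...
run_spark_workload() {
  hibench_home="$1"   # e.g. /opt/HiBench-7.1.1
  category="$2"       # e.g. micro
  name="$3"           # e.g. wordcount
  workload_dir="$hibench_home/bin/workloads/$category/$name"
  # Stop if data generation fails, so the benchmark never runs on missing input.
  "$workload_dir/prepare/prepare.sh" || return 1
  "$workload_dir/spark/run.sh"
}
```

For example, `run_spark_workload /opt/HiBench-7.1.1 micro wordcount` would generate the WordCount data and then run the Spark job on it.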

In HDFS, the /HiBench directory holds the test data and the results generated by each workload.

In YARN you can see the ScalaWordCount application.

5.3 View the report

[root@hdp105 HiBench-7.1.1]# cat    report/hibench.report 
Type         Date       Time     Input_data_size      Duration(s)          Throughput(bytes/s)  Throughput/node     
ScalaSparkTerasort 2023-08-16 20:07:22 3200000              46.503               68812                17203               
ScalaSparkTerasort 2023-08-16 20:09:26 3200000              38.856               82355                20588               
ScalaSparkWordcount 2023-08-17 13:29:46 37181                66.082               562                  140
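
The Throughput(bytes/s) column can be cross-checked from the other columns. In the rows above it equals Input_data_size divided by Duration(s), truncated to an integer, and Throughput/node is that value divided by 4, which suggests a four-node cluster (an inference from the numbers, not something the report states). The sketch below recomputes the throughput; `recompute_throughput` is a hypothetical helper.

```shell
# Hypothetical helper: recompute throughput as Input_data_size / Duration(s)
# for every data row of a HiBench report and print "<Type> <bytes/s>".
recompute_throughput() {
  report_file="$1"
  # Skip the header (NR == 1) and blank lines; columns are whitespace-separated:
  # $1 = Type, $4 = Input_data_size, $5 = Duration(s).
  awk 'NR > 1 && NF >= 7 { printf "%s %d\n", $1, int($4 / $5) }' "$report_file"
}
```

Run as `recompute_throughput report/hibench.report`, this gives 68812, 82355, and 562 for the three rows shown, matching the reported column.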

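For ScalaSparkWordcount, the input size was 37181 bytes and the run took 66.082 seconds. The report lists the input data size, duration, and throughput of every workload run. HiBench also writes logs and a metrics file for each run.

To view the metrics, download the monitor.html produced by the Spark WordCount run to your local machine and drag it into a browser:

 /opt/HiBench-7.1.1/report/wordcount/spark/monitor.html

The page includes the following charts:

Summarized Network throughputs & Packet-per-seconds

Summarized Memory usage

Summarized Disk throughput & IOPS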

6. Problems encountered

During the build, a plugin download failed with the following error:

[INFO] mahout 7.1.1 ....................................... FAILURE [  7.767 s]
[INFO] PEGASUS: A Peta-Scale Graph Mining System 2.0-SNAPSHOT SKIPPED
[INFO] nutchindexing 7.1.1 ................................ SKIPPED
[INFO] stormbench 7.1.1 ................................... SKIPPED
[INFO] stormbench-streaming 7.1.1 ......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  03:07 min
[INFO] Finished at: 2023-08-17T18:56:25+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.googlecode.maven-download-plugin:download-maven-plugin:1.2.0:wget (extra-download-execution) on project mahout: IO Error: Could not get content -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :mahout

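To work around this, edit the mahout POM:

hadoopbench/mahout/pom.xml

The fix is to delete the download plugin from the build section of that file. I simply do without it; at worst the build is a little slower.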
