gprof, Valgrind and gperftools - an evaluation of some tools for application level CPU profiling on


In this post I give an overview of my evaluation of three different CPU profiling tools: gperftools, Valgrind and gprof. I evaluated the three tools on usage, functionality, accuracy and runtime overhead.

The usage of the different profilers is demonstrated with the small demo program cpuload, available via my GitHub repository gklingler/cpuProfilingDemo. The intent of cpuload.cpp is just to generate some CPU load; it does nothing useful. The bash scripts in the same repo (which are also listed below) show how to compile and link cpuload.cpp appropriately and how to run the resulting executable to get the CPU profiling data.

gprof

The GNU profiler gprof uses a hybrid approach of compiler-assisted instrumentation and sampling. Instrumentation is used to gather function call information (e.g. to be able to generate call graphs and count the number of function calls). To gather timing information at runtime, a sampling process is used: the program is interrupted at regular intervals by the operating system and the current value of the program counter is recorded. As sampling is a statistical process, the resulting profiling data is not exact but rather a statistical approximation (see "gprof statistical inaccuracy").
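To make the sampling idea more concrete, here is a minimal sketch for Linux/POSIX systems. It is not gprof's implementation: instead of recording the program counter, it keeps a marker for the currently running workload and lets a `SIGPROF` handler, driven by `setitimer(ITIMER_PROF, ...)`, count which workload was interrupted. The names `run_sampling_demo`, `samples_for` and `burn` are made up for this illustration.

```cpp
#include <cassert>
#include <csignal>
#include <ctime>
#include <sys/time.h>

// Which workload is currently running: 0 = none, 1 = short, 2 = long.
static volatile std::sig_atomic_t g_current = 0;
// Number of samples attributed to each workload.
static volatile std::sig_atomic_t g_samples[3] = {0, 0, 0};

// SIGPROF handler: attribute one sample to the interrupted workload.
extern "C" void on_sample(int) {
    g_samples[g_current] = g_samples[g_current] + 1;
}

// Busy-loop until roughly `seconds` of CPU time have been consumed.
static void burn(int id, double seconds) {
    g_current = id;
    const std::clock_t end =
        std::clock() + static_cast<std::clock_t>(seconds * CLOCKS_PER_SEC);
    volatile unsigned long sink = 0;
    while (std::clock() < end) sink = sink + 1;
    g_current = 0;
}

// Number of samples recorded for workload `id` (1 or 2).
int samples_for(int id) {
    assert(id >= 0 && id < 3);
    return g_samples[id];
}

// Runs both workloads while a profiling timer fires every 10 ms of CPU time.
void run_sampling_demo(double short_seconds, double long_seconds) {
    std::signal(SIGPROF, on_sample);

    itimerval timer{};                  // zero-initialized
    timer.it_interval.tv_usec = 10000;  // repeat every 10 ms
    timer.it_value.tv_usec = 10000;     // first tick after 10 ms
    setitimer(ITIMER_PROF, &timer, nullptr);

    burn(1, short_seconds);
    burn(2, long_seconds);

    itimerval off{};                    // all zeros disables the timer
    setitimer(ITIMER_PROF, &off, nullptr);
}
```

Calling `run_sampling_demo(0.1, 0.4)` and then comparing `samples_for(1)` with `samples_for(2)` should show roughly four times as many samples for the longer workload. The exact counts vary from run to run, which is exactly the statistical inaccuracy mentioned above.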

Creating a CPU profile of your application with gprof requires the following steps:

  1. compile and link the program with a compatible compiler and profiling enabled (e.g. gcc -pg)
  2. execute your program to generate the profiling data file (default filename: gmon.out)
  3. run gprof to analyze the profiling data

Let’s apply this to our demo application:

#!/bin/bash
# build the program with profiling support (-pg)
g++ -std=c++11 -pg cpuload.cpp -o cpuload
# run the program; generates the profiling data file (gmon.out)
./cpuload
# print the flat profile and the call graph
gprof cpuload

The gprof output consists of two parts: the flat profile and the call graph.

The flat profile reports the total execution time spent in each function and its percentage of the total running time. Function call counts are also reported. Output is sorted by percentage, with hot spots at the top of the list.

Gprof’s call graph is a textual representation which shows the callers and callees of each function.

For detailed information on how to interpret the call graph, take a look at the official documentation. You can also generate a graphical representation of the call graph with gprof2dot.

The overhead (mainly caused by instrumentation) can be quite high: it is estimated at 30-260% [1][2].

gprof does not support profiling multi-threaded applications and cannot profile shared libraries either. Even though workarounds exist to get threading support [3], the fact that it cannot profile calls into shared libraries makes it totally unsuitable for today’s real-world projects.

valgrind/callgrind

Valgrind [4] is an instrumentation framework for building dynamic analysis tools. Valgrind is basically a virtual machine that uses just-in-time recompilation: it translates the program's machine code into a simpler, RISC-like intermediate representation (called UCode in early versions; current releases use VEX IR). It does not execute the original machine code directly but “simulates” the intermediate code generated on the fly. There are various Valgrind-based tools for debugging and profiling purposes. Depending on the chosen tool, the intermediate code is instrumented appropriately to record the data of interest. For performance profiling, we are interested in the tool callgrind: a profiling tool that records the function call history as a call graph.

For analyzing the collected profiling data, there is the amazing visualization tool KCachegrind [5]. It presents the collected data in a very nice way, which helps tremendously to get an overview of what’s going on.

Creating a CPU profile of your application with valgrind/callgrind is really simple and requires the following steps:

  1. compile your program with debugging symbols enabled (to get a meaningful call graph)
  2. execute your program with valgrind --tool=callgrind ./yourprogram to generate the profiling data file
  3. analyze your profiling data with e.g. KCachegrind

Let’s apply this to our demo application (profile_valgrind.sh):

#!/bin/bash
# build the program (no profiling-specific flags are needed; -g only adds debug symbols)
g++ -std=c++11 -g cpuload.cpp -o cpuload
# run the program with callgrind; by default this writes callgrind.out.<pid>,
# so we name the output file explicitly
valgrind --tool=callgrind --callgrind-out-file=profile.callgrind ./cpuload
# open profile.callgrind with kcachegrind
kcachegrind profile.callgrind

In contrast to gprof, we don’t need to rebuild our application with any special compile flags. We can execute any executable as it is with valgrind. Of course the executed program should contain debugging information to get an expressive call graph with human readable symbol names.

Below is KCachegrind showing the profiling data of our cpuload demo:

[Figure: analyzing CPU profiling data with KCachegrind]

A downside of Valgrind is the enormous slowdown of the profiled application (around a factor of 50), which makes it impractical for larger or longer-running applications. The profiling result itself is not influenced by the measurement.

gperftools

Gperftools from Google provides a set of tools aimed at analyzing and improving the performance of multi-threaded applications. It offers a CPU profiler, a fast thread-aware malloc implementation, a memory leak detector and a heap profiler. We focus on its sampling-based CPU profiler.

Creating a CPU profile of selected parts of your application with gperftools requires the following steps:

  1. compile your program with debugging symbols enabled (to get a meaningful call graph) and link it against the gperftools profiler library (-lprofiler)
  2. #include <gperftools/profiler.h> and surround the sections you want to profile with ProfilerStart("nameOfProfile.log"); and ProfilerStop();
  3. execute your program to generate the profiling data file(s)
  4. to analyze the profiling data, use pprof (distributed with gperftools) or convert it to a callgrind-compatible format and analyze it with KCachegrind
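Step 2 could look roughly like the following sketch. It is not the actual cpuload.cpp from the repository: `burn_cpu` and `profiled_section` are hypothetical stand-ins, and the `WITHGPERFTOOLS` guard mirrors the define used by the demo so that the code still builds when gperftools is not installed.

```cpp
#include <cmath>
#include <cstddef>
#ifdef WITHGPERFTOOLS
#include <gperftools/profiler.h>
#endif

// Hypothetical workload standing in for the CPU-heavy part of a program.
double burn_cpu(std::size_t iterations) {
    double acc = 0.0;
    for (std::size_t i = 1; i <= iterations; ++i) {
        acc += std::sqrt(static_cast<double>(i));
    }
    return acc;
}

// Profiles only the call to burn_cpu; the rest of the program is not sampled.
double profiled_section(std::size_t iterations) {
#ifdef WITHGPERFTOOLS
    ProfilerStart("profile.log");  // start sampling, write samples to profile.log
#endif
    const double result = burn_cpu(iterations);
#ifdef WITHGPERFTOOLS
    ProfilerStop();                // stop sampling and flush the profile
#endif
    return result;
}
```

Built with -DWITHGPERFTOOLS and linked with -lprofiler, a call such as `profiled_section(100000000)` from `main` should leave a profile.log behind. Without the define, the function simply runs the workload unprofiled.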

Let’s apply this to our demo application (profile_gperftools.sh):

#!/bin/bash
# build the program; for our demo program, we specify -DWITHGPERFTOOLS to enable the gperftools-specific #ifdefs
# (-lprofiler comes after the source file so the linker resolves the profiler symbols)
g++ -std=c++11 -DWITHGPERFTOOLS -g ../cpuload.cpp -o cpuload -lprofiler
# run the program; generates the profiling data file (profile.log in our example)
./cpuload
# convert profile.log to callgrind-compatible format
pprof --callgrind ./cpuload profile.log > profile.callgrind
# open profile.callgrind with kcachegrind
kcachegrind profile.callgrind

Alternatively, profiling the whole application can be done without any changes or recompilation/linking, but I will not cover this here as this is not the recommended approach. You can find more about this in the docs.

The gperftools profiler can profile multi-threaded applications. The runtime overhead while profiling is very low and applications run at “native speed”. We can again use KCachegrind for analyzing the profiling data after converting it to a callgrind-compatible format. I also like being able to selectively profile just certain areas of the code, and if you want to, you can easily extend your program to enable/disable profiling at runtime.
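One way to enable and disable profiling at runtime is a small wrapper that remembers whether the profiler is running. The class below is a sketch under assumptions: `ProfilingSwitch` is a made-up helper, not part of gperftools, and it only calls `ProfilerStart`/`ProfilerStop` when built with the demo's `WITHGPERFTOOLS` define.

```cpp
#include <string>
#include <utility>
#ifdef WITHGPERFTOOLS
#include <gperftools/profiler.h>
#endif

// Hypothetical helper (not part of gperftools): flips CPU profiling on and off.
class ProfilingSwitch {
public:
    explicit ProfilingSwitch(std::string output_path)
        : path_(std::move(output_path)) {}

    // Make sure a running profile is flushed when the switch goes away.
    ~ProfilingSwitch() {
        if (active_) stop();
    }

    // Starts profiling if it is off, stops it if it is on; returns the new state.
    bool toggle() {
        if (active_) {
            stop();
        } else {
            start();
        }
        return active_;
    }

    bool active() const { return active_; }

private:
    void start() {
#ifdef WITHGPERFTOOLS
        ProfilerStart(path_.c_str());
#endif
        active_ = true;
    }

    void stop() {
#ifdef WITHGPERFTOOLS
        ProfilerStop();
#endif
        active_ = false;
    }

    std::string path_;
    bool active_ = false;
};
```

A program could call `toggle()` from whatever control path it already has, for example an admin command or a debug menu entry. As far as I know, starting the profiler again with the same file name discards the previous data, so a real version would probably generate a fresh file name for each run.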

Conclusion and comparison

gprof is the dinosaur among the evaluated profilers: its roots go back to the 1980s. It seems it was widely used and a good solution during the past decades. But its limited support for multi-threaded applications, the inability to profile shared libraries, and the need for recompilation with compatible compilers and special flags that produce a considerable runtime overhead make it unsuitable for today’s real-world projects.

Valgrind delivers the most accurate results and is well suited for multi-threaded applications. It’s very easy to use and there is KCachegrind for visualization and analysis of the profiling data, but the slow execution of the application under test disqualifies it for larger, longer-running applications.

The gperftools CPU profiler has very little runtime overhead, provides some nice features like selectively profiling certain areas of interest and has no problem with multi-threaded applications. KCachegrind can be used to analyze the profiling data. Like all sampling-based profilers, it suffers from statistical inaccuracy and therefore the results are not as accurate as with Valgrind, but in practice that’s usually not a big problem (you can always increase the sampling frequency if you need more accurate results). I’m using this profiler on a large code base and from my personal experience I can definitely recommend using it.
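To get a feeling for how much the sampling frequency matters, the gprof manual's rule of thumb can be turned into a tiny helper: for n samples the expected error is about the square root of n samples, so the relative error is roughly 1/sqrt(n). The function name and the frequencies used below are assumptions for illustration, and the formula is only a rough statistical estimate, not a guarantee.

```cpp
#include <cmath>

// Rule-of-thumb relative error for a function that ran for
// `seconds_in_function` seconds of CPU time while being sampled
// `samples_per_second` times per second: 1 / sqrt(number of samples).
double expected_relative_error(double seconds_in_function,
                               double samples_per_second) {
    const double samples = seconds_in_function * samples_per_second;
    return 1.0 / std::sqrt(samples);
}
```

For a function that accounts for 1 second of CPU time, `expected_relative_error(1.0, 100.0)` gives 0.1 (about 10%), while `expected_relative_error(1.0, 1000.0)` gives roughly 0.032. Ten times more samples therefore buys only about a threefold improvement, which is why longer runs are often the cheaper way to get stable numbers.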

I hope you liked this post and as always, if you have questions or any kind of feedback please leave a comment below.

  1. GNU gprof Profiler
  2. Low-Overhead Call Path Profiling of Unmodified, Optimized Code for higher order object oriented programs, Yu Kai Hong, Department of Mathematics at National Taiwan University; July 19, 2008, ACM 1-59593-167/8/06/2005
  3. workaround to use gprof with multithreaded applications
  4. Valgrind
  5. KCachegrind

Reposted from: https://my.oschina.net/wdyoschina/blog/1506757
