Using KernelShark to analyze the real-time scheduler

Reposted from: https://lwn.net/Articles/425583/


February 2, 2011

This article was contributed by Steven Rostedt

The last LWN article on Ftrace described trace-cmd, which is a front-end tool for interfacing with Ftrace. trace-cmd is entirely command-line based, and works well for embedded devices or for tracing on a remote system. But reading the output in text format may be a bit overwhelming, and makes it hard to see the bigger picture. To understand how processes interact, a GUI can help humans see what is happening on a global scale. KernelShark was written to fulfill this requirement; it is a GUI front end to trace-cmd.

KernelShark is distributed in the same repository as trace-cmd:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git

To build it, you just need to type make gui, as typing make alone will only build trace-cmd. The two tools have been kept separate since many embedded devices do not have the libraries needed to build KernelShark. Full HTML help is included in the repository and is installed with make install_doc. After installing the documentation, you can access the help directly from KernelShark's "Help" menu.
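For reference, a typical build might look like the following; the commands come straight from the instructions above, and the install location is whatever your setup defaults to:

   $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
   $ cd trace-cmd
   $ make gui            # "make" alone builds only trace-cmd
   $ make install_doc    # may need root, depending on the prefix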

This article is not a tutorial on using KernelShark, as everything you need to know about the tool is kept up-to-date in the KernelShark repository. Instead, this article will describe a use case that KernelShark was instrumental in helping to solve.

Analyzing the real-time scheduler

Some time ago, when the push/pull algorithm of the real-time scheduler in Linux was being developed, a decision had to be made about what to do when a high priority process wakes up on a CPU running a lower real-time priority process, where both processes have affinity to multiple CPUs, and both can be migrated to a CPU running a non-real-time task. One would think that the proper thing to do would be to simply wake up the high priority process on that CPU, which would cause the lower priority process to be pushed off the running CPU. But the theory was that, by doing so, we would move a cache-hot real-time process onto a cache-cold CPU and possibly replace it with a cache-cold process.

After some debate, the decision was made to migrate the high priority process to the CPU running the lowest priority task (or no task at all) and wake it there. Some time later, after the code was incorporated into the mainline, I started to question this decision, even though I was the one who fought for it. With the introduction of Ftrace, we now have a utility to truly examine the impact of that decision.

The decision to move the higher priority task was based on the assumption that a task that is waking up is more likely to be cache cold than a task that is already running. Thinking more about this case, though, one must consider what would cause a high priority task to wake up in the first place. If it is woken periodically to do some work, then it may very well be cache cold: any task scheduled in between can easily push out the cache of this high priority task. But what if the high priority task was blocked on a mutex? If the task was blocked on a mutex, and another RT task was scheduled in its place, then when the high priority task wakes up again there is a good chance that it will be cache hot.

A mutex in most real-time programs will usually be held for a short period of time. The PREEMPT_RT patch, from which this code was developed, converts spinlocks into mutexes, and those mutexes are held for very small code segments, as all spinlocks should be. Migrating a task simply because it blocked on a mutex increases the impact these locks have on throughput. Why punish the high priority task even more because it blocked and had to wait for another task to run?

Before making any decision to change the code, I needed a test case that could show that moving the high priority task, instead of preempting the lower priority task, causes the high priority task to ping-pong around the CPUs when there is lock contention. A high priority task should not be punished (migrated) simply because it encounters lock contention with lower priority real-time tasks. It would also be helpful to know how changing this decision affects the total number of migrations for all the tasks under lock contention.

First try

Having a four-processor box to play with, I started writing a test case that might produce this scenario, and used Ftrace to analyze the result. The first test case was to create five threads (one more than the number of CPUs) and four pthread mutexes, have all threads wake up from a barrier wait, and then loop 50 times, grabbing each lock in sequence and doing a small busy loop while holding it. The test is called migrate.c.
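The article does not reproduce the source of migrate.c, but a minimal sketch of the structure just described might look like the following; the busy-loop length is an assumption, and the SCHED_FIFO priority setup is only noted in a comment:

   #include <pthread.h>

   #define NR_THREADS 5   /* one more than the four CPUs */
   #define NR_LOCKS   4
   #define NR_LOOPS   50

   static pthread_mutex_t locks[NR_LOCKS];
   static pthread_barrier_t start_barrier;

   /* A small amount of busy work done while holding each lock. */
   static void busy_loop(void)
   {
       for (volatile int i = 0; i < 100000; i++)
           ;
   }

   static void *thread_func(void *arg)
   {
       /* all threads start together from a barrier wait */
       pthread_barrier_wait(&start_barrier);

       for (int loop = 0; loop < NR_LOOPS; loop++) {
           for (int i = 0; i < NR_LOCKS; i++) {
               pthread_mutex_lock(&locks[i]);
               busy_loop();
               pthread_mutex_unlock(&locks[i]);
           }
       }
       return NULL;
   }

   int main(void)
   {
       pthread_t threads[NR_THREADS];

       pthread_barrier_init(&start_barrier, NULL, NR_THREADS);
       for (int i = 0; i < NR_LOCKS; i++)
           pthread_mutex_init(&locks[i], NULL);

       /* The real test also gives each thread an increasing SCHED_FIFO
        * priority via pthread_attr_setschedparam(); omitted here. */
       for (int i = 0; i < NR_THREADS; i++)
           pthread_create(&threads[i], NULL, thread_func, NULL);
       for (int i = 0; i < NR_THREADS; i++)
           pthread_join(threads[i], NULL);
       return 0;
   }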

The test application uses trace_marker, as explained in previous articles, to record what is happening inside the application and synchronize that with the kernel tracing.
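As a quick reminder of how that works, anything written to the trace_marker file shows up as an event in the trace, interleaved with the kernel events. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug:

   #include <fcntl.h>
   #include <stdio.h>
   #include <string.h>
   #include <unistd.h>

   int main(void)
   {
       const char msg[] = "about to grab lock 0\n";
       int fd = open("/sys/kernel/debug/tracing/trace_marker", O_WRONLY);

       if (fd < 0) {
           perror("open trace_marker");
           return 1;
       }
       /* This string appears in the trace output at the point it was
        * written, which is what lets application-level events be lined
        * up with the scheduler events. */
       if (write(fd, msg, strlen(msg)) < 0)
           perror("write trace_marker");
       close(fd);
       return 0;
   }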

Running the following with trace-cmd:

   # trace-cmd record -e 'sched_wakeup*' -e sched_switch -e 'sched_migrate*' migrate
   # kernelshark

[KernelShark]

Like trace-cmd report, KernelShark will, by default, read the file trace.dat. You can specify another file by naming it as the first argument to KernelShark. While the KernelShark display images may be difficult to read fully in the article, clicking any of them will bring up a full-resolution version.

Since all tasks have been recorded, even trace-cmd itself, we want to filter out any tasks that we do not care about. Selecting Filter->Tasks from the KernelShark menu, and then choosing only the migrate threads will remove the extraneous tasks. Note that events that involve two tasks, like sched_switch or sched_wakeup, will not be filtered out if one of the tasks should be displayed.

[KernelShark post-filtering]

In the default graph view, each online CPU is represented by a plot line. Each task is represented by a different color. The color is determined by running the process ID through a hash function and then parsing that number into an RGB value; a hypothetical sketch of such a mapping appears below.

  • The purple colored bar represents thread 4, the highest priority task.
  • The orange(ish) colored bar represents thread 3.
  • The turquoise colored bar represents thread 2.
  • The brown colored bar represents thread 1.
  • The light blue colored bar represents thread 0, the lowest priority task.

The lines sticking out of the top of the bars represent events that appear in the list below the graph.
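The article does not give the actual hash, so the following is only a hypothetical illustration of the pid-to-color idea; the multiplier is an arbitrary choice, not KernelShark's:

   struct rgb { unsigned char r, g, b; };

   /* Deterministic pid-to-color mapping: hash the PID, then carve the
    * result into red/green/blue components. */
   struct rgb task_color(int pid)
   {
       unsigned int h = (unsigned int)pid * 2654435761u; /* Knuth's hash */
       struct rgb c = { (h >> 16) & 0xff, (h >> 8) & 0xff, h & 0xff };

       return c;
   }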

By examining the graph, we can see that the test case was quite naive. The lowest priority task, thread 0, never got to run until the other four tasks were finished. This makes sense, as the machine had only four CPUs and there were four higher priority tasks running. The four running tasks ran in lock step, taking the locks in sequence. From this view it looks like the tasks went out of sequence, but if we zoom in to where the migrations happened, we see something different.

[KernelShark zoom]

To zoom in on the graph, press and hold the left mouse button; a line will appear. Then drag the mouse to the right; as the mouse moves off the first line, a second line will appear and follow the mouse. When you release the mouse button, the view zooms in, making the span between the two lines the width of the new window.

Repeating the above procedure, we can get down to the details of the migration of thread 3. Double-clicking on the graph brings the list view to the event that was clicked on, and a green line appears at the location that was clicked.

[KernelShark event selection]

On CPU 0, thread 3 was preempted by the watchdog/0 kernel thread. Because we filtered out all threads but the migration test tasks, we see a small blank on the CPU 0 line. This would have been filled in with a colored bar representing the watchdog/0 thread if the filters were not enabled. The watchdog/0 thread runs at priority 99, which we can see from the sched_switch event: the priority of each task is shown between the two colons. The priority shown is the kernel's view of priority, which is the inverse of what user space uses (user-space priority 99 is kernel priority zero).

When the watchdog/0 thread preempted thread 3, the push/pull algorithm of the scheduler pushed thread 3 off to CPU 3, which had the lowest priority running task. Zooming in on the migrations that happened on the other CPUs shows that the watchdog kernel threads were responsible for them as well. If it were not for the watchdog kernel threads, this test would not have had any migrations.

Test two, second failure

The first test took the naive approach of just setting up four locks and having the tasks grab them in order, but that only kept the tasks in sync. The next approach tries to mix things up a little more. The concern with the real-time scheduler is how it affects the highest priority task. The next test creates the four locks again (as there are four CPUs) and five tasks of increasing priority. This time, only the highest priority task grabs all the locks in sequence; the other four tasks each grab a single lock, so each lock is shared by one low priority task and the highest priority task. To try to force contention, pthread barriers are used. For those unfamiliar with pthread barriers, they are a synchronization mechanism used to serialize threads: a barrier is initialized with a count, and all threads that hit the barrier block until that number of threads have reached it, at which point all of the threads are released.

This test case creates two barriers for each lock (lock_wait and lock_go), each initialized with a count of two, for the two tasks (the unique low priority task and the high priority task) that will take that lock. The low priority task takes the lock and then waits on a barrier (lock_wait). The high priority task hits that barrier before it takes the corresponding lock; because the low priority task is already waiting on the barrier, and the barrier's count is two, the high priority task's arrival releases both tasks. The high priority task will thus most likely try to take the mutex while the low priority task still holds it. The low priority task then releases the mutex and waits on the other barrier (lock_go), letting the high priority task take the mutex. A sketch of this handshake appears below.
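Here is a minimal sketch of that handshake for a single lock; the names lock_wait and lock_go mirror the article's, everything else is an assumption rather than the actual test source:

   #include <pthread.h>

   static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
   static pthread_barrier_t lock_wait, lock_go;

   void setup(void)
   {
       /* both barriers have a count of two: the one low priority
        * task plus the high priority task */
       pthread_barrier_init(&lock_wait, NULL, 2);
       pthread_barrier_init(&lock_go, NULL, 2);
   }

   void low_prio_side(void)
   {
       pthread_mutex_lock(&lock);
       pthread_barrier_wait(&lock_wait); /* "I am holding the lock" */
       pthread_mutex_unlock(&lock);
       pthread_barrier_wait(&lock_go);   /* wait until it has been taken */
   }

   void high_prio_side(void)
   {
       pthread_barrier_wait(&lock_wait); /* low prio task holds the lock */
       pthread_mutex_lock(&lock);        /* most likely blocks here */
       pthread_mutex_unlock(&lock);
       pthread_barrier_wait(&lock_go);   /* release the low prio task */
   }

Because the high priority task is released from lock_wait at the same moment as the lock holder, it usually reaches pthread_mutex_lock() while the mutex is still held, which is exactly the contention the test is after.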

Running this test under trace-cmd yields the following from KernelShark after filtering out all but the migrate test tasks.

As the colors of the tasks are determined by their process IDs, this run has the following:

[KernelShark second try]

  • The initial green bar is the main thread that is setting up all the locks and barriers.
  • The light purple bar is the lowest priority thread 0.
  • The red bar is the next-higher priority thread 1.
  • The yellow(ish) bar is the next-higher priority thread 2.
  • The blue bar is the next-higher priority thread 3.
  • The light turquoise bar is the highest priority thread 4.

Looking at the graph, it seems that the highest priority thread stayed on the same CPU and was not affected by the contention. Considering that the scheduler is set to migrate a waking real-time task if it is woken on a CPU that is running another real-time task, regardless of the priorities, one would think the high priority task would have migrated a bit more. Zooming in on the graph brings to light more detail about what is occurring.

[KernelShark zoom view]

What we can see from the graph, and from the list, is that the high priority thread did have contention on the lock. But because all threads are waiting for the high priority process to come around to their locks, the other threads are sleeping when the high priority process wakes up. The high priority process is only contending with a single thread at a time. Threads 0 and 2 share CPU 2 without issue, while threads 1 and 3 each still have a CPU to themselves.

The test to force migration

The second test was on the right track. It was able to produce contention, but it failed to keep the CPUs busy enough to cause the highest priority task to wake up on a CPU running another real-time task. What is needed is more tasks. The final test creates twice as many running threads as there are CPUs.

This test goes back to all tasks grabbing all locks in sequence. To prevent the synchronization seen before, each thread holds a lock for a different amount of time: the higher the priority of a thread, the shorter the time it holds the lock. Not only that, but the threads now sleep after they release a lock; the higher the priority of a task, the longer it sleeps:

   lock_held  = 1 ms * ((nr_threads - thread_id) + 1)
   sleep_time = 1 ms * thread_id

The lowest priority thread will never sleep, and it will hold the lock for the longest time. To make things even more interesting, the mutexes have been given the PTHREAD_PRIO_INHERIT attribute. When a higher priority thread blocks on a mutex held by a lower priority thread, the lower priority thread inherits the priority of the thread it is blocking.
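Setting that attribute is standard pthreads fare; a mutex initialized this way boosts its owner to the priority of the highest priority waiter:

   #include <pthread.h>

   /* Initialize a priority-inheritance mutex. */
   int init_pi_mutex(pthread_mutex_t *mutex)
   {
       pthread_mutexattr_t attr;
       int ret;

       pthread_mutexattr_init(&attr);
       ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
       if (!ret)
           ret = pthread_mutex_init(mutex, &attr);
       pthread_mutexattr_destroy(&attr);
       return ret;
   }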

The test records the number of times each task voluntarily schedules, the number of times it is preempted, the number of times it migrates, and the number of times it successfully acquires all locks. When the test finishes, it prints these counts for each thread. The higher the task number, the higher the priority of the thread it represents.

   Task    vol   nonvol   migrated   iterations
   0        43     3007       1571          108
   1       621     1334       1247          108
   2       777      769       1072          108
   3       775       17        701          108
   4       783       50        699          108
   5       788        2        610          109
   6       801       89        680          109
   7       813        0        693          110
   Total  5401     5268       7273          868

[KernelShark success]

Running this test under trace-cmd and viewing it with KernelShark yields a graph with lots of pretty colors, which means we likely succeeded in our goal. To prove that the highest priority thread did indeed migrate, we can plot the thread itself.

Using the "Plots" menu and choosing "Tasks" brings up the same type of dialog as the task filter that was described earlier. I selected the highest priority thread (migrate-2158), and zoomed in to get a better view. The colors on a task plot are determined by the CPU number it was running on. When a task migrates, the colors of the plot changes.

[KernelShark task plot]

This test demonstrates how much a high priority task can migrate when other RT tasks are running on the system. Changes to the real-time scheduler can now be tested. The commit changes the decision about which thread migrates when a real-time task wakes up on a CPU running another real-time task. The original behavior was to always move the task that is waking up if there is a CPU available that is running a task lower in priority than both tasks. The commit changes this to simply wake the real-time task on its own CPU if it is higher in priority than the real-time task that is currently running there.
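In rough terms, and hedging that the real scheduler code is considerably more involved, the change amounts to the following user-space model; the structures and helper are illustrative, not the kernel's:

   /* Illustrative model of the wake-up decision, not kernel code.
    * Priorities follow the user-space convention: higher is more
    * important. */
   struct rt_task { int prio; };
   struct cpu { struct rt_task *curr; };

   /* Hypothetical helper: the CPU running the lowest priority task. */
   static int find_lowest_prio_cpu(struct cpu *cpus, int nr_cpus)
   {
       int lowest = 0;

       for (int i = 1; i < nr_cpus; i++)
           if (cpus[i].curr->prio < cpus[lowest].curr->prio)
               lowest = i;
       return lowest;
   }

   int select_cpu(struct rt_task *waking, int wake_cpu,
                  struct cpu *cpus, int nr_cpus, int new_policy)
   {
       /* New behavior: if the waking task beats the task currently
        * running on its CPU, wake it in place; the preempted task
        * will be pushed elsewhere instead. */
       if (new_policy && waking->prio > cpus[wake_cpu].curr->prio)
           return wake_cpu;

       /* Old behavior: always migrate the waking task to the CPU
        * running the lowest priority task. */
       return find_lowest_prio_cpu(cpus, nr_cpus);
   }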

The migrate test now shows:

   Task     vol   nonvol   migrated   iterations
   0:        52     2923       2268          108
   1:       569     1529       1457          109
   2:       801     1961       2194          109
   3:       808      789       1274          109
   4:       810       61        155          109
   5:       813       10         57          109
   6:       827       35         81          110
   7:       824        0          4          110
   total:  5504     7308       7490          873

The total number of migrations has stayed about the same (several runs will show a fluctuation of a few hundred), but the number of migrations for the highest priority task has dropped substantially, as it no longer migrates simply because it woke up on a CPU running another real-time task. Note that the reason the highest priority task migrated at all is that it woke up on a CPU that was running the task owning the mutex it was blocked on. As these are priority-inheritance mutexes, the owner had the same priority as the highest priority process it was blocking, and the wake-up will not preempt a real-time task of equal priority. Perhaps that can be the next change to the real-time scheduler: make the wake-up aware of priority-inheritance mutexes.

[KernelShark after change]

The highest priority thread (migrate-21412) was woken on CPU 3, which was running thread 1 (migrate-21406), the task that thread 7 originally blocked on. CPU 2 happened to be running thread 0 (migrate-21405), which was the lowest priority thread running at the time. Note that the empty green box at the start of the task plot represents the time between when the task was woken and when it was actually scheduled in.

Using KernelShark allowed me to analyze each of my tests to see if they were doing what I expected them to do. The final test was able to force a common scenario where a high priority process is woken on a CPU running another real-time task, triggering the decision of whether or not to migrate the waking task. This test let me see how changes to that decision affected the results.

This article demonstrates a simple use case for KernelShark, but there are many more features that are not explained here. To find out more, download KernelShark and try it out. It is still in beta and is constantly being worked on. Soon there will be plugins that allow it to read other file formats and even change the way it displays the graph. All the code is available and under the GPL, so you can add your own features as well (hint hint).




Using KernelShark to analyze the real-time scheduler

Posted Feb 3, 2011 2:32 UTC (Thu) by mhelsley (guest, #11324) [Link]

Interesting. Looking at the second table of 8 tasks it's clear there are 4 CPUs. The 4 highest priority threads have much fewer than 1000 migrations -- 150 or less -- while the remaining threads have 1000 or more. The number of non-voluntary pre-emptions is similarly distributed (presumably because those are involved in the push/pull migrations).


This looks excellent

Posted Feb 3, 2011 15:03 UTC (Thu) by robert_s (subscriber, #42402) [Link]

I've been looking for something like kernelshark ever since I started doing realtime programming and I'm so glad to know it exists.


This looks excellent

Posted Feb 10, 2011 21:56 UTC (Thu) by oak (guest, #2786) [Link]

Did you have some problem with LTT:
http://en.wikipedia.org/wiki/LTTng
?

(Its UI has some usability issues, but otherwise LTT has been used for this kind of work on Linux for the past decade.)


RT bandwidth causing gaps in the test

Posted Feb 4, 2011 3:36 UTC (Fri) by nevets (subscriber, #11875) [Link]

An interesting note:

Those gaps you see in the last test run are caused by the RT bandwidth kicking in. If I perform the following:

  # echo -1 > /proc/sys/kernel/sched_rt_runtime_us

Those gaps disappear.


Using KernelShark to analyze the real-time scheduler

Posted Feb 5, 2011 11:58 UTC (Sat) by tardyp (subscriber, #58715) [Link]

Looks great!
I wish we could more easily pool our efforts and put the best of pytimechart (great graphical display) and kernelshark (great textual display and filtering capabilities) into the same tool.

I guess the choice of toolkit and programming language does not help.

Pierre


Using KernelShark to analyze the real-time scheduler

Posted Feb 6, 2011 20:55 UTC (Sun) by nevets (subscriber, #11875) [Link]

I thought gtk and python worked well together. There are even python plugins for kernelshark. The filtering capability is separated out and would be easy to make into a library that python could use. The parse-events library is already separate, and I'm working on getting perf to use it too.
I'm all for reuse of code, and would love to help in any effort to share capabilities between the tools.
