In the previous article, I described the Kaplan-Meier estimator. As a quick recap, it is a non-parametric method for approximating the true survival function. This time, I will focus on another approach to visualizing a survival dataset: using the hazard function and the Nelson-Aalen estimator. Once again, we will use the lifelines library to quickly create the plots in Python.
1. The Nelson-Aalen estimator
With the Kaplan-Meier curves, we approximated the survival function, stating the probability of the event of interest (for example, the death event) not occurring by a certain time t.
An alternative approach to visualizing the aggregate information in a survival dataset uses the hazard function, which can be interpreted as the probability of the subject experiencing the event of interest within a small interval of time, given that the subject has survived up until the beginning of that interval. For a more detailed description of the hazard function, please see this article.
Unfortunately, we cannot transform the Kaplan-Meier estimate of the survival function into the hazard function. However, we can use another non-parametric estimator, this time of the cumulative hazard function: the Nelson-Aalen estimator. In short, it estimates the cumulative number of expected events within a certain period of time. It is cumulative because a sum of estimates is much more stable than the point-wise estimates.
The Nelson-Aalen estimator can be calculated as:

$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$$

where d_i is the number of events of interest observed at time t_i, n_i is the number of observations at risk just before t_i, and the sum runs over all event times t_i up to t. These terms are the same as the ones in the Kaplan-Meier estimator’s formula.
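To make the formula concrete, here is a minimal sketch that computes the estimate by hand using only the standard library. The function name `nelson_aalen` and the toy data are made up for illustration; this is not how lifelines implements it, and the library's numbers may differ slightly when several events share the same time.

```python
from collections import Counter


def nelson_aalen(durations, events):
    """Return [(t_i, H(t_i))] for each distinct time with at least one observed event."""
    # d_i: number of observed events at each time
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    # Events and censored observations both leave the risk set
    removed = Counter(durations)

    at_risk = len(durations)  # n_i for the earliest time
    cumulative = 0.0
    estimate = []
    for t in sorted(removed):
        d = deaths.get(t, 0)
        if d > 0:
            cumulative += d / at_risk  # add d_i / n_i
            estimate.append((t, cumulative))
        at_risk -= removed[t]  # shrink the risk set for later times
    return estimate


# Hypothetical toy data: 1 = event observed, 0 = censored
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

result = nelson_aalen(durations, events)
for t, h in result:
    print(t, round(h, 4))
```

Note that the censored observation at time 4 adds nothing to the sum, but it still reduces the number at risk for time 5.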
The Nelson-Aalen estimator, or more generally visualizing the hazard function over time, is not a very popular approach to survival analysis. That is because, in comparison to the survival function, the curves are not as simple or intuitive to interpret. However, the hazard function is of great importance for more advanced approaches to survival analysis, for example, the Cox regression. That is why it is important to understand the concept, and I will try to provide some insights about it. We can say that the cumulative hazard function:
- measures the total amount of risk that has been accumulated up to a certain point of time t.
- provides the number of times we would mathematically expect the event of interest to occur over a certain period, if only the event were repeatable. This might be a bit confusing, so to make the statement a bit simpler (yet not that realistic), you can think of the cumulative hazard function as the expected number of deaths of an individual up to time t, if the individual could be resurrected after each death without resetting the time. As I said, not that realistic, but the same idea applies just as well to machine failures, etc.
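The "expected number of repeatable events" reading can be checked with a small simulation. The sketch below assumes the simplest possible case, a constant hazard rate, for which the cumulative hazard is H(t) = rate × t. The rate, the horizon, and the number of runs are arbitrary illustrative values.

```python
import random


def count_events(rate, horizon, rng):
    """Count events up to `horizon` when the subject restarts right after each event."""
    elapsed, count = 0.0, 0
    while True:
        elapsed += rng.expovariate(rate)  # waiting time until the next event
        if elapsed > horizon:
            return count
        count += 1


rng = random.Random(42)
rate, horizon, n_runs = 0.5, 4.0, 20_000  # hypothetical values

mean_count = sum(count_events(rate, horizon, rng) for _ in range(n_runs)) / n_runs
cumulative_hazard = rate * horizon  # H(t) = rate * t for a constant hazard

print(f"average number of events: {mean_count:.3f}")
print(f"cumulative hazard H(t):   {cumulative_hazard:.3f}")
```

The average number of simulated events should land close to the cumulative hazard, which is the sense in which H(t) counts expected events.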
The last concept that might be useful to get some intuition about the cumulative hazard function is the bathtub curve, or rather its components. The curve represents the life cycles of many electronic consumer products. The hazard rate of the bathtub curve is created by combining:
- the rate of early “infant mortality” failures when the product is first introduced,
- the rate of random failures with constant failure rate during the product’s designed life,
- the rate of “wear out” failures as the product exceeds its expected lifetime.

While the image above represents the hazard rate (not the cumulative one!), the shape of the Nelson–Aalen estimator’s curve gives us an idea of how the hazard rate changes over time.
For example, the concave shape of the cumulative hazard function indicates that we are dealing with an “infant mortality” kind of event (dotted red line in the image), where the rate of failure is highest early on and decreases with time. On the other hand, the convex shape of the cumulative hazard function implies we are dealing with the “wear out” kind of event (dotted yellow line).
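The link between the shape of the cumulative hazard and the direction of the hazard rate can be illustrated with a parametric example. The sketch below assumes a Weibull model, whose cumulative hazard is H(t) = (t / scale)^shape: a shape below 1 gives a decreasing hazard ("infant mortality"), while a shape above 1 gives an increasing one ("wear out"). The Weibull choice and the parameter values are mine, picked only for illustration.

```python
def weibull_cumulative_hazard(t, shape, scale=1.0):
    """Cumulative hazard of a Weibull distribution: H(t) = (t / scale) ** shape."""
    return (t / scale) ** shape


def second_differences(values):
    """Discrete analogue of the second derivative on an evenly spaced grid."""
    return [
        values[i + 1] - 2 * values[i] + values[i - 1]
        for i in range(1, len(values) - 1)
    ]


times = [0.5 * k for k in range(1, 11)]  # 0.5, 1.0, ..., 5.0

# shape < 1: hazard decreases over time ("infant mortality")
infant = [weibull_cumulative_hazard(t, shape=0.5) for t in times]
# shape > 1: hazard increases over time ("wear out")
wear_out = [weibull_cumulative_hazard(t, shape=2.0) for t in times]

# Negative second differences mean concave, positive mean convex
is_concave = all(d < 0 for d in second_differences(infant))
is_convex = all(d > 0 for d in second_differences(wear_out))

print("infant mortality curve is concave:", is_concave)
print("wear out curve is convex:", is_convex)
```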
I believe this is enough theory to understand the Nelson-Aalen estimator of the cumulative hazard function. It is time to code!
2. Example in Python
For consistency, we continue the example using the popular Telco Customer Churn dataset started in the previous article. For brevity’s sake, please refer to that article for a description of the dataset and the reasoning for the transformations applied to it. First, we load the required libraries.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from lifelines import NelsonAalenFitter

plt.style.use('seaborn')  # on matplotlib >= 3.6, use 'seaborn-v0_8' instead
Then, we load the data:
df = pd.read_csv('../data/telco_customer_churn.csv')
df['churn'] = [1 if x == 'Yes' else 0 for x in df['Churn']]
lifelines makes the process of calculating and plotting the Nelson-Aalen estimator very simple. We just need to run the following few lines of code to plot the cumulative hazard function.
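T = df['tenure']  # durations: how long each customer has been with the company
E = df['churn']   # event indicator: 1 if the customer churned, 0 if censored

naf = NelsonAalenFitter()
naf.fit(T, event_observed=E)

naf.plot(at_risk_counts=True)
plt.title('Cumulative hazard function');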
The code generates the following plot:

I would say that the shape of the cumulative hazard function estimated with the Nelson-Aalen estimator might indicate that the underlying hazard function resembles the bathtub curve. The cumulative hazard rises faster both at the beginning and close to the end, while its slope more or less stabilizes at a constant level in the middle of the customers’ lifetime with the company.
We can also easily access the estimated cumulative hazard function through the cumulative_hazard_ attribute of the fitted NelsonAalenFitter object.
One more interesting functionality provided by the library is the events table, which summarizes what happened at each point of time. We can obtain it by accessing naf.event_table, and the result looks as follows:
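To give a feel for what such a table contains, here is a simplified standard-library sketch that builds a similar summary from toy data. The helper `build_event_table` and its column names are chosen for illustration and may not match the library's table exactly.

```python
from collections import Counter


def build_event_table(durations, events):
    """Summarize, for each distinct time, how many subjects left the study and why."""
    observed = Counter(t for t, e in zip(durations, events) if e == 1)
    censored = Counter(t for t, e in zip(durations, events) if e == 0)

    at_risk = len(durations)  # everyone is at risk at the start
    rows = []
    for t in sorted(set(durations)):
        row = {
            "time": t,
            "removed": observed[t] + censored[t],  # everyone leaving at t
            "observed": observed[t],               # events of interest
            "censored": censored[t],               # left without the event
            "at_risk": at_risk,                    # still under observation just before t
        }
        rows.append(row)
        at_risk -= row["removed"]
    return rows


# Hypothetical toy data: 1 = event observed, 0 = censored
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

table = build_event_table(durations, events)
for row in table:
    print(row)
```

The "observed" and "at_risk" columns are exactly the d_i and n_i that enter the Nelson-Aalen sum.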

Similarly to the Kaplan-Meier case, we will also plot the cumulative hazard function for each variant of the payment method. As lifelines provides a uniform way of working with the different tools used for survival analysis, the code requires only a few slight modifications.
ax = plt.subplot(111)

naf = NelsonAalenFitter()

# fit and plot one cumulative hazard curve per payment method on the same axes
for payment_method in df['PaymentMethod'].unique():
    flag = df['PaymentMethod'] == payment_method
    naf.fit(T[flag], event_observed=E[flag], label=payment_method)
    naf.plot(ax=ax)

plt.title('Cumulative hazard functions by payment method');
The shapes of the cumulative hazard functions are very similar for two automatic payment categories: bank transfer and credit card.

Note: In the theoretical introduction, we mentioned that the reason for working with the cumulative hazard function instead of the hazard function is that the former is more stable. However, lifelines provides a way of deriving the hazard function from the cumulative one by applying a kernel smoother. So where is the catch? To do so, we need to specify the bandwidth parameter, and the shape of the resulting hazard function is highly dependent on the selected value. I will quote the author’s comment about this approach: “There is no obvious way to choose a bandwidth, and different bandwidths produce different inferences, so it’s best to be very careful here. My advice: stick with the cumulative hazard function.” In case you are still interested, please check out the documentation.
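The bandwidth sensitivity is easy to see in a toy example. The sketch below smooths the Nelson-Aalen increments d_i / n_i with an Epanechnikov kernel. The kernel choice, the function names, and the data are my own assumptions for illustration, and this is not the library's implementation.

```python
def epanechnikov(u):
    """Epanechnikov kernel: non-zero only for |u| <= 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothed_hazard(t, increments, bandwidth):
    """Kernel-smoothed hazard at time t.

    `increments` holds (t_i, d_i / n_i) pairs, i.e. the jumps of the
    Nelson-Aalen estimate at each event time.
    """
    weighted = sum(
        epanechnikov((t - t_i) / bandwidth) * jump for t_i, jump in increments
    )
    return weighted / bandwidth


# Hypothetical Nelson-Aalen jumps: (event time, d_i / n_i)
increments = [(1, 1 / 6), (2, 1 / 5), (3, 1 / 3), (5, 1.0)]

narrow = smoothed_hazard(3.0, increments, bandwidth=1.0)
wide = smoothed_hazard(3.0, increments, bandwidth=3.0)

print(f"hazard at t=3 with bandwidth 1: {narrow:.4f}")
print(f"hazard at t=3 with bandwidth 3: {wide:.4f}")
```

With the narrow bandwidth only the jump at t = 3 contributes, while the wide one also pulls in the neighbouring event times, so the same data yield two different hazard estimates at the same point.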
3. Conclusions
In this article, I tried to provide an introduction to estimating the cumulative hazard function and some intuition about the interpretation of the results. While the Nelson-Aalen estimator is far less popular than the Kaplan-Meier survival curves, understanding it might be very helpful while working with more advanced methods of survival analysis, such as the Cox regression.
You can find the code used for this article on my GitHub. As always, any constructive feedback is welcome. You can reach out to me on Twitter or in the comments.
4. References
[1] https://stats.stackexchange.com/questions/60238/intuition-for-cumulative-hazard-function-survival-analysis
Translated from: https://towardsdatascience.com/introduction-to-survival-analysis-the-nelson-aalen-estimator-9780c63d549d