How to Fit an ARIMA Model

What is ARIMA?

ARIMA (AutoRegressive Integrated Moving Average) models are among the most classic and most widely used statistical forecasting techniques for univariate time series. An ARIMA model uses lagged values of the series and lagged forecast errors to predict future values.

Figure: Full form of ARIMA (image created by Pratik Gandhi)

AR (autoregressive): uses the lags of previous values
I (integrated): differencing to remove non-stationarity
MA (moving average): a moving average over the lagged error terms
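
Put together, the three components above give the usual ARIMA(p, d, q) equation. The notation here is the standard textbook one and is an assumption on top of the list: p, d, and q are the AR, differencing, and MA orders, B is the backshift operator (B y_t = y_{t-1}), c is a constant, and ε_t is the error term.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right) (1 - B)^{d} \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right) \varepsilon_t
```

The left-hand side carries the AR and I parts (lags of the differenced series), and the right-hand side carries the MA part (lags of the errors).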

A few terms come up constantly when working with time-series data, and an ARIMA model fits well only if we understand these components of the data. Here are the main ones:

Trend:

Data is considered to have a trend when it moves in a sustained upward or downward direction over the observed period, e.g., a rise in airline passengers over the summer or a drop in the number of customers over the weekdays.

Seasonality:

Data is considered to have a seasonal pattern if it repeats at a regular period driven by external seasonal factors. For instance, the growth and fall of leaves are driven by the weather and seasons of mother nature.

Cyclicity:

Data is considered to have a cyclic component if it shows repeated but non-periodic fluctuations. In simple words, if the pattern is caused by certain circumstances and does not recur after a set amount of time, it can be considered cyclicity. For instance, the stock market exhibits cyclic behavior, with highs and lows caused by specific events, and the time between such peaks is never precise.

White Noise:

This is the random and irregular component of the time series. In other words, what remains after removing trend, seasonality, and cyclicity from the signal is usually treated as white noise. The best example of white noise is what you saw when you lost your antenna connection to the TV in the 90s (yes, I am a 90s kid!).

Stationarity:

A time series whose mean and variance stay constant over time (and whose autocovariance depends only on the lag) is considered stationary. A well-known image always comes to mind when I think about stationarity.
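
The definition above can be checked informally by comparing the mean and variance of different stretches of a series. The following is a minimal sketch on synthetic data (the series, the seed, and the helper `half_stats` are made up for illustration): white noise keeps roughly the same statistics in both halves, a random walk does not, and differencing the random walk recovers the noise.

```python
import numpy as np

rng = np.random.default_rng(42)
noise = rng.normal(loc=0.0, scale=1.0, size=2000)  # stationary by construction
walk = np.cumsum(noise)                            # random walk: not stationary
diffed = np.diff(walk)                             # first difference of the walk


def half_stats(x):
    """Return ((mean, variance) of first half, (mean, variance) of second half)."""
    first, second = np.array_split(x, 2)
    return (first.mean(), first.var()), (second.mean(), second.var())


for label, x in [("noise", noise), ("walk", walk), ("diffed", diffed)]:
    (m1, v1), (m2, v2) = half_stats(x)
    print(f"{label:>6}: mean {m1:8.3f} -> {m2:8.3f}, variance {v1:8.3f} -> {v2:8.3f}")
```

This eyeball comparison is only a sanity check; the formal tests in the following sections are the ones to rely on.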

The main packages I have used to explain these tests are:

statsmodels: https://www.statsmodels.org/stable/index.html
pmdarima: http://alkaline-ml.com/pmdarima/index.html

There are a lot of tests, but I am going to talk about a few that I have used and that have helped me in my battle with time series problems:

1. Augmented Dickey-Fuller (ADF) test:

A time series should be made stationary using transformation techniques (log, moving average, etc.) before applying ARIMA models. The ADF test is one of the most widely used ways to check whether a series is stationary: its null hypothesis is that the series has a unit root (is non-stationary), so a small p-value supports stationarity. The data can be found on Kaggle. Below is the code:

To make the data stationary we applied some transformation to it (shown in the code above). The resulting test statistic is significant, so the unit-root null hypothesis is rejected, which confirms that the data is stationary now!

2. PP test:

PP stands for the Phillips-Perron test. In some cases the I in ARIMA, which stands for Integrated, is needed, and differencing of order d = 1 or 2 usually does the job. The PP test is a unit root test whose null hypothesis is that the time series is integrated of order 1. It is also an alternative to the ADF test if you want to check stationarity. Unit root tests of this kind have become quite popular in the analysis of financial time series[3]. Below is the code:

This returns a boolean value (1 or 0) indicating whether the series is stationary or not.

3. KPSS Test:

A widely used test in econometrics is the Kwiatkowski–Phillips–Schmidt–Shin test, abbreviated as the KPSS test. It is used much like the ADF test, but its null hypothesis is the opposite one: that an observable time series is stationary around a deterministic trend. Its major disadvantage is a high rate of type-I errors. For that reason it is often recommended to combine it with the ADF test and check whether both return the same result[4]. The code is similar to that of the ADF test, as shown below:

The image above shows that before applying the transformation (figure A) the p-value is < 0.05, so the null hypothesis of stationarity is rejected and the data is not stationary. After the transformation (figure B) the p-value becomes 0.1, which is consistent with stationary data.

Before we dive into the next tests, it is important to know that ARIMA models may contain a seasonal component, which can be handled by adding a few more parameters (P, D, Q, m) to our ARIMA equation. We can broadly divide ARIMA-type models into two types:

ARIMA: handles the non-seasonal components, as explained in the beginning
SARIMA: Seasonal component + ARIMA
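
In operator form, the seasonal model is usually written as SARIMA(p, d, q)(P, D, Q)_m. The notation below is the standard textbook one and is an assumption on top of the text: B is the backshift operator, m is the number of observations per season, and φ_p, θ_q, Φ_P, Θ_Q are polynomials of the indicated order.

```latex
\Phi_P(B^{m}) \, \phi_p(B) \, (1 - B)^{d} \, (1 - B^{m})^{D} \, y_t
  = c + \theta_q(B) \, \Theta_Q(B^{m}) \, \varepsilon_t
```

Setting P = D = Q = 0 removes every seasonal factor and leaves the plain ARIMA(p, d, q) model.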

4. CH Test:

The Canova-Hansen (CH) test is mainly used to test for seasonal differences. Its null hypothesis is that the seasonal pattern is stable over the sample period, against the alternative that it changes over time. It is mostly helpful for economic or meteorological data[5]. The test is already implemented in Python in the pmdarima library.

5. OCSB Test:

The Osborn, Chui, Smith, and Birchenhall (OCSB) test is used to determine whether the data needs seasonal differencing (the D component of P, D, Q, m). The pmdarima package has a predefined function that one can leverage as follows:

Here, we have defined m = 12 because this is monthly data. 'aic' is the default lag_method, the criterion used to select the lag order (lower is better); the pmdarima documentation lists the other accepted metrics. The output for this data is 1, meaning one seasonal difference is recommended, which agrees with the clearly visible seasonal component.

6. Decompose Plot:

This is one of the tools that can really help when you encounter a time series problem. I think of this function as similar to a doctor taking your vitals when you first go for a visit. Just as the vitals might indicate some obvious things about a patient, the decompose plot gives a breakdown of the data and shows whether there is any clear trend or seasonality, and what the pattern of the residuals looks like. Below is the snippet of the code and the output result:

7. ACF and PACF Plot:

ACF and PACF plots stand for Autocorrelation Function plot and Partial Autocorrelation Function plot respectively. They help to determine, in a systematic way, the AR and MA terms needed once the time series has been made stationary. Below is the code for the ACF and PACF plots:

The lags that fall inside the blue shaded region are not considered significant. By the usual convention, the number of significant lags in the PACF plot suggests the AR order and the number in the ACF plot suggests the MA order. Here the ACF plot has 13 significant lags and the PACF plot has 2, so a reasonable starting point is an autoregressive term with 2 lags and a moving-average term with up to 13 lags. There are established methods for reading these plots to get a good estimate of the order of the ARIMA model.

Conclusion:

There are many other statistical tests that can be used besides the ones listed above. However, the tests and tools I mentioned here can be really powerful for understanding the data and fitting accurate ARIMA models.

This is my first attempt at writing an article on Medium. I have learned a lot from my fellow writers and the community, and I think this is the best way to share some of my experience back with them.

Translated from: https://medium.com/@pratikkgandhi/7-statistical-tests-to-validate-and-help-to-fit-arima-model-33c5853e2e93
