7 Statistical Tests to Validate and Help Fit an ARIMA Model

What is ARIMA?

ARIMA models are among the most classic and widely used statistical forecasting techniques for univariate time series. An ARIMA model uses lagged values of the series and lagged forecast errors to predict future values.

Figure: Full form of ARIMA (image created by Pratik Gandhi)

  • AR (AutoRegressive): uses lagged values of the series itself
  • I (Integrated): differences the series to remove non-stationarity
  • MA (Moving Average): uses lagged forecast errors (the error term)
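
Putting the three components together, an ARIMA(p, d, q) model can be written in the standard textbook form below. The symbols are assumptions introduced here for illustration and do not come from the original article: B is the backshift operator, c a constant, the φ terms are AR coefficients, the θ terms are MA coefficients, and ε is the forecast error.

```latex
y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p}
         + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
         + \varepsilon_t,
\qquad y'_t = (1 - B)^d \, y_t
```

Here p counts the AR lags, d is the number of times the series is differenced, and q counts the MA lags.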

Some of these terms are very commonly used when working with time-series data, and ARIMA models fit more accurately when we understand these components of the data well. Here are a few of them:

Trend:

Data is considered to have a trend when it shows a sustained increase or decrease over time, e.g. the number of airline passengers growing year after year, or the number of customers steadily declining.

Seasonality:

Data is considered to have a seasonal pattern if it is influenced by seasonal factors that recur with a fixed period, such as the time of year or the day of the week. For instance, the growth and fall of leaves is driven by the seasons.

Cyclicity:

Data is considered to have a cyclic component if there are repeated but non-periodic fluctuations. In simple words, if the pattern is caused by particular circumstances and does not recur over a fixed length of time, it can be considered cyclic. For instance, the stock market exhibits cyclic behavior, with highs and lows driven by specific events, and the time between such peaks is never precise.

White Noise:

This is the random and irregular component of the time series. In other words, the residuals left after extracting trend, seasonality, and cyclicity from the signal are mostly considered white noise. The best example of white noise is the static you saw when your TV lost its antenna connection in the 90s (yes, I am a 90s kid!).

Stationarity:

A time series whose mean and variance are constant over time (and whose autocovariance depends only on the lag) is considered stationary. A well-known image that always comes to mind when I think about stationarity is:

Image source: https://beingdatum.com/time-series-forecasting/

The packages I mainly used to explain these tests are:

  • statsmodels: https://www.statsmodels.org/stable/index.html
  • pmdarima: http://alkaline-ml.com/pmdarima/index.html

There are a lot of tests, but I am going to talk about a few that I have used and that helped me in my battles with time series problems:

1. Augmented Dickey-Fuller (ADF) test:

A time series should be made stationary using transformation techniques (log, moving average, etc.) before applying ARIMA models. The ADF test is one of the most widely used ways to confirm whether a series is stationary. Its null hypothesis is that the series has a unit root, i.e. that it is non-stationary, so a significant test statistic (small p-value) is evidence of stationarity. The data can be found on Kaggle.

Figure: Difference between non-stationary and stationary data, with their t-statistic values

To make the data stationary, we applied a transformation to it. The resulting t-statistic is significant, so we reject the null hypothesis of a unit root, which confirms that the data is now stationary!

2. PP test:

PP stands for the Phillips-Perron test. In some cases the I in ARIMA, which stands for Integrated, is needed, and a differencing order of 1 or 2 mostly does the job. The PP test is a unit root test that checks whether the time series is integrated of order 1. It is also an alternative to the ADF test if you want to check stationarity. Such tests have become quite popular in the analysis of financial time series[3].

The test returns a boolean value (1 or 0) indicating whether the series is stationary or not.

3. KPSS Test:

A widely used test in econometrics is the Kwiatkowski–Phillips–Schmidt–Shin test, abbreviated as the KPSS test. It is used much like the ADF test, but its null hypothesis is the opposite: that an observable time series is stationary around a deterministic trend. Its major disadvantage is a high rate of type-I errors, i.e. it rejects stationarity too often. For that reason, it is often recommended to combine it with the ADF test and check whether both return the same result[4]. The code is similar to that of the ADF test.

Figure: Difference between non-stationary and stationary data

We can see from the image above that before applying the transformation (figure A) the p-value is < 0.05, so the null hypothesis of stationarity is rejected and the data is not stationary. After the transformation (figure B) the p-value becomes 0.1, which confirms the stationarity of the data.

Before we dive into the next tests, it is important to know that ARIMA models may contain a seasonal component, which can be handled by adding a few more parameters (P, D, Q, m) to our ARIMA equation. We can broadly divide ARIMA-type models into two types:

  1. ARIMA: handles the non-seasonal components, as explained in the beginning
  2. SARIMA: seasonal component + ARIMA

4. CH Test:

The Canova-Hansen (CH) test is mainly used to test for seasonal differences. Its null hypothesis is that the seasonal pattern is stable over the sample period; the alternative is that it changes over time. It is mostly helpful with economic or meteorological data[5]. It is already implemented in Python in the pmdarima library.

5. OCSB Test:

The Osborn, Chui, Smith, and Birchenhall (OCSB) test is used to determine whether the data needs seasonal differencing (the D component of P, D, Q, m). The pmdarima package has a predefined function for this test.

Here, we have defined m = 12 as it is monthly data. 'aic' is the default lag_method, the criterion used to choose the number of lags (lower is better). Refer to the pmdarima documentation for other accepted metrics. The output for this data is 1, which is expected, as the seasonal component is clearly visible.

6. Decompose Plot:

This is one of the tools that can really help when you encounter a time series problem. I think of this function as similar to a doctor taking your vitals when you first go for a visit. Just as the vitals might indicate some obvious things about a patient, the decompose plot gives a breakdown of the data and shows whether there is any clear trend, seasonality, or pattern in the residuals.

Figure: Decomposition plot with subplots showing the original data (top), trend, seasonal, and residuals (bottom)

7. ACF and PACF Plot:

ACF and PACF stand for the autocorrelation function and the partial autocorrelation function, respectively. ACF and PACF plots help to determine, in a systematic way, the AR and MA terms needed after the time series has been made stationary.

Figure: Autocorrelation plot for the Airline Passengers data
Figure: Partial autocorrelation plot for the Airline Passengers data

The lags that fall inside the blue shaded region are not considered significant. In the ACF plot, 13 lags fall outside the band, and in the PACF plot, 2 do. By the usual rule of thumb, the PACF cut-off suggests the AR order and the ACF cut-off suggests the MA order, so the PACF here points to an AR term of about 2 lags, while the long run of significant ACF lags is more typical of an autoregressive or still-trending series than of a pure MA(13) process. Reading the plots this way gives a good first estimate of the order of the ARIMA model.

Conclusion:

There are many other statistical tests that can be used besides the ones listed above. However, the tests and tools I mentioned here can be really powerful for understanding the data and fitting accurate ARIMA models.

This is my first attempt at writing an article on Medium. I have learned a lot from my fellow writers and the community, and I think this is the best way to share some of my experiences back with them.

Translated from: https://medium.com/@pratikkgandhi/7-statistical-tests-to-validate-and-help-to-fit-arima-model-33c5853e2e93
