Limitations of ARIMA: Dealing with Outliers

ARIMA models can be quite adept when it comes to modelling the overall trend of a series along with seasonal patterns.

A previous article, SARIMA: Forecasting Seasonal Data with Python and R, used an ARIMA model to forecast maximum air temperature values for Dublin, Ireland.

The results were reasonably accurate: 70% of the predictions fell within 10% of the actual temperature values.

Forecasting More Extreme Weather Conditions

That said, the data used in the previous example did not contain particularly extreme temperature values. For instance, the minimum temperature value was 4.8°C while the maximum temperature value was 28.7°C. Neither of these values lies outside the norm for typical yearly Irish weather.

However, let’s consider a more extreme example.

Braemar is a village in Aberdeenshire in the Scottish Highlands, and is known as one of the coldest places in the United Kingdom in winter. According to the UK Met Office, a low of -27.2°C was recorded at this location in January 1982, far below the average minimum temperature of -1.5°C recorded between 1981 and 2010.

How would an ARIMA model perform when forecasting an abnormally cold winter for Braemar?

An ARIMA model is built using monthly Met Office data from January 1959 to July 2020 (contains public sector information licensed under the Open Government Licence v1.0).

The time series is defined as follows:

weatherarima <- ts(mydata$tmin[1:591], start = c(1959,1), frequency = 12)
plot(weatherarima,type="l",ylab="Temperature")
title("Minimum Recorded Monthly Temperature: Braemar, Scotland")

Here is a plot of the monthly data:

[Figure: minimum recorded monthly temperature, Braemar, Scotland. Source: UK Met Office Weather Data]

Here is an overview of the individual time series components:

[Figure: individual time series components. Source: RStudio]

ARIMA Model Configuration

Roughly 80% of the dataset (the first 591 of 739 months) is used to build the ARIMA model. The remaining 20% (148 months) is then used as validation data to compare the predictions with the actual values.
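
The split described above comes down to simple index arithmetic. The following is a minimal sketch in base R; the names `n_total`, `n_train`, `n_test`, `train_idx` and `test_idx` are illustrative and do not appear in the original code, while the counts are those quoted in this article.

```r
# Illustrative split arithmetic; variable names are assumptions, counts come from the article
n_total <- 739                 # monthly observations, January 1959 to July 2020
n_train <- 591                 # months used to fit the model
n_test  <- n_total - n_train   # months held back for validation

train_idx <- 1:n_train                 # rows used to build the model
test_idx  <- (n_train + 1):n_total     # rows used to check the forecasts

train_share <- n_train / n_total
cat("Validation months:", n_test, "\n")
cat("Training share:", round(100 * train_share, 1), "%\n")
```

The validation length computed here is the same value passed as `h` when the forecasts are generated later in the article.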

Using auto.arima, the best-fitting p, d, and q orders are selected, along with their seasonal counterparts:

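# ARIMA
library(forecast)  # provides auto.arima() and forecast()
# Search candidate models: KPSS test for the differencing order, BIC to rank the fits
fitweatherarima<-auto.arima(weatherarima, trace=TRUE, test="kpss", ic="bic")
fitweatherarima
confint(fitweatherarima)
plot(weatherarima,type='l')
title('Minimum Recorded Monthly Temperature: Braemar, Scotland')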

The best configuration is selected as follows:

> # ARIMA
> fitweatherarima<-auto.arima(weatherarima, trace=TRUE, test="kpss", ic="bic")

Fitting models using approximations to speed things up...

ARIMA(2,0,2)(1,1,1)[12] with drift : 2257.369
ARIMA(0,0,0)(0,1,0)[12] with drift : 2565.334
ARIMA(1,0,0)(1,1,0)[12] with drift : 2425.901
ARIMA(0,0,1)(0,1,1)[12] with drift : 2246.551
ARIMA(0,0,0)(0,1,0)[12] : 2558.978
ARIMA(0,0,1)(0,1,0)[12] with drift : 2558.621
ARIMA(0,0,1)(1,1,1)[12] with drift : 2242.724
ARIMA(0,0,1)(1,1,0)[12] with drift : 2427.871
ARIMA(0,0,1)(2,1,1)[12] with drift : 2259.357
ARIMA(0,0,1)(1,1,2)[12] with drift : Inf
ARIMA(0,0,1)(0,1,2)[12] with drift : 2252.908
ARIMA(0,0,1)(2,1,0)[12] with drift : 2341.9
ARIMA(0,0,1)(2,1,2)[12] with drift : 2249.612
ARIMA(0,0,0)(1,1,1)[12] with drift : 2264.59
ARIMA(1,0,1)(1,1,1)[12] with drift : 2248.085
ARIMA(0,0,2)(1,1,1)[12] with drift : 2246.688
ARIMA(1,0,0)(1,1,1)[12] with drift : 2241.727
ARIMA(1,0,0)(0,1,1)[12] with drift : Inf
ARIMA(1,0,0)(2,1,1)[12] with drift : 2261.885
ARIMA(1,0,0)(1,1,2)[12] with drift : Inf
ARIMA(1,0,0)(0,1,0)[12] with drift : 2556.722
ARIMA(1,0,0)(0,1,2)[12] with drift : Inf
ARIMA(1,0,0)(2,1,0)[12] with drift : 2338.482
ARIMA(1,0,0)(2,1,2)[12] with drift : 2248.515
ARIMA(2,0,0)(1,1,1)[12] with drift : 2250.884
ARIMA(2,0,1)(1,1,1)[12] with drift : 2254.411
ARIMA(1,0,0)(1,1,1)[12] : 2237.953
ARIMA(1,0,0)(0,1,1)[12] : Inf
ARIMA(1,0,0)(1,1,0)[12] : 2419.587
ARIMA(1,0,0)(2,1,1)[12] : 2256.396
ARIMA(1,0,0)(1,1,2)[12] : Inf
ARIMA(1,0,0)(0,1,0)[12] : 2550.361
ARIMA(1,0,0)(0,1,2)[12] : Inf
ARIMA(1,0,0)(2,1,0)[12] : 2332.136
ARIMA(1,0,0)(2,1,2)[12] : 2243.701
ARIMA(0,0,0)(1,1,1)[12] : 2262.382
ARIMA(2,0,0)(1,1,1)[12] : 2245.429
ARIMA(1,0,1)(1,1,1)[12] : 2244.31
ARIMA(0,0,1)(1,1,1)[12] : 2239.268
ARIMA(2,0,1)(1,1,1)[12] : 2249.168

Now re-fitting the best model(s) without approximations...

ARIMA(1,0,0)(1,1,1)[12] : Inf
ARIMA(0,0,1)(1,1,1)[12] : Inf
ARIMA(1,0,0)(1,1,1)[12] with drift : Inf
ARIMA(0,0,1)(1,1,1)[12] with drift : Inf
ARIMA(1,0,0)(2,1,2)[12] : Inf
ARIMA(1,0,1)(1,1,1)[12] : Inf
ARIMA(2,0,0)(1,1,1)[12] : Inf
ARIMA(0,0,1)(0,1,1)[12] with drift : Inf
ARIMA(0,0,2)(1,1,1)[12] with drift : Inf
ARIMA(1,0,1)(1,1,1)[12] with drift : Inf
ARIMA(1,0,0)(2,1,2)[12] with drift : Inf
ARIMA(2,0,1)(1,1,1)[12] : Inf
ARIMA(0,0,1)(2,1,2)[12] with drift : Inf
ARIMA(2,0,0)(1,1,1)[12] with drift : Inf
ARIMA(0,0,1)(0,1,2)[12] with drift : Inf
ARIMA(2,0,1)(1,1,1)[12] with drift : Inf
ARIMA(1,0,0)(2,1,1)[12] : Inf
ARIMA(2,0,2)(1,1,1)[12] with drift : Inf
ARIMA(0,0,1)(2,1,1)[12] with drift : Inf
ARIMA(1,0,0)(2,1,1)[12] with drift : Inf
ARIMA(0,0,0)(1,1,1)[12] : Inf
ARIMA(0,0,0)(1,1,1)[12] with drift : Inf
ARIMA(1,0,0)(2,1,0)[12] : 2355.279

Best model: ARIMA(1,0,0)(2,1,0)[12]

The parameters of the model are as follows:

> fitweatherarima
Series: weatherarima
ARIMA(1,0,0)(2,1,0)[12]

Coefficients:
         ar1     sar1     sar2
      0.2372  -0.6523  -0.3915
s.e.  0.0411   0.0392   0.0393
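
To show what this specification means in code, here is a hedged sketch that fits the same orders explicitly with base R's `arima()` instead of `auto.arima()`. It uses the built-in `nottem` dataset (monthly temperatures at Nottingham) purely as a stand-in, since the Braemar data is not reproduced here, so the coefficients will differ from those above. The names `fit_demo` and `pred_demo` are illustrative.

```r
# Sketch only: nottem stands in for the Braemar series, so estimates will not match the article
fit_demo <- arima(nottem,
                  order = c(1, 0, 0),                                 # non-seasonal AR(1), no differencing
                  seasonal = list(order = c(2, 1, 0), period = 12))   # seasonal AR(2), one seasonal difference
print(coef(fit_demo))   # one non-seasonal and two seasonal AR coefficients

pred_demo <- predict(fit_demo, n.ahead = 12)   # point forecasts ($pred) and standard errors ($se)
print(pred_demo$pred)
```

Writing the orders out this way makes it clear that the selected model has one non-seasonal autoregressive term, two seasonal autoregressive terms, and a single seasonal difference at lag 12.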

Using the configured model ARIMA(1,0,0)(2,1,0)[12], the forecasted values are generated:

forecastedvalues=forecast(fitweatherarima,h=148)
forecastedvalues
plot(forecastedvalues)

Here is a plot of the forecasts:

[Figure: plot of the ARIMA forecasts. Source: RStudio]

Now, a data frame can be generated to compare the forecast values with the actual values:

# Compare the held-out observations (months 592 to 739) with the point forecasts
df<-data.frame(mydata$tmin[592:739],forecastedvalues$mean)
col_headings<-c("Actual Weather","Forecasted Weather")
names(df)<-col_headings

[Figure: actual and forecasted values. Source: RStudio]

Additionally, using the Metrics library in R, the RMSE (root mean squared error) value can be calculated.

> library(Metrics)
> rmse(df$`Actual Weather`,df$`Forecasted Weather`)
[1] 1.780472
> mean(df$`Actual Weather`)
[1] 2.876351
> var(df$`Actual Weather`)
[1] 17.15774

With a mean temperature of 2.88°C in the test set, the RMSE of 1.78 is large relative to the mean.

Let’s investigate the more extreme values in the data further.

[Figure: forecasts compared with the more extreme actual values. Source: RStudio]

When forecasting particularly extreme minimum temperatures (below -4°C for the sake of argument), the ARIMA model significantly overestimates the minimum temperature.
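
One way to inspect this is to filter the comparison data for months below the -4°C threshold and look at the forecast error there. The sketch below uses a small made-up data frame (`demo`, with hypothetical values) in place of the article's `df`, so the numbers are illustrative only; `threshold`, `extreme` and `error` are likewise assumed names.

```r
# Hypothetical values standing in for the article's df; not real Braemar data
demo <- data.frame(
  actual   = c(-6.2, -4.5, 0.3, 5.1, 9.8, -5.0),
  forecast = c(-2.1, -1.8, 0.9, 4.7, 9.1, -2.5)
)

threshold <- -4                                       # cut-off used in the article for "extreme" minimums
extreme <- demo[demo$actual < threshold, ]            # keep only the extreme months
extreme$error <- extreme$forecast - extreme$actual    # positive error = forecast too warm

print(extreme)
cat("Mean error in extreme months:", mean(extreme$error), "\n")
```

A consistently positive error in the filtered rows is what overestimation of the minimum temperature looks like in this layout.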

As a result, the RMSE is just over 60% of the mean test-set temperature of 2.88°C, since RMSE penalises larger errors more heavily.
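
To make the two points in this paragraph concrete, the sketch below first reproduces the ratio from the figures reported above, then uses made-up errors (hypothetical values, not from the Braemar data) to show how a single large miss raises RMSE more than it raises the mean absolute error. The helpers `rmse_manual` and `mae_manual` are illustrative names written in base R, not functions from the Metrics library.

```r
# Ratio of RMSE to the mean, using the figures reported in the article
rmse_reported <- 1.780472
mean_reported <- 2.876351
ratio <- rmse_reported / mean_reported
cat("RMSE as share of mean:", round(100 * ratio, 1), "%\n")

# Illustrative helpers (assumed names), written in base R
rmse_manual <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae_manual  <- function(actual, predicted) mean(abs(actual - predicted))

# Hypothetical series: three small misses and one large one
actual    <- c(2, 3, 1, -6)
predicted <- c(3, 2, 2, -1)

cat("MAE: ", mae_manual(actual, predicted), "\n")
cat("RMSE:", rmse_manual(actual, predicted), "\n")
```

Because the errors are squared before averaging, the one large miss dominates the RMSE, which is why a handful of badly forecast cold months can inflate the overall figure.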

The ARIMA model therefore appears to be effective at capturing temperatures within the normal range of values.

[Figure: actual and forecasted temperatures. Source: RStudio]

However, the model falls short in predicting values at the more extreme ends of the scale, particularly for the winter months.

That said, what if the lower bound of the ARIMA forecast were used instead?

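# forecast() returns lower bounds for both the 80% and 95% intervals by default.
# Column 1 (80%) is selected explicitly; it is the column the two-name labelling exposed.
df<-data.frame(mydata$tmin[592:739],forecastedvalues$lower[,1])
col_headings<-c("Actual Weather","Forecasted Weather")
names(df)<-col_headings

[Figure: actual values compared with the lower forecast bound. Source: RStudio]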

While the model now performs better at forecasting the minimum values, the actual minimums still exceed those of the forecast.

Moreover, this does not solve the problem as it means that the model will now significantly underestimate temperature values above the mean.

As a result, the RMSE increases significantly:

> library(Metrics)
> rmse(df$`Actual Weather`,df$`Forecasted Weather`)
[1] 3.907014
> mean(df$`Actual Weather`)
[1] 2.876351

In this regard, ARIMA models should be interpreted with caution. While they can be effective in capturing seasonality and the overall trend, they can fall short in forecasting values that fall significantly outside the norm.

When it comes to forecasting such values, statistical tools such as Monte Carlo simulations can be more effective in modelling a potential range of more extreme values. A follow-up article discusses how extreme weather events can potentially be modelled using this method.

Conclusion

In this example, we have seen that ARIMA can be limited in forecasting extreme values. While the model is adept at modelling seasonality and trends, outliers are difficult for ARIMA to forecast precisely because they lie outside the general trend captured by the model.

Many thanks for reading, and you can find more of my data science content at michael-grogan.com.

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way. The findings and interpretations in this article are those of the author and are not endorsed by or affiliated with the UK Met Office in any way.

Translated from: https://towardsdatascience.com/limitations-of-arima-dealing-with-outliers-30cc0c6ddf33
