My Time Series Toolkit

When it comes to time series forecasting, I’m a great believer that the simpler the model, the better.

However, not all time series are created equal. Some time series have a strongly defined trend — we often see this with economic data, for instance:

Others show a more stationary-like pattern — e.g. monthly air passenger numbers:

[Figure: monthly air passenger numbers. Source: San Francisco Open Data]

The choice of time series model will depend highly on the type of time series one is working with. Here are some of the most useful time series models I’ve encountered.

1. ARIMA

In my experience, ARIMA tends to be most useful when modelling time series with a strong trend. The model is also adept at modelling seasonality patterns.

Let’s take an example.

Suppose we wish to model monthly air passenger numbers over a period of years. The original data is sourced from San Francisco Open Data.

Such a time series will have a seasonal component (holiday seasons tend to have higher passenger numbers, for instance) as well as evidence of a trend as indicated when the series is decomposed as below.
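
As a rough illustration of what such a decomposition does, the sketch below estimates the trend of a monthly series with a centred 2x12 moving average, then averages the detrended values by calendar month to obtain a seasonal component. This is the classical additive decomposition and not necessarily the exact routine behind the plot; the function name `decompose_additive` and the synthetic series are hypothetical.

```python
def decompose_additive(series, period=12):
    """Classical additive decomposition for an even seasonal period.

    Returns (trend, seasonal); trend is None where the centred window
    does not fit (the first and last period // 2 points).
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]  # period + 1 values
        # 2 x period moving average: half weight on the two end points
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period

    # Average the detrended values for each position in the cycle
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    centre = sum(raw) / period  # force the seasonal effects to sum to zero
    seasonal = [r - centre for r in raw]
    return trend, seasonal


# Synthetic monthly data: linear trend plus a repeating 12-month pattern
pattern = [-30, -20, -10, 0, 10, 20, 30, 20, 10, 0, -10, -20]
series = [1000 + 5 * t + pattern[t % 12] for t in range(48)]
trend, seasonal = decompose_additive(series)
```

On this synthetic series the moving average recovers the linear trend and the monthly averages recover the seasonal pattern; real data would also leave a remainder term.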

[Figure: decomposition of the monthly passenger series. Source: RStudio]

The purpose of using an ARIMA model is to capture the trend as well as account for the seasonality inherent in the time series.

To do this, one can use the auto.arima function from the forecast package in R, which selects the best-fitting p, d, q orders for the model as well as the appropriate seasonal component.

For the above example, the model that performed best in terms of the lowest BIC was as follows:

Series: passengernumbers
ARIMA(1,0,0)(0,1,1)[12]

Coefficients:
         ar1     sma1
      0.7794  -0.5001
s.e.  0.0609   0.0840

sigma^2 estimated as 585834:  log likelihood=-831.27
AIC=1668.54   AICc=1668.78   BIC=1676.44
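
Written out with the backshift operator $B$ (where $B y_t = y_{t-1}$), and assuming R's sign convention for moving-average terms, the fitted ARIMA(1,0,0)(0,1,1)[12] model corresponds to the following, shown first in compact form and then expanded:

```latex
\begin{align}
(1 - \phi_1 B)(1 - B^{12})\, y_t &= (1 + \Theta_1 B^{12})\, \varepsilon_t,
\qquad \phi_1 = 0.7794,\quad \Theta_1 = -0.5001 \\
y_t &= y_{t-12} + 0.7794\,(y_{t-1} - y_{t-13}) + \varepsilon_t - 0.5001\,\varepsilon_{t-12}
\end{align}
```

In words, each month's value starts from the same month one year earlier, adds about 78% of last month's year-on-year change, and corrects for roughly half of the error made twelve months ago.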

Here is a visual of the forecasts.

[Figure: ARIMA forecasts of monthly passenger numbers. Source: RStudio]

We can see that ARIMA is adequately forecasting the seasonal pattern in the series. In terms of the model performance, the RMSE (root mean squared error) and MFE (mean forecast error) were as follows:

  • RMSE: 698
  • MFE: -115

Given a mean of 8,799 passengers per month across the validation set, the errors recorded were quite small in comparison to the average — indicating that the model is performing well in forecasting air passenger numbers.
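
For reference, both metrics are simple to compute by hand. The sketch below assumes the forecast error is defined as actual minus forecast, so a negative MFE would mean the model over-forecasts on average. The helper names and the toy numbers are hypothetical, while the final lines reuse the figures reported above.

```python
from math import sqrt


def rmse(actual, forecast):
    """Root mean squared error: typical size of an error, sign ignored."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sqrt(sum(e * e for e in errors) / len(errors))


def mfe(actual, forecast):
    """Mean forecast error (actual - forecast): average bias, sign kept."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)


# Toy example (made-up numbers, not the passenger data)
actual = [100, 110, 120, 130]
forecast = [102, 108, 125, 129]
toy_rmse = rmse(actual, forecast)
toy_mfe = mfe(actual, forecast)

# Errors reported above, expressed relative to the validation mean
mean_passengers = 8799
rmse_share = 698 / mean_passengers   # about 0.079, roughly 8% of the mean
mfe_share = -115 / mean_passengers   # about -0.013, roughly -1.3% of the mean
```

RMSE penalises large misses in either direction, whereas MFE lets positive and negative errors cancel, so it is better read as a measure of bias than of accuracy.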

2. Prophet

Let’s take a look at the air passenger example once again, but this time using Facebook’s Prophet. Prophet is a time series tool that allows for forecasting based on an additive model, and works especially well with data that has strong seasonal trends.
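
In the formulation described by Prophet's authors, the additive model splits the series into a trend, seasonal effects, holiday effects and noise:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here $g(t)$ is the trend, $s(t)$ the periodic (seasonal) component, $h(t)$ the effect of holidays and $\varepsilon_t$ the error term.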

The air passenger dataset appears to fit the bill, so let’s see how the model would perform compared to ARIMA.

In this example, Prophet can be used to identify the long-term trend for air passenger numbers, as well as seasonal fluctuations throughout the year:

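[Figure: long-term trend and seasonal fluctuations identified by Prophet. Source: Jupyter Notebook output]

from prophet import Prophet  # older releases: from fbprophet import Prophet

# train_dataset needs the two columns Prophet expects: ds (dates) and y (values)
prophet_basic = Prophet()
prophet_basic.fit(train_dataset)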

A standard Prophet model can be fit to pick up the trend and seasonal components automatically, although these can also be configured manually by the user.

One particularly useful component of Prophet is the inclusion of changepoints, or significant structural breaks in a time series.
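
To make the idea concrete, here is a minimal sketch of a piecewise-linear trend whose growth rate shifts at each changepoint, similar in spirit to Prophet's default linear trend. It illustrates the concept and is not Prophet's own code: `piecewise_trend`, its arguments and the numbers are all hypothetical.

```python
def piecewise_trend(t, base_rate, offset, changepoints, rate_changes):
    """Continuous piecewise-linear trend evaluated at time t.

    The slope starts at base_rate and shifts by rate_changes[j] once t
    passes changepoints[j]; the intercept is adjusted at each changepoint
    so the line stays connected.
    """
    rate = base_rate
    intercept = offset
    for s, delta in zip(changepoints, rate_changes):
        if t >= s:
            rate += delta
            intercept -= s * delta  # keeps the trend continuous at s
    return rate * t + intercept


# Hypothetical trend: grows by 10 per month, flattens at month 24,
# then declines by 5 per month after month 36
changepoints = [24, 36]
rate_changes = [-10, -5]
trend = [piecewise_trend(t, 10, 500, changepoints, rate_changes) for t in range(48)]
```

Each changepoint adds one more degree of freedom to the trend, which is why too many of them can overfit and too few can miss a genuine structural break.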

[Figure: Prophet forecast with changepoints marked. Source: Jupyter Notebook output]

Through trial and error, 4 changepoints were shown to minimise the MFE and RMSE:

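from prophet import Prophet
from prophet.plot import add_changepoints_to_plot

# `future` is assumed to exist already (e.g. built with make_future_dataframe on a fitted model)
pro_change = Prophet(n_changepoints=4)
forecast = pro_change.fit(train_dataset).predict(future)
fig = pro_change.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), pro_change, forecast)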

The RMSE and MFE can now be calculated as follows:

>>> import numpy as np
>>> from math import sqrt
>>> from sklearn.metrics import mean_squared_error
>>> mse = mean_squared_error(passenger_test, yhat14)
>>> rmse = sqrt(mse)
>>> print('RMSE: %f' % rmse)
RMSE: 524.263928
>>> forecast_error = (passenger_test-yhat14)
>>> mean_forecast_error = np.mean(forecast_error)
>>> mean_forecast_error
71.58326743881493

The RMSE and MFE for Prophet are both smaller in magnitude than those obtained using ARIMA (524 vs. 698 for RMSE, and 72 vs. -115 for MFE), suggesting that the model has performed better in forecasting monthly air passenger numbers.

3. TensorFlow Probability

In the aftermath of COVID-19, many time series forecasts have proven to be erroneous as they have been made with the wrong set of assumptions.

Increasingly, it is coming to be recognised that time series models which can produce a range of forecasts can be more practically applied, as they allow for a “scenario analysis” of what might happen in the future.
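
One simple way to picture a range of forecasts is to simulate many possible future paths and summarise them with quantiles. The sketch below does this for a random-walk ("local level") model using only the standard library. It is not the TensorFlow Probability API, and the starting level, step volatility and function name are hypothetical.

```python
import random
from statistics import quantiles


def simulate_bands(last_level, step_sd, horizon, n_paths=2000, seed=0):
    """Simulate random-walk paths and return (lower, median, upper) per step.

    lower and upper are the 5th and 95th percentiles across paths.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        level = last_level
        path = []
        for _ in range(horizon):
            level += rng.gauss(0, step_sd)  # each month drifts from the last
            path.append(level)
        paths.append(path)

    bands = []
    for h in range(horizon):
        values = [p[h] for p in paths]
        cuts = quantiles(values, n=20)  # 19 cut points: 5%, 10%, ..., 95%
        bands.append((cuts[0], cuts[9], cuts[18]))
    return bands


# Hypothetical inputs: last observed level 200,000, monthly step sd 15,000
bands = simulate_bands(last_level=200_000, step_sd=15_000, horizon=12)
```

The band widens as the horizon grows, and its lower edge is the kind of downside scenario that a single point forecast hides.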

As an example, an ARIMA model built using the air passenger data above could not possibly have forecast the sharp drop in passenger numbers that came about as a result of COVID-19.

However, using more recent air passenger data, let’s see how a model built using TensorFlow Probability would have performed:

[Figure: TensorFlow Probability forecast of air passenger numbers. Source: TensorFlow Probability]

While the model would not have forecast the sharp drop that ultimately came to pass, it does forecast a fall in passenger numbers to below 150,000. A model of this kind supports a “what-if” style of forecasting. For instance, an airline could forecast monthly passenger numbers for a particular airport, note that they could be significantly lower than usual, and use that to inform decisions on managing resources such as fleet utilisation.

Specifically, TensorFlow Probability makes forecasts from a posterior distribution, which combines a prior distribution (the beliefs held about the model parameters before seeing the data) with the likelihood function.
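
In symbols, with $\theta$ standing for the model parameters and $y$ for the observed series, this is Bayes' rule:

```latex
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \;\propto\; p(y \mid \theta)\, p(\theta)
```

Forecasts built this way carry the remaining uncertainty about the parameters through to the predictions, which is what produces a range of outcomes instead of a single line.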

[Figure. Source: image created by the author]

For reference, the example illustrated here uses the template from the Structural Time Series modeling in TensorFlow Probability tutorial, which the original authors (Copyright 2019 The TensorFlow Authors) have made available under the Apache 2.0 license.

Conclusion

Time series analysis is about making reliable forecasts using models suited to the data in question. For data with defined trend and seasonal components, it has been my experience that these models work quite well.

Hope you found the above article of use, and feel free to leave any questions or feedback in the comments section.

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.

Translated from: https://towardsdatascience.com/my-time-series-toolkit-4aa841d08325
