Univariate Stock Time Series Forecasting with the Gated Recurrent Unit (GRU), PyTorch Version

巴黎落日

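Preface

Series column: [Deep Learning: Algorithm Projects in Practice]
The series covers healthcare, finance, retail, food and beverage, sports and fitness, transportation, environmental science, social media, and text and image processing, among other fields. It discusses a range of deep neural network ideas, including convolutional neural networks, recurrent neural networks, generative adversarial networks, gated recurrent units, long short-term memory, natural language processing, deep reinforcement learning, large language models, and transfer learning.

Machine learning has advanced considerably in recent years and attracted wide attention, with the most notable results in speech and image recognition. This article examines how a deep learning model, the stacked gated recurrent unit (Stacked GRU), performs on the stock market. Research shows that although this technique has done well in areas such as natural language processing and speech recognition, it performs poorly on financial time series forecasting. Financial data has a high noise-to-signal ratio, which makes it hard for machine learning models to find patterns and predict future prices.

This article introduces the GRU time series model and explores how a Stacked GRU performs on a technology stock. It is organized as follows. Section 1 introduces financial time series data. Section 2 performs feature engineering on the time series. Section 3 builds the model and defines its parameters, loss function, and optimizer. Section 4 trains the model. Section 5 evaluates the model and visualizes the results. Section 6 predicts the closing price at the next time step.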

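GRU univariate time series forecasting

1. Financial time series data
- 1.1 Data preprocessing
- 1.2 Exploratory analysis (visualization)
  - 1.2.1 Daily closing price
  - 1.2.2 Daily return
  - 1.2.3 Autocorrelation of returns
2. Feature engineering for time series data (AAPL)
- 2.1 Building sequence data
- 2.2 Feature scaling (normalization)
- 2.3 Splitting the dataset (TimeSeriesSplit)
- 2.4 Dataset tensors (TensorDataset)
3. Building the time series model (Stacked GRU)
- 3.1 Building the GRU model
- 3.2 Defining the model, loss function, and optimizer
4. Model training and visualization
5. Model evaluation and visualization
- 5.1 Mean squared error
- 5.2 Inverse scaling
- 5.3 Visualizing the results
6. Model prediction
- 6.1 Converting the latest window of closing prices into a tensor
- 6.2 Predicting the next closing price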

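1. Financial time series data

Financial time series data are sequences of values of financial indicators recorded in chronological order. The indicators include, but are not limited to, stock prices, exchange rates, and interest rates. Such data have several notable characteristics:

Temporal continuity: the data are ordered in time and reflect how the financial market changes dynamically.
Noise and uncertainty: financial markets are affected by many complex factors, so the data contain a large amount of noise and uncertainty.
Nonlinearity and non-stationarity: financial time series usually show clearly nonlinear and non-stationary behavior.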

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import TimeSeriesSplit
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchinfo import summary
from tqdm import tqdm

1.1 Data preprocessing

The pandas.to_datetime function converts a scalar, array-like, Series, or DataFrame/dict-like object to a pandas datetime object.

AAPL = pd.read_csv('AAPL.csv')
print(type(AAPL['Close'].iloc[0]),type(AAPL['Date'].iloc[0]))
# Convert the Date column from strings to datetime format
AAPL['Date'] = pd.to_datetime(AAPL['Date'])
print(type(AAPL['Close'].iloc[0]),type(AAPL['Date'].iloc[0]))
# Select the subset of dates to analyze
cond_1 = AAPL['Date'] >= '2021-04-23 00:00:00'
cond_2 = AAPL['Date'] <= '2024-04-23 00:00:00'
AAPL = AAPL[cond_1 & cond_2].set_index('Date')
print(AAPL.shape)

<class 'numpy.float64'> <class 'str'>
<class 'numpy.float64'> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
(755, 6)

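1.2 Exploratory analysis (visualization)

Exploratory data analysis (EDA) is an approach to analyzing data with visual techniques. It is used to discover trends and patterns, or to check assumptions with the help of statistical summaries and graphical representations.

1.2.1 Daily closing price

The closing price is the last price at which a stock trades during a regular trading day. It is the standard benchmark investors use to track a stock's long-term performance.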

# plt.style.available
plt.style.use('seaborn-v0_8')

# Plot the adjusted closing price
plt.figure(figsize=(18, 6))
plt.plot(AAPL['Adj Close'], label='AAPL')
# Set the chart title and axis labels
plt.title('AAPL Adjusted Close Price')
plt.xlabel('')
plt.ylabel('Price $', fontsize=18)
# Show the legend
plt.legend()
plt.show()


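1.2.2 Daily return

A stock's daily return is the proportional gain or loss an investor earns from holding it for one day. It is usually expressed as a percentage: daily return = (today's close - previous close) / previous close × 100%. Here we can use the .pct_change() function, which returns this ratio as a fraction (without the ×100).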

plt.figure(figsize=(18,6))
plt.title('Daily Return History')
plt.plot(AAPL['Adj Close'].pct_change(),linestyle='--',marker='*',label='AAPL')
plt.ylabel('Daily Return', fontsize=18)
plt.legend()
plt.show()
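To make the daily return formula concrete, here is a minimal sketch on a toy price series (the prices are hypothetical values, not real AAPL data). It computes the return by hand with shift() and compares it with .pct_change().

```python
import pandas as pd

# Toy closing prices (hypothetical values, not real AAPL data)
prices = pd.Series([100.0, 102.0, 101.0, 105.0])

# Manual formula: (today's close - previous close) / previous close
manual = (prices - prices.shift(1)) / prices.shift(1)

# Built-in equivalent; the first element is NaN because it has no previous day
builtin = prices.pct_change()

print(manual.tolist())
print(builtin.tolist())

# Multiply by 100 to express the returns as percentages
print((builtin * 100).round(2).tolist())
```

Both series agree element by element. The leading NaN is expected and is usually dropped before computing statistics on the returns.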


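1.2.3 Autocorrelation of returns

The autocorrelation of stock returns describes how a stock's returns at different points in time relate to one another. Specifically, it is the correlation between the stock's past returns and its future returns. The correlation can be positive (a rise in past returns suggests future returns may also rise), negative (a rise in past returns suggests future returns may fall), or not significant at all.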

AAPL['Returns'] = AAPL['Adj Close'].pct_change()
# Compute the autocorrelation coefficients with pandas' autocorr
# Note: autocorr defaults to lag 1; other lags need a loop or another method
autocorr_values = [AAPL['Returns'].autocorr(lag=i) for i in range(1, 301)]  # lags 1 to 300
# Plot the autocorrelation coefficients with matplotlib
plt.figure(figsize=(18, 6))
plt.plot(range(1, 301), autocorr_values, linestyle='-.', marker='*')
plt.title('Autocorrelation of Stock Returns')
plt.xlabel('Lag')
plt.ylabel('Autocorrelation')
plt.grid(True)
plt.show()
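As a rough illustration of what autocorr(lag=k) measures, the sketch below uses synthetic returns (random numbers, assumed here purely for demonstration) and shows that the lag-k autocorrelation is the Pearson correlation between the series and a copy of itself shifted by k steps. The 1.96/sqrt(N) band is a common rule of thumb for judging whether a coefficient differs from what white noise would give; it is an addition for context, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns (random noise, for illustration only)
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0, 0.01, size=500))

lag = 5
# pandas' built-in lag-k autocorrelation
auto = returns.autocorr(lag=lag)
# The same quantity written out: correlation with the series shifted by `lag`
by_hand = returns.corr(returns.shift(lag))

# Approximate 95% band for white noise (rule of thumb)
band = 1.96 / np.sqrt(len(returns))

print(f"autocorr={auto:.4f}, by hand={by_hand:.4f}, band=+/-{band:.4f}")
```

Coefficients that stay inside the band at most lags are consistent with returns that carry little linear memory of their own past.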


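2. Feature engineering for time series data (AAPL)

In time series analysis, the time window describes the number of consecutive time steps considered when training a model. The size of this window, window_size, is critical to the accuracy of the model's predictions.

Specifically, window_size determines how much history the model uses when it makes a prediction. For example, if we want to use the previous 60 days of stock data to predict the closing prices of the next 7 days, then window_size is 60.

# Set the time window size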
window_size = 60

2.1 Building sequence data

The function takes two arguments: dataset, the NumPy array to convert into a dataset, and lookback, the number of previous time steps used as input variables to predict the next time step, which defaults to 1.

# Function that builds the sequence data
def create_dataset(dataset, lookback=1):
    """Transform a time series into a prediction dataset

    Args:
        dataset: A numpy array of time series, first dimension is the time steps
        lookback: Size of window for prediction
    """
    X, y = [], []
    for i in range(len(dataset)-lookback):
        feature = dataset[i:(i+lookback), 0]
        target = dataset[i + lookback, 0]
        X.append(feature)
        y.append(target)
    return np.array(X), np.array(y)
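A small usage sketch may help show what the sliding window produces. It repeats the same windowing logic so that it runs on its own, and applies it to a toy series 0..9 (made-up data) with lookback=3.

```python
import numpy as np

def create_dataset(dataset, lookback=1):
    """Same windowing logic as above, repeated so this sketch is self-contained."""
    X, y = [], []
    for i in range(len(dataset) - lookback):
        X.append(dataset[i:i + lookback, 0])   # `lookback` consecutive values
        y.append(dataset[i + lookback, 0])     # the value right after the window
    return np.array(X), np.array(y)

# Toy series 0, 1, ..., 9 as a column vector (illustrative data)
series = np.arange(10, dtype=float).reshape(-1, 1)
X, y = create_dataset(series, lookback=3)

print(X.shape, y.shape)   # (7, 3) (7,)
print(X[0], y[0])         # [0. 1. 2.] 3.0
print(X[-1], y[-1])       # [6. 7. 8.] 9.0
```

A series of length N yields N - lookback samples. With the 755 rows selected earlier and a window of 60, that gives 695 samples, which matches the 605 training plus 90 test samples reported later.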

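2.2 Feature scaling (normalization)

MinMaxScaler() rescales feature data proportionally to a given range. By default it maps the data to the interval [0, 1], but another range can be set through the feature_range parameter. In machine learning it is commonly used to bring features with different scales onto a common range, which usually makes training more stable.

# Use AAPL['Close'] as the feature and scale it
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(AAPL['Close'].values.reshape(-1, 1))

# Build the dataset
X, y = create_dataset(scaled_data, lookback=window_size)

# Reshape the input data to [samples, time steps, features]
X = np.reshape(X, (X.shape[0], X.shape[1], 1))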
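For reference, min-max scaling to [0, 1] applies x' = (x - min) / (max - min) column by column. The sketch below checks this against MinMaxScaler on a few made-up prices and shows that inverse_transform recovers the original values.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical prices as a single-feature column
prices = np.array([150.0, 160.0, 170.0, 190.0]).reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(prices)

# The same mapping written out by hand
manual = (prices - prices.min()) / (prices.max() - prices.min())

# inverse_transform undoes the scaling
restored = scaler.inverse_transform(scaled)

print(scaled.ravel())     # [0.   0.25 0.5  1.  ]
print(restored.ravel())   # [150. 160. 170. 190.]
```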

2.3 Splitting the dataset (TimeSeriesSplit)

Unlike conventional cross-validation, TimeSeriesSplit is designed for datasets where temporal order matters: in every split, all test points come after all training points, and it can produce several train/test splits.

# Split the data with TimeSeriesSplit; adjust n_splits as needed
tscv = TimeSeriesSplit(n_splits=3, test_size=90)
# Iterate over all splits; after the loop the variables hold the last fold
for i, (train_index, test_index) in enumerate(tscv.split(X)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # print(f"Fold {i}:")
    # print(f"  Train: index={train_index}")
    # print(f"  Test:  index={test_index}")

# Shapes of the arrays in the last fold
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(605, 60, 1) (90, 60, 1) (605,) (90,)
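To see how the folds are laid out, here is a minimal sketch that runs TimeSeriesSplit on a placeholder array with the same number of samples (695) and prints the boundaries of each fold. The array contents are dummy indices; only the sizes matter.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Placeholder data with the same number of samples as the windowed dataset
X_demo = np.arange(695).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3, test_size=90)
folds = []
for train_index, test_index in tscv.split(X_demo):
    # Record the training size and the first/last test index of each fold
    folds.append((len(train_index), int(test_index[0]), int(test_index[-1])))

for i, (n_train, first, last) in enumerate(folds):
    print(f"Fold {i}: train=[0..{n_train - 1}] ({n_train} samples), test=[{first}..{last}]")
```

The training set grows from fold to fold while each test block of 90 samples sits strictly after it. The last fold uses 605 training samples and the final 90 samples for testing.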

2.4 Dataset tensors (TensorDataset)

A tensor is a multi-dimensional array, a generalization of vectors and matrices. In deep learning, tensors are used to represent input data, model parameters, and outputs.

# Convert the NumPy arrays to tensors
X_train_tensor = torch.from_numpy(X_train).type(torch.Tensor)
X_test_tensor = torch.from_numpy(X_test).type(torch.Tensor)
y_train_tensor = torch.from_numpy(y_train).type(torch.Tensor).view(-1,1)
y_test_tensor = torch.from_numpy(y_test).type(torch.Tensor).view(-1,1)
print(X_train_tensor.shape, X_test_tensor.shape, y_train_tensor.shape, y_test_tensor.shape)

view() reshapes a tensor, much like reshape() in NumPy, so the data can be rearranged to match the shapes the GRU model works with. Here it turns each target vector into a column of shape [samples, 1], which matches the model's output shape.

torch.Size([605, 60, 1]) torch.Size([90, 60, 1]) torch.Size([605, 1]) torch.Size([90, 1])
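The conversion step can be tried on a tiny array. This sketch (toy data) shows that .type(torch.Tensor) yields a float32 tensor from a float64 NumPy array, and that view(-1, 1) returns a column that shares memory with the tensor it was created from.

```python
import numpy as np
import torch

# Toy target vector in float64, NumPy's default floating type
y_demo = np.arange(6, dtype=np.float64)

# Convert to the default tensor type (float32), as in the code above
t = torch.from_numpy(y_demo).type(torch.Tensor)

# view(-1, 1): infer the number of rows, force a single column
col = t.view(-1, 1)
print(t.dtype, tuple(col.shape))

# A view shares storage with its source: writing through `col` changes `t`
col[0, 0] = 42.0
print(t[0].item())
```

view() requires a compatible memory layout and never copies, whereas torch.reshape() falls back to a copy when a view is not possible.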

Create the datasets and data loaders with TensorDataset and DataLoader

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

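shuffle=True means the training set is randomly reshuffled at the start of every epoch, so the model does not see the batches in the same order each time, which helps reduce overfitting. For the test loader, shuffle=False keeps the data in order: the test set is only used to evaluate the model, not to train it, so there is no need to shuffle.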
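As a quick check on how the loader batches the data, the sketch below builds a DataLoader over placeholder tensors with the same shapes as the training set (all zeros, for illustration) and counts the batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors shaped like the training set (values are irrelevant here)
X_demo = torch.zeros(605, 60, 1)
y_demo = torch.zeros(605, 1)

loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=32, shuffle=True)

# Collect the size of every batch in one pass over the data
sizes = [xb.shape[0] for xb, _ in loader]
print(len(loader), sizes[0], sizes[-1])   # 19 32 29
```

605 samples at a batch size of 32 give 18 full batches plus one final batch of 29, so each epoch runs 19 iterations. Passing drop_last=True would discard the short batch instead.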

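3. Building the time series model (Stacked GRU)

The GRU (Gated Recurrent Unit) is a variant of the recurrent neural network (RNN) used to process and predict sequence data. Compared with a standard RNN, a GRU captures long-term dependencies more effectively and is less prone to vanishing or exploding gradients during training.

🔗 The formulas and explanation given in the PyTorch documentation are as follows: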

Apply a multi-layer gated recurrent unit (GRU) RNN to an input sequence. For each element in the input sequence, each layer computes the following function:
$\begin{array}{ll} r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - z_t) \odot n_t + z_t \odot h_{(t-1)} \end{array}$
where $h_t$ is the hidden state at time $t$, $x_t$ is the input at time $t$, $h_{(t-1)}$ is the hidden state of the layer at time $t-1$ or the initial hidden state at time $0$, and $r_t$, $z_t$, $n_t$ are the reset, update, and new gates, respectively. $\sigma$ is the sigmoid function, and $\odot$ is the Hadamard product.

In a multilayer GRU, the input $x^{(l)}_t$ of the $l$-th layer ($l \ge 2$) is the hidden state $h^{(l-1)}_t$ of the previous layer multiplied by dropout $\delta^{(l-1)}_t$, where each $\delta^{(l-1)}_t$ is a Bernoulli random variable which is $0$ with probability $\text{dropout}$.
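The gate equations can be checked numerically. The sketch below is an illustration only, with small dimensions chosen arbitrarily: it computes one GRU step by hand from the parameters of an nn.GRUCell and compares the result with the cell's own output. It relies on PyTorch stacking the gate parameters in the order (r, z, n).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, batch = 1, 4, 3   # small sizes chosen for illustration
cell = nn.GRUCell(input_dim, hidden_dim)

x_t = torch.randn(batch, input_dim)       # input at time t
h_prev = torch.randn(batch, hidden_dim)   # hidden state at time t-1

with torch.no_grad():
    # Each parameter stacks the three gates row-wise in the order (r, z, n)
    W_ir, W_iz, W_in = cell.weight_ih.chunk(3, dim=0)
    W_hr, W_hz, W_hn = cell.weight_hh.chunk(3, dim=0)
    b_ir, b_iz, b_in = cell.bias_ih.chunk(3, dim=0)
    b_hr, b_hz, b_hn = cell.bias_hh.chunk(3, dim=0)

    # Reset gate, update gate, candidate state, and new hidden state
    r_t = torch.sigmoid(x_t @ W_ir.T + b_ir + h_prev @ W_hr.T + b_hr)
    z_t = torch.sigmoid(x_t @ W_iz.T + b_iz + h_prev @ W_hz.T + b_hz)
    n_t = torch.tanh(x_t @ W_in.T + b_in + r_t * (h_prev @ W_hn.T + b_hn))
    h_t = (1 - z_t) * n_t + z_t * h_prev

    reference = cell(x_t, h_prev)

print(torch.allclose(h_t, reference, atol=1e-6))
```

When z_t is close to 1 the cell mostly copies h_prev forward, which is how a GRU can carry information across many time steps.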

3.1 Building the GRU model

class GRUNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim=1, num_layers=2):
        # input_dim: number of input features; hidden_dim: size of the hidden state;
        # output_dim: size of the output; num_layers: number of stacked GRU layers.
        # With num_layers=2, the second GRU takes the outputs of the first GRU and computes the final result.
        super(GRUNet, self).__init__()  # initialize the parent class nn.Module
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        # GRU layer; batch_first=True means the input shape is [batch_size, seq_len (time_steps), input_dim]
        self.gru = nn.GRU(input_dim, hidden_dim, num_layers, batch_first=True)
        # Fully connected layer that maps the last hidden state to output_dim
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initial hidden state h0: zeros of shape [num_layers * num_directions, batch, hidden_dim];
        # num_directions is 1 unless bidirectional=True is set
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim).to(x.device)
        # out holds the top layer's output at every time step; the final hidden state h_n is ignored via _
        out, _ = self.gru(x, h0)
        # out has shape [batch_size, seq_len (time_steps), hidden_dim];
        # keep only the last time step, out[:, -1, :], and pass it to the fully connected layer
        out = self.fc(out[:, -1, :])
        return out

3.2 Defining the model, loss function, and optimizer

To build a stacked GRU in PyTorch, we instantiate the GRUNet class and set its num_layers argument.

model = GRUNet(input_dim=1,    # number of input features, X_train.shape[2]
               hidden_dim=64,
               output_dim=1,
               num_layers=2)   # stack two GRU layers to form a stacked GRU
criterion = torch.nn.MSELoss()  # mean squared error loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # Adam optimizer

summary(model, (32, 60, 1)) # batch_size, seq_len(time_steps), input_dim

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
GRUNet                                   [32, 1]                   --
├─GRU: 1-1                               [32, 60, 64]              37,824
├─Linear: 1-2                            [32, 1]                   65
==========================================================================================
Total params: 37,889
Trainable params: 37,889
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 72.62
==========================================================================================
Input size (MB): 0.01
Forward/backward pass size (MB): 0.98
Params size (MB): 0.15
Estimated Total Size (MB): 1.14
==========================================================================================
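The parameter counts in the summary can be reproduced by hand. Each GRU layer has three gates, and each gate has an input weight matrix, a hidden weight matrix, and two bias vectors. The sketch below uses a helper named gru_layer_params (a hypothetical name introduced for this illustration) and compares the result with PyTorch's own count.

```python
import torch.nn as nn

def gru_layer_params(in_dim, hidden_dim):
    """Parameters of one GRU layer: 3 gates x (W_i, W_h, b_i, b_h)."""
    return 3 * (in_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim)

layer1 = gru_layer_params(1, 64)    # first layer reads the 1-dimensional input
layer2 = gru_layer_params(64, 64)   # second layer reads the 64-dimensional hidden state
linear = 64 * 1 + 1                 # weights plus bias of the output layer

# Cross-check against the parameters PyTorch actually allocates
gru = nn.GRU(1, 64, 2, batch_first=True)
actual = sum(p.numel() for p in gru.parameters())

print(layer1, layer2, layer1 + layer2, actual, linear)
# 12864 24960 37824 37824 65
```

The second layer holds roughly twice as many parameters as the first because its input is the 64-dimensional hidden state rather than a single feature.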

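4. Model training and visualization

train_loss = []
num_epochs = 20

for epoch in range(num_epochs):
    model.train()  # set the model to training mode
    pbar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs}")
    # Iterating over pbar advances the progress bar, so no manual update() is needed
    for batch_idx, (data, target) in enumerate(pbar):
        # Forward pass
        outputs = model(data)  # predictions for this batch
        loss = criterion(outputs, target)
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Record the loss
        train_loss.append(loss.item())
        # Show the loss of the current batch (not an average) on the progress bar
        pbar.set_postfix({'Train loss': f'{loss.item():.4f}'})

Here we use the tqdm module to display the progress bars.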

Epoch 1/20: 100%|███████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.25it/s, Train loss=0.0211]
Epoch 2/20: 100%|███████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.07it/s, Train loss=0.0052]
Epoch 3/20: 100%|███████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.24it/s, Train loss=0.0030]
Epoch 4/20: 100%|███████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.06it/s, Train loss=0.0014]
Epoch 5/20: 100%|███████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.19it/s, Train loss=0.0011]
Epoch 6/20: 100%|███████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.21it/s, Train loss=0.0007]
Epoch 7/20: 100%|███████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.21it/s, Train loss=0.0015]
Epoch 8/20: 100%|███████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.22it/s, Train loss=0.0015]
Epoch 9/20: 100%|███████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.21it/s, Train loss=0.0012]
Epoch 10/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.07it/s, Train loss=0.0010]
Epoch 11/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.07it/s, Train loss=0.0014]
Epoch 12/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.19it/s, Train loss=0.0011]
Epoch 13/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.21it/s, Train loss=0.0013]
Epoch 14/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.07it/s, Train loss=0.0013]
Epoch 15/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.03it/s, Train loss=0.0020]
Epoch 16/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.08it/s, Train loss=0.0012]
Epoch 17/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.07it/s, Train loss=0.0009]
Epoch 18/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.22it/s, Train loss=0.0014]
Epoch 19/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.07it/s, Train loss=0.0019]
Epoch 20/20: 100%|██████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.11it/s, Train loss=0.0013]

plt.plot(train_loss)


5. Model evaluation and visualization

5.1 Mean squared error

model.eval()  # set the model to evaluation mode
test_loss = []  # per-batch losses
pbar = tqdm(test_loader, desc="Evaluating")
with torch.no_grad():
    for data, target in pbar:
        test_pred = model(data)
        loss = criterion(test_pred, target)
        test_loss.append(loss.item())
        # Running mean of the batch losses seen so far
        batch_avg_loss = sum(test_loss)/len(test_loss)
        pbar.set_postfix({'Test Loss': f'{batch_avg_loss:.4f}'})
pbar.close()  # close the progress bar

Evaluating: 100%|██████████████████████████████████████████████████████| 3/3 [00:00<00:00, 51.30it/s, Test Loss=0.0011]

5.2 Inverse scaling

.inverse_transform maps transformed or scaled data back to its original scale.

# Undo the scaling on the predictions and targets
train_pred = scaler.inverse_transform(model(X_train_tensor).detach().numpy())
y_train = scaler.inverse_transform(y_train_tensor.detach().numpy())
test_pred = scaler.inverse_transform(model(X_test_tensor).detach().numpy())
y_test = scaler.inverse_transform(y_test_tensor.detach().numpy())
print(train_pred.shape, y_train.shape, test_pred.shape, y_test.shape)

(605, 1) (605, 1) (90, 1) (90, 1)
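Note that a loss computed on min-max scaled values is expressed in scaled units, not dollars. Because the scaling is linear, an RMSE in price units equals the scaled RMSE multiplied by the fitted price range (max - min). The sketch below illustrates this with made-up prices and predictions. It is a worked example of the relationship, not a result from the model above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical price history used to fit the scaler (range = 200 - 120 = 80)
history = np.array([[120.0], [150.0], [180.0], [200.0]])
scaler = MinMaxScaler().fit(history)

# Hypothetical true and predicted prices
y_true = np.array([[150.0], [180.0]])
y_pred = np.array([[154.0], [177.0]])

# MSE in scaled units, as a loss on scaled data would report it
mse_scaled = np.mean((scaler.transform(y_pred) - scaler.transform(y_true)) ** 2)

# RMSE computed directly in price units
rmse_price = np.sqrt(np.mean((y_pred - y_true) ** 2))

# The same RMSE recovered from the scaled MSE and the fitted range
price_range = scaler.data_range_[0]
rmse_from_scaled = np.sqrt(mse_scaled) * price_range

print(round(float(rmse_price), 4), round(float(rmse_from_scaled), 4))
```

Computing an error metric on the inverse-transformed arrays gives the same number directly and is often easier to interpret than the scaled loss.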

5.3 Visualizing the results

Prepare the data for plotting the training and test predictions

# shift train predictions for plotting
trainPredict = AAPL[window_size:X_train.shape[0]+X_train.shape[1]]
trainPredictPlot = trainPredict.assign(TrainPrediction=train_pred)

testPredict = AAPL[X_train.shape[0]+X_train.shape[1]:]
testPredictPlot = testPredict.assign(TestPrediction=test_pred)

Plot the original closing prices against the model's predictions

# Visualize the data
plt.figure(figsize=(18,6))
plt.title('GRU Close Price Validation')
plt.plot(AAPL['Close'], color='blue', label='original')
plt.plot(trainPredictPlot['TrainPrediction'], color='orange',label='Train Prediction')
plt.plot(testPredictPlot['TestPrediction'], color='red', label='Test Prediction')
plt.legend()
plt.show()


6. Model prediction

6.1 Converting the latest window of closing prices into a tensor

# latest_closes holds the most recent window_size closing prices
latest_closes = AAPL['Close'][-window_size:].values
latest_closes = latest_closes.reshape(-1, 1)
scaled_latest_closes = scaler.transform(latest_closes)  # reuse the scaler fitted on the full series; refitting would change the scale
tensor_latest_closes = torch.from_numpy(scaled_latest_closes).type(torch.Tensor).view(1, window_size, 1)
print(tensor_latest_closes.shape)

torch.Size([1, 60, 1])
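One detail worth checking at this step is which scaler mapping is applied to the new inputs. A model trained on values scaled with one fitted scaler expects inference inputs on that same scale. The sketch below uses made-up prices to show how transform() with the existing scaler differs from refitting a scaler on only the latest window.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical price history: 100, 110, ..., 200
history = np.linspace(100.0, 200.0, 11).reshape(-1, 1)
scaler = MinMaxScaler().fit(history)

# The latest window: 180, 190, 200
latest = history[-3:]

# Reusing the fitted scaler keeps the window on the training scale
reused = scaler.transform(latest).ravel()

# Refitting on the window alone stretches it to span the whole [0, 1] range
refit = MinMaxScaler().fit_transform(latest).ravel()

print(reused)   # [0.8 0.9 1. ]
print(refit)    # [0.  0.5 1. ]
```

The refitted values no longer correspond to the prices the model saw during training, and a later inverse_transform would use the refitted minimum and maximum as well.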

6.2 Predicting the next closing price

# Use the model to predict the closing price at the next time step
next_close_pred = model(tensor_latest_closes)
next_close_pred = scaler.inverse_transform(next_close_pred.detach().numpy())
next_close_pred