Weather Dataset 2: Weather Forecasting with an RNN

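Part 2: Temperature forecasting with a recurrent neural network

This project uses PyTorch RNN and GRU models to predict future temperature.

Dataset: https://mp.weixin.qq.com/s/08BmF4RnnwQ-jX5s_ukDUA
Project code: https://github.com/disanda/b_code/tree/master/Weather_Prediction

At its core, the model learns the temporal relationships in the data.
Its input and output sequences can vary in length.

Below is a sample of the inputs and outputs of PyTorch's RNN.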

import torch

input_size = 10   # dimensionality of each input data point
output_size = 1   # dimensionality of the output (used here as the RNN hidden size)
num_layers = 3    # number of stacked RNN layers
rnn_case = torch.nn.RNN(input_size, output_size, num_layers, batch_first=True)

batch_size = 4    # the model processes sequences in batches
seq_length1 = 5   # sequence length 5: each input sequence has 5 consecutive points
seq_length2 = 9   # sequence length 9: each input sequence has 9 consecutive points

x1 = torch.randn(batch_size, seq_length1, input_size)
x2 = torch.randn(batch_size, seq_length2, input_size)

h1_0 = torch.zeros(num_layers, batch_size, output_size)
h2_0 = torch.zeros(num_layers, batch_size, output_size)

y1, h1_1 = rnn_case(x1, h1_0)
# y1.shape = (batch_size, seq_length1, output_size)
# h1_1.shape = (num_layers, batch_size, output_size)

y2, h2_1 = rnn_case(x2, h2_0)
# y2.shape = (batch_size, seq_length2, output_size)
# h2_1.shape = (num_layers, batch_size, output_size)

# To predict the n-th data point from the first n-1 points, keep only the last time step
y1_out = y1[:, -1, :]
y2_out = y2[:, -1, :]
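
The variable-length property follows from the recurrence itself: the same weights are applied at every time step, so nothing in the model depends on how many steps there are. The following is a minimal pure-Python sketch of a single-layer tanh RNN, not PyTorch's implementation. The function names and weight values are made up for illustration, and the two bias vectors that PyTorch keeps are merged into one here.

```python
import math

def rnn_step(x_t, h_prev, w_ih, w_hh, b):
    """One tanh RNN step: h_t = tanh(W_ih x_t + W_hh h_prev + b)."""
    h_next = []
    for j in range(len(h_prev)):
        total = b[j]
        total += sum(w_ih[j][k] * x_t[k] for k in range(len(x_t)))
        total += sum(w_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_next.append(math.tanh(total))
    return h_next

def run_rnn(sequence, w_ih, w_hh, b):
    """Apply the same weights at every step and return the final hidden state."""
    h = [0.0] * len(b)  # zero initial hidden state, like h1_0 / h2_0 above
    for x_t in sequence:
        h = rnn_step(x_t, h, w_ih, w_hh, b)
    return h

# Illustrative weights: 2 input features, 2 hidden units (values are made up)
w_ih = [[0.5, -0.3], [0.1, 0.8]]
w_hh = [[0.2, 0.0], [-0.4, 0.3]]
b = [0.0, 0.1]

short_seq = [[1.0, 0.5]] * 5   # 5 time steps
long_seq = [[1.0, 0.5]] * 9    # 9 time steps

h_short = run_rnn(short_seq, w_ih, w_hh, b)
h_long = run_rnn(long_seq, w_ih, w_hh, b)
print(len(h_short), len(h_long))  # both hidden states have 2 units
```

Both calls reuse the same three parameter arrays; only the number of loop iterations changes with the sequence length.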

Data preprocessing

2.1 Print the data features

In pandas, the features are the columns of the DataFrame.

import pandas as pd
import matplotlib.pyplot as plt

csv_path = "mpi_saale_2021b.csv"
data_frame = pd.read_csv(csv_path)
print(data_frame.columns)
# Index(['Date Time', 'p (mbar)', 'T (degC)', 'rh (%)', 'sh (g/kg)', 'Tpot (K)',
#        'Tdew (degC)', 'VPmax (mbar)', 'VPact (mbar)', 'VPdef (mbar)',
#        'H2OC (mmol/mol)', 'rho (g/m**3)', 'wv (m/s)', 'wd (deg)', 'rain (mm)',
#        'SWDR (W/m**2)', 'SDUR (s)', 'TRAD (degC)', 'Rn (W/m**2)',
#        'ST002 (degC)', 'ST004 (degC)', 'ST008 (degC)', 'ST016 (degC)',
#        'ST032 (degC)', 'ST064 (degC)', 'ST128 (degC)', 'SM008 (%)',
#        'SM016 (%)', 'SM032 (%)', 'SM064 (%)', 'SM128 (%)'], dtype='object')

2.2 Drop some features

data = data_frame.drop(columns=['Date Time'])  # drop the string-valued feature

# Drop the other degC features, keeping only 'T (degC)'
deg_columns = data_frame.filter(like='degC').columns
filtered_list = [item for item in deg_columns if item != 'T (degC)']
data = data.drop(columns=pd.Index(filtered_list))
# print(data.columns)
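
To check which columns survive this step without loading the CSV, the same filtering can be replayed on the column names printed in 2.1. This is a plain-Python sketch that mimics the substring match of `filter(like='degC')`. It assumes the file has exactly the columns listed above.

```python
columns = [
    'Date Time', 'p (mbar)', 'T (degC)', 'rh (%)', 'sh (g/kg)', 'Tpot (K)',
    'Tdew (degC)', 'VPmax (mbar)', 'VPact (mbar)', 'VPdef (mbar)',
    'H2OC (mmol/mol)', 'rho (g/m**3)', 'wv (m/s)', 'wd (deg)', 'rain (mm)',
    'SWDR (W/m**2)', 'SDUR (s)', 'TRAD (degC)', 'Rn (W/m**2)',
    'ST002 (degC)', 'ST004 (degC)', 'ST008 (degC)', 'ST016 (degC)',
    'ST032 (degC)', 'ST064 (degC)', 'ST128 (degC)', 'SM008 (%)',
    'SM016 (%)', 'SM032 (%)', 'SM064 (%)', 'SM128 (%)',
]

# Step 1: drop the string-valued timestamp column
kept = [c for c in columns if c != 'Date Time']

# Step 2: drop every column whose name contains 'degC', except the target
other_deg = [c for c in kept if 'degC' in c and c != 'T (degC)']
kept = [c for c in kept if c not in other_deg]

print(len(kept))               # 21 features remain
print(kept.index('T (degC)'))  # 1: the target is the 2nd remaining column
```

The position of 'T (degC)' matters later, because the target is selected by column index when the sequences are built.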

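2.3 Standardize the data

Think of the features as different currencies: standardization converts them all to a common unit of measure.


# Standardize the features
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
# print(data_scaled)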
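
For intuition, standardization rescales each column separately to zero mean and unit standard deviation (StandardScaler uses the population standard deviation). The sketch below does this by hand for a single column using only the standard library. The helper names and the sample temperatures are made up for illustration.

```python
import statistics

def standardize(column):
    """Z-score one column: subtract the mean, divide by the population std."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column], mean, std

def unstandardize(values, mean, std):
    """Invert the z-score to get back to the original unit."""
    return [v * std + mean for v in values]

temps_degc = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up temperatures

scaled, mean, std = standardize(temps_degc)
restored = unstandardize(scaled, mean, std)

print(mean, std)  # mean and std of the original column
print(scaled)     # same shape, now centered on 0 with unit spread
```

Keeping the mean and standard deviation of the temperature column around is what later allows a standardized prediction to be converted back to degrees Celsius.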

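2.4 Building sequences and PyTorch batching

y = f(x): x is the input, y is the output, and f is the model.
Build sequences for x, so that one sample is one sequence (n-1 time steps, each carrying all the features).
Build the label y, i.e. the value to predict (the temperature feature at the n-th time step).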


# Build the sequence data
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

X, y = [], []
for i in range(len(data_scaled) - sequence_length):
    X.append(data_scaled[i:i+sequence_length-1])   # features of the first sequence_length-1 time steps
    y.append(data_scaled[i+sequence_length-1, 1])  # target: the 2nd feature, 'T (degC)', at the last time step
X = np.array(X)
y = np.array(y)

# Convert to PyTorch tensors
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # falls back to CPU
X = torch.tensor(X, dtype=torch.float32).to(device)
y = torch.tensor(y, dtype=torch.float32).to(device)

# Split into training and test sets
dataset = TensorDataset(X, y)
train_size = int(0.8 * len(dataset))
#test_size = len(dataset) - train_size
#train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

# Chronological split: the first 80% for training, the rest for testing
train_dataset = TensorDataset(*dataset[:train_size])
test_dataset = TensorDataset(*dataset[train_size:])

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False, drop_last=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, drop_last=True)
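
The windowing logic is easier to follow on toy data. The sketch below repeats the same loop in plain Python on a 12-step series with 2 features and then applies the same chronological 80/20 split. The function name and the toy values are assumptions made for illustration.

```python
def make_windows(rows, sequence_length, target_index):
    """Split rows into (inputs, target) pairs.

    Each input holds sequence_length - 1 consecutive rows; the target is one
    feature of the row that immediately follows them.
    """
    inputs, targets = [], []
    for i in range(len(rows) - sequence_length):
        inputs.append(rows[i:i + sequence_length - 1])
        targets.append(rows[i + sequence_length - 1][target_index])
    return inputs, targets

# Toy data: 12 time steps, 2 features per step (feature 1 plays the role of temperature)
rows = [[float(t), float(10 * t)] for t in range(12)]
inputs, targets = make_windows(rows, sequence_length=4, target_index=1)

print(len(inputs))  # 8 windows from 12 rows
print(inputs[0])    # rows 0..2
print(targets[0])   # feature 1 of row 3 -> 30.0

# Chronological split: earlier windows train, later windows test
split = int(0.8 * len(inputs))
train_inputs, test_inputs = inputs[:split], inputs[split:]
print(len(train_inputs), len(test_inputs))  # 6 2
```

Splitting by position instead of at random keeps every test window later in time than the training windows, which matches how a forecast would actually be used.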

Hyperparameter selection

input_size = len(data.columns)
hidden_size = 64
output_size = 1
num_layers = 2
num_epochs = 30
learning_rate = 5e-5  # 0.001
batch_size = 8
sequence_length = 8  # sequence_length - 1 input time steps; the last time step is the prediction target
model_type = 'RNN'  # or 'GRU'

Model

4.1 Model design

Keep the model in its own module to decouple the program.


import torch.nn as nn

class WeatherRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1, model_type='RNN'):
        super(WeatherRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        if model_type == 'RNN':
            self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)  # batch_first=True, nonlinearity = 'relu'
        elif model_type == 'GRU':
            self.rnn = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        else:
            raise ValueError(f"Unsupported model_type: {model_type}")
        self.fc = nn.Linear(hidden_size, output_size)
        #self.dropout = nn.Dropout(p=0.2)  # made the results worse

    def forward(self, x, h0):
        out, hn = self.rnn(x, h0)
        #out = out[:, -1, :]
        #out = self.dropout(out)
        out = self.fc(out[:, -1, :])  # map the last time step's hidden state to the output
        #out = torch.tanh(out)
        return out, hn
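
To get a feel for how large this model is, the parameter count can be worked out by hand. The sketch below assumes PyTorch's layout (an input-to-hidden matrix, a hidden-to-hidden matrix, and two bias vectors per layer, with a GRU holding three such sets for its gates). It also assumes input_size = 21, which is what the column list in 2.1 leaves after the drops in 2.2. The helper names are made up.

```python
def recurrent_params(input_size, hidden_size, num_layers, gates):
    """Count weights and biases in a stacked recurrent block.

    gates = 1 for a plain RNN, 3 for a GRU (reset, update and candidate).
    """
    total = 0
    for layer in range(num_layers):
        # Layer 0 reads the features; deeper layers read the previous hidden state
        layer_input = input_size if layer == 0 else hidden_size
        weights = gates * hidden_size * (layer_input + hidden_size)
        biases = 2 * gates * hidden_size  # separate input and hidden biases
        total += weights + biases
    return total

def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

input_size, hidden_size, output_size, num_layers = 21, 64, 1, 2  # input_size is an assumption
head = linear_params(hidden_size, output_size)

rnn_total = recurrent_params(input_size, hidden_size, num_layers, gates=1) + head
gru_total = recurrent_params(input_size, hidden_size, num_layers, gates=3) + head
print(rnn_total)  # 13953
print(gru_total)  # 41729
```

Under these assumptions the GRU variant carries roughly three times the recurrent parameters of the plain RNN, which is worth keeping in mind when choosing hidden_size and num_layers for a small dataset.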

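4.2 Model training

Initialization: 1. the model, 2. the loss function, 3. the optimizer.
Training (forward pass): compute the output from the input, y = f(x).
Training (backward pass): compute the gradients and update the weights, w' = f'(x).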


# Initialize the model, loss function, and optimizer
import torch.nn as nn
import torch.optim as optim
import models  # the module that holds WeatherRNN

model = models.WeatherRNN(input_size, hidden_size, output_size, num_layers, model_type=model_type).to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
model.train()
h0 = torch.zeros(num_layers, batch_size, hidden_size).to(device)
for epoch in range(num_epochs):
    for inputs, targets in train_loader:
        # print(inputs.shape)
        # print(targets.shape)
        # Each training step feeds sequence_length - 1 data points and predicts the sequence_length-th
        output, hn = model(inputs, h0)  # inputs = [batch_size, sequence_length-1, features]
        h0 = hn.detach()  # [layers, batch_size, hidden_size]; detach so gradients do not flow across batches
        #print(output.shape)
        #print(hn.shape)
        predictions = output[:, -1]
        loss = criterion(predictions, targets)
        optimizer.zero_grad()
        loss.backward()  # retain_graph=True
        max_norm = 2.0  # maximum gradient norm
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)  # clip the gradients with clip_grad_norm_()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.7f}')  # loss of the last batch in the epoch
print("Training complete.")
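
One detail of this loop is easy to miss: h0 is allocated once with a fixed batch_size and then carried from batch to batch, so every batch must contain exactly batch_size sequences. That is why the loaders are created with drop_last=True. The sketch below shows how many batches and leftover samples that produces. The helper name and the sample count of 1003 are made up for illustration.

```python
def batches_with_drop_last(num_samples, batch_size):
    """Return (full batches per epoch, samples discarded from the tail)."""
    full_batches = num_samples // batch_size
    dropped = num_samples - full_batches * batch_size
    return full_batches, dropped

# Hypothetical training set of 1003 windows with the article's batch_size = 8
print(batches_with_drop_last(1003, 8))  # (125, 3)
```

At most batch_size - 1 samples are lost per epoch, which is a small price for keeping the hidden state shape constant.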

4.3 Model evaluation


# Evaluate the model
model.eval()
test_loss = 0.0
h0 = torch.zeros(num_layers, batch_size, hidden_size).to(device)
with torch.no_grad():
    for inputs, targets in test_loader:
        #print(inputs.shape)
        #print(targets.shape)
        # Each evaluation step feeds sequence_length - 1 data points and predicts the sequence_length-th
        output, hn = model(inputs, h0)
        h0 = hn.detach()
        predictions = output[:, -1]
        loss = criterion(predictions, targets)
        test_loss += loss.item()
test_loss /= len(test_loader)  # average over batches
print(f'Test Loss: {test_loss:.7f}')
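
Because the target was standardized, this test loss is an MSE in standardized units, not in degrees. A rough conversion takes the square root and multiplies by the standard deviation of the temperature column (with scikit-learn's StandardScaler, that value is stored in scaler.scale_ at the temperature's column index). The numbers below are made up to illustrate the arithmetic.

```python
import math

def rmse_in_degc(mse_scaled, temp_std_degc):
    """Convert an MSE on standardized temperature to an RMSE in degC."""
    return math.sqrt(mse_scaled) * temp_std_degc

# Hypothetical values: standardized test MSE of 0.04, temperature std of 8 degC
print(rmse_in_degc(0.04, 8.0))  # about 1.6 degC
```

The mean of the temperature column cancels out of the error, so only the standard deviation is needed.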

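Summary & reference links

The approach can later be extended to stock-type data.

5.1 Hyperparameter tuning

Tune by watching train_loss and test_loss.

hidden_size, num_layers: scale with the size of the dataset; for this example they should be lowered somewhat.
learning_rate: lower it if the loss changes too quickly or oscillates.
num_epochs: increase it while the loss is still decreasing meaningfully.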
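
One way to make "watching the losses" concrete is to record one value per epoch and compare the two curves. The sketch below is an illustrative heuristic, not part of the original project. The function names, the window of 3 epochs, the threshold, and the loss values are all made up.

```python
def best_epoch(test_losses):
    """Return the 1-based epoch with the lowest test loss."""
    return test_losses.index(min(test_losses)) + 1

def still_improving(losses, window=3, min_delta=1e-4):
    """True if the loss fell by more than min_delta over the last `window` epochs."""
    if len(losses) <= window:
        return True  # too few epochs to judge
    return losses[-window - 1] - losses[-1] > min_delta

train_losses = [0.90, 0.40, 0.20, 0.12, 0.08, 0.06, 0.05]  # made-up values
test_losses = [0.95, 0.50, 0.30, 0.25, 0.27, 0.31, 0.36]   # made-up values

print(best_epoch(test_losses))        # 4
print(still_improving(train_losses))  # True
print(still_improving(test_losses))   # False
```

In this made-up run the training loss keeps falling while the test loss has been rising since epoch 4. That pattern suggests more epochs will not help and that a smaller hidden_size or num_layers may be worth trying.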

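5.2 Tricks

Limiting gradient updates (gradient clipping)
max_norm = 2.0  # maximum gradient norm
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)  # clip the gradients with clip_grad_norm_()
Dropout

Suited to RNNs with more parameters; not suitable for this example.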
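
For the gradient clipping trick above, the rule behind norm-based clipping is short: compute the overall L2 norm of all gradients, and if it exceeds max_norm, scale every gradient by the same factor so the overall norm lands at max_norm. The sketch below illustrates that rule on a flat list of numbers. It is an illustration of the idea, not PyTorch's code, and the function name and gradient values are made up.

```python
import math

def clip_by_total_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by one factor so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))  # never scale up
    return [g * scale for g in grads], total_norm

# Large gradients: norm 13.0 is scaled down to about 2.0
clipped, norm_before = clip_by_total_norm([3.0, 4.0, 12.0], max_norm=2.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, round(norm_after, 4))

# Small gradients: norm 0.5 is already below the limit, so nothing changes
unchanged, _ = clip_by_total_norm([0.3, 0.4], max_norm=2.0)
print(unchanged)
```

Because every component is multiplied by the same factor, the direction of the update is preserved and only its length is capped.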

5.3 Reference links:

Code: https://blog.paperspace.com/weather-forecast-using-ltsm-networks/
Weather dataset: https://www.bgc-jena.mpg.de/wetter/
