Week 35 Study Report

Abstract

The paper read this week proposes a hydrological time series forecasting model that combines particle swarm optimization (PSO) with Bi-LSTM and Bi-GRU and adds feature fusion and a self-attention layer. PSO tunes hyperparameters such as the number of hidden units, searching the space efficiently so that the model can adapt to different temporal characteristics. The bidirectional structure of Bi-LSTM and Bi-GRU captures dependencies on both past and future time steps, which strengthens the modeling of complex temporal patterns. The self-attention layer weights time steps dynamically to highlight the most informative ones, which makes the feature representation more precise. The multimodal fusion strategy combines the complementary strengths of Bi-LSTM and Bi-GRU and, together with the attention mechanism, extracts weighted features, so the model surpasses traditional unidirectional models in prediction accuracy and robustness. The architecture is computationally efficient and adaptable, which makes it a flexible solution for hydrological forecasting tasks.

Literature Reading

This week I read the paper "Multimodal Fusion of Optimized GRU–LSTM with Self-Attention Layer for Hydrological Time Series Forecasting".
Paper link: Multimodal Fusion of Optimized GRU–LSTM with Self-Attention Layer for Hydrological Time Series Forecasting
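In the paper, the authors propose a new fusion model. It combines particle swarm optimization (PSO) with Bi-LSTM and Bi-GRU, uses feature fusion and an attention layer to improve the accuracy of hydrological time series forecasting, and outperforms traditional methods on several datasets.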

1.1 Background Knowledge

1.1.1 PSO

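PSO was originally inspired by the regular patterns in the flocking behavior of birds and uses swarm intelligence to build a simplified model. Its core idea is to search the solution space through the cooperative movement of a group of "particles". Each particle represents a candidate solution and has two attributes, a position and a velocity. During the search, each particle adjusts its direction and speed according to its own best historical position (individual experience) and the best position of the whole swarm (group experience), so that the swarm gradually converges toward the global optimum.
Suppose there are N particles in a D-dimensional search space, and the state of particle i is given by its position vector x_i and its velocity vector v_i. Two concepts are needed here: the personal best position and the global best position.
Personal best position (pbest): each particle records the best position it has found during its own search, denoted pbest_i. It is the particle's best solution as evaluated by the objective (fitness) function.
Global best position (gbest): the swarm records the best position found by any of its particles, denoted gbest. It is the best solution in the whole swarm.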

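As the figure below shows, suppose a particle is currently at position C, its personal best is at B, and the global best is at A. The yellow vector is the current velocity, the green vector is the step toward the personal best, and the red vector is the step toward the global best. The particle's next move is the weighted sum of these three vectors.
[Figure: particle at C with personal best B and global best A; the yellow, green, and red vectors are the three velocity components]
The update is then iterated until the maximum number of iterations is reached or the global best position stops changing.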
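The weighted sum described above corresponds to the standard PSO update rule. It is written here as a sketch using the parameter names of the code in this section (inertia weight w, acceleration coefficients c_1 and c_2, random factors r_1 and r_2 in [0, 1]); the iteration superscript t is notation introduced only for this block.

```latex
\begin{aligned}
v_i^{t+1} &= w\,v_i^{t} + c_1 r_1\,\bigl(\mathrm{pbest}_i - x_i^{t}\bigr) + c_2 r_2\,\bigl(\mathrm{gbest} - x_i^{t}\bigr) \\
x_i^{t+1} &= x_i^{t} + v_i^{t+1}
\end{aligned}
```

The first term keeps the particle moving in its current direction, the second pulls it toward its own best position, and the third pulls it toward the swarm's best position.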

The following code uses PSO to minimize the objective function f(x, y) = x² + y², that is, to find the (x, y) with the smallest function value:

import numpy as np

# Objective function
def objective_function(x):
    return np.sum(x**2)

# PSO parameters
n_particles = 2
dim = 2
w, c1, c2 = 0.7, 1.5, 1.5
bounds = [-5, 5]
max_iter = 2

# Initialization
positions = np.array([[2.0, 1.0], [-1.0, 3.0]])
velocities = np.array([[0.5, 0.3], [-0.2, 0.1]])
pbest_positions = positions.copy()
pbest_scores = np.array([objective_function(p) for p in pbest_positions])
gbest_idx = np.argmin(pbest_scores)
gbest_position = pbest_positions[gbest_idx].copy()
gbest_score = pbest_scores[gbest_idx]

# Random factors are fixed by hand so the run is reproducible
rand_values = [[0.4, 0.6], [0.3, 0.7]]  # [r1, r2] for each iteration

# PSO iterations
for t in range(max_iter):
    print(f"\nIteration {t + 1}:")
    r1, r2 = rand_values[t]
    for i in range(n_particles):
        # Update velocity
        velocities[i] = (w * velocities[i]
                         + c1 * r1 * (pbest_positions[i] - positions[i])
                         + c2 * r2 * (gbest_position - positions[i]))
        # Update position and clip it to the search bounds
        positions[i] += velocities[i]
        positions[i] = np.clip(positions[i], bounds[0], bounds[1])
        # Evaluate fitness
        score = objective_function(positions[i])
        # Update personal best
        if score < pbest_scores[i]:
            pbest_scores[i] = score
            pbest_positions[i] = positions[i].copy()
        # Update global best
        if score < gbest_score:
            gbest_score = score
            gbest_position = positions[i].copy()
        print(f"Particle {i + 1}: Position={positions[i]}, Score={score}")

print(f"\nFinal gbest: {gbest_position}, Score={gbest_score}")

After 2 PSO iterations the code prints the best position found so far, gbest. With only two particles and two iterations it is still far from the true minimum at (0, 0). The output is:

Iteration 1:
Particle 1: Position=[2.35 1.21], Score=6.986600000000001
Particle 2: Position=[1.56 1.27], Score=4.0465

Iteration 2:
Particle 1: Position=[1.608  1.3255], Score=4.3426142500000005
Particle 2: Position=[3.352 0.059], Score=11.239384999999997

Final gbest: [1.56 1.27], Score=4.0465

1.1.2 Bi-LSTM

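A standard LSTM passes information in one direction only and cannot encode information from later positions back to earlier ones. When a sentence reads naturally from front to back this is enough, but when the order is reversed and the key word comes later, a unidirectional LSTM cannot use it. Finer-grained classification, such as a five-class sentiment task (strongly positive, weakly positive, neutral, weakly negative, strongly negative), requires attention to the interactions among sentiment words, degree words, and negation words. For example, in "這個餐廳臟得不行，沒有隔壁好" ("This restaurant is unbearably dirty, not as good as the one next door"), the phrase "不行" modifies the degree of "臟" (dirty). A BiLSTM captures such bidirectional semantic dependencies better.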

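A bidirectional LSTM contains two LSTM layers: one processes the sequence from front to back, the other from back to front, so the model can use both the preceding and the following context. At each time step the input is passed to both LSTM layers, and their outputs are then merged. This gives a more complete view of the sequence and helps performance on sequence tasks.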

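The cell computation of a bidirectional network is the same as in the unidirectional case. The difference is that the hidden layer keeps two values, one used in the forward pass and one in the backward pass. After processing, the outputs of the two LSTMs are concatenated.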
The Bi-LSTM code is as follows:

import tensorflow as tf
from tensorflow.keras.layers import Layer


class CustomBiLSTM(Layer):
    def __init__(self, units):
        super(CustomBiLSTM, self).__init__()
        self.units = units
        # Forward LSTM parameters (shape (1, units) assumes a single input feature)
        self.Wf_f = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)  # input weights
        self.Uf_f = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)  # recurrent weights
        self.bf_f = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        self.Wi_f = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Ui_f = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bi_f = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        self.Wc_f = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Uc_f = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bc_f = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        self.Wo_f = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Uo_f = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bo_f = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        # Backward LSTM parameters
        self.Wf_b = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Uf_b = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bf_b = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        self.Wi_b = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Ui_b = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bi_b = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        self.Wc_b = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Uc_b = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bc_b = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        self.Wo_b = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Uo_b = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bo_b = self.add_weight(shape=(units,), initializer='zeros', trainable=True)

    def call(self, inputs):
        # inputs: [batch_size, timesteps, features]
        batch_size = tf.shape(inputs)[0]
        timesteps = inputs.shape[1]  # static length, so the Python loops below can iterate over it
        # Initialize forward and backward states
        h_f = tf.zeros((batch_size, self.units))
        c_f = tf.zeros((batch_size, self.units))
        h_b = tf.zeros((batch_size, self.units))
        c_b = tf.zeros((batch_size, self.units))
        outputs_f, outputs_b = [], []
        # Forward LSTM
        for t in range(timesteps):
            x_t = inputs[:, t, :]  # [batch_size, features]
            ft = tf.sigmoid(tf.matmul(x_t, self.Wf_f) + tf.matmul(h_f, self.Uf_f) + self.bf_f)
            it = tf.sigmoid(tf.matmul(x_t, self.Wi_f) + tf.matmul(h_f, self.Ui_f) + self.bi_f)
            ct_tilde = tf.tanh(tf.matmul(x_t, self.Wc_f) + tf.matmul(h_f, self.Uc_f) + self.bc_f)
            ot = tf.sigmoid(tf.matmul(x_t, self.Wo_f) + tf.matmul(h_f, self.Uo_f) + self.bo_f)
            c_f = ft * c_f + it * ct_tilde
            h_f = ot * tf.tanh(c_f)
            outputs_f.append(h_f)
        # Backward LSTM (insert at the front so index t still refers to time step t)
        for t in range(timesteps - 1, -1, -1):
            x_t = inputs[:, t, :]
            ft = tf.sigmoid(tf.matmul(x_t, self.Wf_b) + tf.matmul(h_b, self.Uf_b) + self.bf_b)
            it = tf.sigmoid(tf.matmul(x_t, self.Wi_b) + tf.matmul(h_b, self.Ui_b) + self.bi_b)
            ct_tilde = tf.tanh(tf.matmul(x_t, self.Wc_b) + tf.matmul(h_b, self.Uc_b) + self.bc_b)
            ot = tf.sigmoid(tf.matmul(x_t, self.Wo_b) + tf.matmul(h_b, self.Uo_b) + self.bo_b)
            c_b = ft * c_b + it * ct_tilde
            h_b = ot * tf.tanh(c_b)
            outputs_b.insert(0, h_b)
        # Concatenate forward and backward outputs along the feature axis
        out_f = tf.stack(outputs_f, axis=1)  # [batch_size, timesteps, units]
        out_b = tf.stack(outputs_b, axis=1)  # [batch_size, timesteps, units]
        return tf.concat([out_f, out_b], axis=-1)  # [batch_size, timesteps, 2 * units]
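To make the alignment of the two directions concrete, here is a minimal pure-Python sketch. It is not the paper's model: the recurrent cell is replaced by a toy rule h_t = decay * h_prev + x_t (an assumption made for brevity), and the names `run_direction` and `bidirectional` are hypothetical. It only illustrates that the backward pass reads the sequence in reverse, and that its states are put back in time order before being paired with the forward states.

```python
def run_direction(xs, reverse=False, decay=0.5):
    """Run a toy recurrent cell h_t = decay * h_prev + x_t over xs.

    The cell stands in for an LSTM/GRU cell. With reverse=True the sequence
    is read from the last step to the first, and each state is stored at its
    own time index so that states[t] always belongs to time step t.
    """
    order = reversed(range(len(xs))) if reverse else range(len(xs))
    h = 0.0
    states = [0.0] * len(xs)
    for t in order:
        h = decay * h + xs[t]
        states[t] = h
    return states


def bidirectional(xs):
    """Pair the forward and backward state of every time step."""
    forward = run_direction(xs)
    backward = run_direction(xs, reverse=True)
    return list(zip(forward, backward))


series = [1.0, 2.0, 4.0]
for t, (f, b) in enumerate(bidirectional(series)):
    print(f"t={t}: forward={f}, backward={b}")
```

At t=0 the forward state has seen only the first value, while the backward state already summarizes the whole sequence; this is the extra context a bidirectional layer gives each time step.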

1.1.3 Bi-GRU

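Although LSTM mitigates the vanishing gradient problem, it does so at the cost of higher time and space complexity. GRU builds on LSTM by merging the forget gate and the input gate into a single update gate, so a GRU has two gates: the update gate and the reset gate.
The reset gate controls how much of the previous state h_{t-1} is ignored: the smaller its value, the more is ignored. The update gate determines how much of the earlier memory is kept at the current time step: the larger its value, the more of the previous state h_{t-1} is carried over. Together these two gate vectors decide which information ends up in the output of the unit. They can preserve information over long sequences, so that important information is passed across many time steps instead of being cleared over time or dropped because it seems unrelated to the prediction.
The internal structure and equations of the GRU are as follows:
[Figure: GRU internal structure and equations]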
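The gate behaviour described above can be written out as equations. The block below is a sketch in one common convention, chosen so that a larger update gate z_t keeps more of h_{t-1}, as the text states. The symbols W, U, and b follow the weight naming used in the code of this section, inputs are treated as row vectors, σ is the logistic sigmoid, and ⊙ is element-wise multiplication.

```latex
\begin{aligned}
z_t &= \sigma\bigl(x_t W_z + h_{t-1} U_z + b_z\bigr) \\
r_t &= \sigma\bigl(x_t W_r + h_{t-1} U_r + b_r\bigr) \\
\tilde{h}_t &= \tanh\bigl(x_t W_h + (r_t \odot h_{t-1})\, U_h + b_h\bigr) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```

Some references swap the roles of z_t and 1 - z_t in the last line; the two forms describe the same family of models, but the interpretation of "large z_t" flips.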
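Bi-GRU follows the same logic as Bi-LSTM: the internal structure of the cell is unchanged, the model is simply applied twice in opposite directions, and the two results are concatenated as the final output.
The Bi-GRU code is as follows: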

import tensorflow as tf
from tensorflow.keras.layers import Layer


class CustomBiGRU(Layer):
    def __init__(self, units):
        super(CustomBiGRU, self).__init__()
        self.units = units
        # Forward GRU parameters (shape (1, units) assumes a single input feature)
        self.Wz_f = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Uz_f = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bz_f = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        self.Wr_f = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Ur_f = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.br_f = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        self.Wh_f = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Uh_f = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bh_f = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        # Backward GRU parameters
        self.Wz_b = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Uz_b = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bz_b = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        self.Wr_b = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Ur_b = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.br_b = self.add_weight(shape=(units,), initializer='zeros', trainable=True)
        self.Wh_b = self.add_weight(shape=(1, units), initializer='glorot_uniform', trainable=True)
        self.Uh_b = self.add_weight(shape=(units, units), initializer='glorot_uniform', trainable=True)
        self.bh_b = self.add_weight(shape=(units,), initializer='zeros', trainable=True)

    def call(self, inputs):
        # inputs: [batch_size, timesteps, features]
        batch_size = tf.shape(inputs)[0]
        timesteps = inputs.shape[1]  # static length, so the Python loops below can iterate over it
        h_f = tf.zeros((batch_size, self.units))
        h_b = tf.zeros((batch_size, self.units))
        outputs_f, outputs_b = [], []
        # Forward GRU
        for t in range(timesteps):
            x_t = inputs[:, t, :]
            zt = tf.sigmoid(tf.matmul(x_t, self.Wz_f) + tf.matmul(h_f, self.Uz_f) + self.bz_f)
            rt = tf.sigmoid(tf.matmul(x_t, self.Wr_f) + tf.matmul(h_f, self.Ur_f) + self.br_f)
            ht_tilde = tf.tanh(tf.matmul(x_t, self.Wh_f) + tf.matmul(rt * h_f, self.Uh_f) + self.bh_f)
            h_f = zt * h_f + (1 - zt) * ht_tilde  # a larger zt keeps more of the previous state
            outputs_f.append(h_f)
        # Backward GRU (insert at the front so index t still refers to time step t)
        for t in range(timesteps - 1, -1, -1):
            x_t = inputs[:, t, :]
            zt = tf.sigmoid(tf.matmul(x_t, self.Wz_b) + tf.matmul(h_b, self.Uz_b) + self.bz_b)
            rt = tf.sigmoid(tf.matmul(x_t, self.Wr_b) + tf.matmul(h_b, self.Ur_b) + self.br_b)
            ht_tilde = tf.tanh(tf.matmul(x_t, self.Wh_b) + tf.matmul(rt * h_b, self.Uh_b) + self.bh_b)
            h_b = zt * h_b + (1 - zt) * ht_tilde
            outputs_b.insert(0, h_b)
        # Concatenate forward and backward outputs along the feature axis
        out_f = tf.stack(outputs_f, axis=1)  # [batch_size, timesteps, units]
        out_b = tf.stack(outputs_b, axis=1)  # [batch_size, timesteps, units]
        return tf.concat([out_f, out_b], axis=-1)  # [batch_size, timesteps, 2 * units]

1.2 Overall Framework

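The proposed model combines particle swarm optimization (PSO) with a bidirectional long short-term memory network (Bi-LSTM) and a bidirectional gated recurrent unit (Bi-GRU), and refines the result with a self-attention layer. The architecture is shown in the figure below:
[Figure: overall model architecture]
Roughly, PSO first selects the hyperparameters (such as the number of hidden units). The input sequence is then fed to Bi-LSTM and Bi-GRU in parallel, their outputs are concatenated, the concatenated output is weighted by the self-attention layer, and the fused features finally produce the prediction.
The main model is implemented as follows. It mostly wires together the Bi-LSTM, Bi-GRU, and attention components: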

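import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# CustomBiLSTM and CustomBiGRU are the layers defined in section 1.1.
# SelfAttention is a custom attention layer whose code is not shown in this report.


def build_model(timesteps, features, units_lstm, units_gru):
    inputs = Input(shape=(timesteps, features))
    # Custom Bi-LSTM
    bilstm_out = CustomBiLSTM(units_lstm)(inputs)
    # Custom Bi-GRU
    bigru_out = CustomBiGRU(units_gru)(inputs)
    # Concatenate the two outputs along the feature axis
    concat_out = tf.keras.layers.Concatenate(axis=-1)([bilstm_out, bigru_out])
    # Self-attention layer
    attention_out = SelfAttention(units=units_lstm + units_gru)(concat_out)
    # Fully connected layers
    dense_out = Dense(32, activation='sigmoid')(attention_out)
    outputs = Dense(1)(dense_out)
    model = Model(inputs, outputs)
    model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
    return model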
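The report does not show how the self-attention layer is implemented, so the following is only a minimal sketch of the general idea, assuming scaled dot-product self-attention with queries, keys, and values all equal to the input (no learned projections). The function names `softmax` and `self_attention` and the toy feature values are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with Q = K = V = sequence.

    sequence: list of time steps, each a list of d features.
    Returns (outputs, weights), where weights[t][s] is how strongly
    time step t attends to time step s.
    """
    d = len(sequence[0])
    outputs, weights = [], []
    for query in sequence:
        # Similarity of this step to every step, scaled by sqrt(d)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in sequence]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all time steps for this query step
        outputs.append([sum(w[s] * sequence[s][j] for s in range(len(sequence)))
                        for j in range(d)])
    return outputs, weights


# Toy stand-in for concatenated Bi-LSTM/Bi-GRU features: 3 steps, 2 features
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn = self_attention(features)
for t, row in enumerate(attn):
    print(f"step {t} attention weights: {[round(x, 3) for x in row]}")
```

Each row of weights sums to 1, so every output step is a convex combination of the input steps; steps that are more similar to the query receive larger weights, which is the "dynamic weighting" the abstract refers to.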

1.3 Experimental Analysis

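(1) Dataset
The study area is the Kizilirmak basin in Turkey, located in the eastern part of Central Anatolia and connected to the Black Sea. The data come from local gauging stations. Daily flow data for 3652 days from 2002 to 2011 are used, with 80% for training and 20% for testing.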
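As a rough sketch of what an 80/20 split of 3652 daily values looks like, the code below splits a series in time order and builds sliding windows for a forecasting model. The report does not say whether the split is chronological, how the boundary is rounded, or what window length is used, so all three are assumptions here, and the `flow` values are placeholders rather than real data.

```python
def chronological_split(series, train_ratio=0.8):
    """Split a time series into train/test parts without shuffling."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]


def make_windows(series, lookback):
    """Turn a series into (input window, next value) pairs."""
    return [(series[i:i + lookback], series[i + lookback])
            for i in range(len(series) - lookback)]


# Placeholder values standing in for 3652 days of daily flow (not real data)
flow = [float(day % 365) for day in range(3652)]
train, test = chronological_split(flow)
print(len(train), len(test))

# lookback=7 is a hypothetical window length, not taken from the paper
train_pairs = make_windows(train, lookback=7)
print(len(train_pairs), train_pairs[0])
```

Keeping the test period strictly after the training period avoids leaking future values into training, which is why a time-ordered split is the usual choice for forecasting.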
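(2) Evaluation metrics
Root mean square error (RMSE): the square root of the mean squared deviation between predicted and observed values; lower is better.
Mean absolute error (MAE): the mean absolute difference between predicted and observed values; lower is better.
Coefficient of determination (R²): the proportion of the variance in the data explained by the model; the closer to 1 the better.
Kling-Gupta efficiency (KGE): a metric that combines correlation, bias, and variability; higher is better.
Brier score (BF): the accuracy of predicted probabilities; lower is better.
Nash-Sutcliffe efficiency (NSE): how well the predictions perform relative to simply using the observed mean; higher is better.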
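The metrics above can be sketched in a few lines of plain Python. Two choices here are assumptions, since the report does not give the formulas: R² is computed as the squared Pearson correlation (the alternative form 1 - SSE/SST coincides with NSE), and KGE uses the original 2009 formulation with population standard deviations. The Brier score is left out because it needs probabilistic forecasts. The sample values are made up for illustration.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / squared deviations from the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def _moments(obs, pred):
    """Means, population standard deviations, and Pearson correlation."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    return mo, mp, so, sp, cov / (so * sp)


def r2(obs, pred):
    """Squared Pearson correlation (one common reading of R^2)."""
    return _moments(obs, pred)[4] ** 2


def kge(obs, pred):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    mo, mp, so, sp, r = _moments(obs, pred)
    alpha = sp / so  # variability ratio
    beta = mp / mo   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


observed = [1.0, 2.0, 3.0, 4.0]    # made-up values
predicted = [1.5, 2.0, 2.5, 4.0]   # made-up values
for name, fn in [("RMSE", rmse), ("MAE", mae), ("R2", r2), ("NSE", nse), ("KGE", kge)]:
    print(f"{name}: {fn(observed, predicted):.4f}")
```

RMSE and MAE are in the units of the flow data, while R², NSE, and KGE are dimensionless and reach 1 for a perfect prediction.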
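(3) Experimental results
The figure below summarizes the PSO results, listing the hyperparameter values selected for each model architecture.
[Figure: hyperparameter values selected by PSO for each model architecture]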
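How PSO might pick such hyperparameters can be sketched as follows. This is not the authors' procedure: `validation_loss` is a hypothetical surrogate standing in for "train the model and return its validation error", its minimum at (64, 48) is arbitrary, and the search bounds, swarm size, and coefficients are assumed values. Positions are continuous and rounded to integers when evaluated, because hidden-unit counts must be whole numbers.

```python
import random


def validation_loss(units_lstm, units_gru):
    """Hypothetical surrogate for training a model and scoring it."""
    return (units_lstm - 64) ** 2 + (units_gru - 48) ** 2


def decode(position):
    """Map a continuous particle position to integer hidden-unit counts."""
    return [int(round(v)) for v in position]


def pso_search(loss, bounds, n_particles=8, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    dim = 2  # (units_lstm, units_gru)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [loss(*decode(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp to the allowed range
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            score = loss(*decode(pos[i]))
            if score < pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
            if score < gbest_score:
                gbest, gbest_score = pos[i][:], score
    return decode(gbest), gbest_score


best_units, best_loss = pso_search(validation_loss, bounds=(8, 128))
print("selected units:", best_units, "loss:", best_loss)
```

In a real setting each call to the loss function would mean training a network, so the swarm size and iteration count directly control the cost of the search.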
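The results show that the bidirectional models optimized with PSO outperform the traditional unidirectional methods (such as GRU and LSTM). On all datasets, the bidirectional models are consistently better than the unidirectional models across the performance metrics, including RMSE, MAE, and R².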
The figure below compares the predicted and observed values using NSE analysis:
[Figure: predicted vs. observed values with NSE]

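These results show that, on all datasets, the models with an attention layer have lower RMSE and MAE and higher R² and NSE. The box plots also show a narrower interquartile range (IQR) for the proposed method, which means more stable predictions. The attention layer therefore plays an important role in helping the forecasting model capture the relevant features and dependencies in the data, which improves overall performance.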