15. Linear Dynamical Systems (Kalman Filter)
15.1. Overview
15.1.1. Background
A system whose variables evolve over time is called a dynamical system. When the latent variable takes discrete values, the model is the Hidden Markov Model (HMM); when it takes continuous values, the model is either the linear dynamical system or the particle filter: in the former, the relations between variables and between time steps are linear with Gaussian distributions, while the latter drops these assumptions.
Linearity: between time steps, $z_t = A \cdot z_{t-1} + B + \varepsilon$; between variables, $x_t = C \cdot z_t + D + \delta$; here $\varepsilon, \delta$ are noise terms, which in a linear dynamical system follow Gaussian distributions.
15.1.2. Model Definition
Let the noise be $\varepsilon \sim \mathcal{N}(0,Q)$ and $\delta \sim \mathcal{N}(0,R)$, and let the initial state be $z_1 \sim \mathcal{N}(\mu_1,\Sigma_1)$. The distributions of all variables then follow recursively from the Gaussians $z_t \mid z_{t-1} \sim \mathcal{N}(A \cdot z_{t-1}+B,\,Q)$ and $x_t \mid z_t \sim \mathcal{N}(C \cdot z_t + D,\,R)$, so the model parameters are $\theta = (A, B, C, D, Q, R, \mu_1, \Sigma_1)$.
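To make the generative definition concrete, here is a minimal 1-D simulation sketch (the function name `simulate_lds` and all default parameter values are illustrative choices of mine, not from the text; scalars stand in for the matrices $A, B, C, D, Q, R$):

```python
import random

def simulate_lds(T, a=0.9, b=0.1, c=1.0, d=0.0, Q=0.25, R=0.04,
                 mu1=0.0, sigma1=1.0, seed=0):
    """Sample a length-T trajectory (zs, xs) from a 1-D linear dynamical system."""
    rng = random.Random(seed)
    z = rng.gauss(mu1, sigma1)                           # z_1 ~ N(mu1, Sigma1)
    zs, xs = [], []
    for _ in range(T):
        zs.append(z)
        xs.append(c * z + d + rng.gauss(0.0, R ** 0.5))  # x_t | z_t ~ N(c*z_t + d, R)
        z = a * z + b + rng.gauss(0.0, Q ** 0.5)         # z_{t+1} | z_t ~ N(a*z_t + b, Q)
    return zs, xs
```

Both conditional distributions are exactly the Gaussians in the definition above, so each sampled trajectory is one draw from the joint distribution the parameters $\theta$ specify.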
Like the HMM, the linear dynamical system satisfies the first-order homogeneous Markov assumption and the observation-independence assumption (see the previous note, Machine Learning [Whiteboard Derivations] (11)).
15.2. Filtering
First, the filtering distribution is $P(z_t \mid x_{1:t})$. Filtering is an online algorithm: it infers the current state from the observations up to and including time $t$. This is both the information needed for learning the model and a correction of the prediction $P(z_t \mid x_{1:t-1})$.
Filtering can also be computed recursively; the derivation is:
$$
\begin{aligned}
P(z_t|x_{1:t})&=\frac{P(z_t,x_{1:t})}{P(x_{1:t})}=\frac{\overbrace{P(x_t|z_t,x_{1:t-1})}^{P(x_t|z_t)}\cdot P(z_t,x_{1:t-1})}{P(x_{1:t})}\\
&=\frac{P(x_t|z_t)\cdot P(z_t|x_{1:t-1})\cdot P(x_{1:t-1})}{P(x_{1:t})}\\
&=\underbrace{P(x_t|z_t)}_{\text{Emission Probability}}\cdot \underbrace{P(z_t|x_{1:t-1})}_{\text{Prediction of }t-1}\cdot \underbrace{\frac{P(x_{1:t-1})}{P(x_{1:t})}}_{\text{Constant}},\tag{15.1}
\end{aligned}
$$
where the prediction term can be computed as:
$$
\begin{aligned}
P(z_t|x_{1:t-1})&=\int_{z_{t-1}}P(z_t,z_{t-1}|x_{1:t-1})\,dz_{t-1}\\
&=\int_{z_{t-1}}\underbrace{P(z_t|z_{t-1},x_{1:t-1})}_{P(z_t|z_{t-1})\ \text{(State Transition Probability)}}\cdot \underbrace{P(z_{t-1}|x_{1:t-1})}_{\text{Filtering of }t-1}\,dz_{t-1},\tag{15.2}
\end{aligned}
$$
Thus the prediction at time $t$ can be computed from the filtering result at time $t-1$, which in turn yields the filtering result at time $t$; iterating this loop solves the problem.
In summary, the filtering procedure runs as follows:
- $t=1$: $P(z_1|x_1)=\frac{P(z_1,x_1)}{P(x_1)}=\frac{P(x_1|z_1)\cdot P(z_1)}{\int_{z_1}P(x_1|z_1)\cdot P(z_1)\,dz_1}$, then use the formula above to compute the prediction $P(z_2|x_1)$.
- $t=2$: use the recursion above to compute $P(z_2|x_1,x_2)$ (the filtering update, i.e. the correction of the prediction) and $P(z_3|x_1,x_2)$ (the new prediction).
- ...
- $t=T$: from the prediction $P(z_T|x_{1:T-1})$, compute the final filtering result $P(z_T|x_{1:T})$.
Throughout this computation, both the filtering and prediction results are Gaussian, and their parameters (mean and variance) can be computed from the previous step's results together with the model parameters; the derivation uses the Gaussian properties covered in Machine Learning [Whiteboard Derivations] (1) and is not repeated here.
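In the scalar (1-D) case those closed-form Gaussian updates reduce to a few lines. The following is a sketch of the predict-correct recursion under that simplifying assumption (the name `kalman_filter_1d` is mine; the multivariate version replaces each scalar operation with the corresponding matrix one):

```python
def kalman_filter_1d(xs, a, b, c, d, Q, R, mu1, sigma1_sq):
    """Scalar Kalman filter: returns means and variances of p(z_t | x_{1:t})."""
    means, variances = [], []
    m, P = mu1, sigma1_sq                # prior on z_1
    for t, x in enumerate(xs):
        if t > 0:                        # prediction: p(z_t | x_{1:t-1})
            m, P = a * m + b, a * a * P + Q
        S = c * c * P + R                # innovation (residual) variance
        K = P * c / S                    # Kalman gain
        m = m + K * (x - (c * m + d))    # correction: filtering update
        P = (1.0 - K * c) * P
        means.append(m)
        variances.append(P)
    return means, variances
```

With a nearly static state ($a=1$, small $Q$) and repeated observations of the same value, the filtered mean converges toward that value, matching the correction interpretation above.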
16. Particle Filter
16.1. Background
In Section 15 (linear dynamical systems, Kalman filter), the hidden state sequence can be obtained by alternating filtering and prediction. For non-linear, non-Gaussian models, however, no closed-form solution exists, and an approximate solution must be obtained by sampling; this is how the particle filter arose.
The particle filter is very widely applied and has produced many research results and variants; here we only cover the most basic particle filter algorithm.
16.2. The Algorithm
16.2.1. Importance Sampling and Its Adaptation to Filtering: Sequential Importance Sampling (SIS)
First recall importance sampling. Given a complicated distribution $p(x)$, sampling directly from $p(x)$ to compute the expectation $E[f(x)]$ may be hard. We can instead introduce a simple proposal distribution $q(x)$ and write $E[f(x)]=\int f(x)\,p(x)\,dx=\int f(x)\cdot\frac{p(x)}{q(x)}\cdot q(x)\,dx$, so that sampling (now from $q$) becomes easy.
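A small sketch of this identity, using the self-normalized variant so that normalizing constants of $p$ and $q$ cancel (the function name and the example densities are illustrative):

```python
import math
import random

def importance_sampling_mean(f, log_p, sample_q, log_q, n=20000, seed=0):
    """Estimate E_p[f(x)] using samples from a proposal q.
    Self-normalized, so log_p / log_q may omit normalizing constants."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = sample_q(rng)
        w = math.exp(log_p(x) - log_q(x))   # importance weight p(x)/q(x)
        num += w * f(x)
        den += w
    return num / den

# Example: E_p[x] with p = N(2, 1), proposing from the wider q = N(0, 4).
est = importance_sampling_mean(
    f=lambda x: x,
    log_p=lambda x: -0.5 * (x - 2.0) ** 2,      # N(2, 1) up to a constant
    sample_q=lambda rng: rng.gauss(0.0, 2.0),   # draw from N(0, 4)
    log_q=lambda x: -0.5 * (x / 2.0) ** 2,      # N(0, 4) up to a constant
)
```

The estimate lands near the true mean of $p$, which is 2, even though no sample was ever drawn from $p$ itself.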
Applying importance sampling to the filtering problem of a dynamical system, at every time $t$ and for every sample $z_t^{(i)}$ we would need the importance weight $w_t^{(i)}=\frac{p(z_t^{(i)}|x_{1:t})}{q(z_t^{(i)}|x_{1:t})}$, but computing $p(z_t^{(i)}|x_{1:t})$ is expensive, and it must be done $N$ times (once per sample).
To fix this, change the target of the filtering computation from $p(z_t|x_{1:t})$ to $p(z_{1:t}|x_{1:t})$ (which serves the same purpose). The weight becomes $w_t^{(i)}\propto \frac{p(z_{1:t}^{(i)}|x_{1:t})}{q(z_{1:t}^{(i)}|x_{1:t})}$. We derive recursions for the numerator and the denominator separately; if a recursion can be established, the computation simplifies:
- For the numerator:
$$
\begin{aligned}
p(z_{1:t}^{(i)}|x_{1:t})&=\frac{p(z_{1:t}^{(i)},x_{1:t})}{p(x_{1:t})}\\
&=\frac{\overbrace{p(x_t|z_{1:t}^{(i)},x_{1:t-1})}^{p(x_t|z_t^{(i)})\ \text{(Emission Probability)}}\cdot p(z_{1:t}^{(i)},x_{1:t-1})}{p(x_{1:t})}\\
&=\frac{p(x_t|z_t^{(i)})\cdot \overbrace{p(z_t^{(i)}|z_{1:t-1}^{(i)},x_{1:t-1})}^{p(z_t^{(i)}|z_{t-1}^{(i)})\ \text{(State Transition Probability)}}\cdot p(z_{1:t-1}^{(i)},x_{1:t-1})}{p(x_{1:t})}\\
&=\frac{p(x_t|z_t^{(i)})\cdot p(z_t^{(i)}|z_{t-1}^{(i)})\cdot p(z_{1:t-1}^{(i)}|x_{1:t-1})\cdot p(x_{1:t-1})}{p(x_{1:t})}\\
&=\left[\frac{p(x_{1:t-1})}{p(x_{1:t})}\cdot p(x_t|z_t^{(i)})\cdot p(z_t^{(i)}|z_{t-1}^{(i)})\right]\cdot p(z_{1:t-1}^{(i)}|x_{1:t-1}),\tag{15.3}
\end{aligned}
$$
- For the denominator, choose $q(z_{1:t}^{(i)}|x_{1:t})$ to satisfy $q(z_{1:t}^{(i)}|x_{1:t})=q(z_t^{(i)}|z_{1:t-1}^{(i)},x_{1:t})\cdot q(z_{1:t-1}^{(i)}|x_{1:t-1})$.
- Substituting the numerator and the denominator gives
$$
\begin{aligned}
w_t^{(i)}&\propto \frac{p(z_{1:t}^{(i)}|x_{1:t})}{q(z_{1:t}^{(i)}|x_{1:t})}\\
&=\frac{\left[\frac{p(x_{1:t-1})}{p(x_{1:t})}\cdot p(x_t|z_t^{(i)})\cdot p(z_t^{(i)}|z_{t-1}^{(i)})\right]\cdot p(z_{1:t-1}^{(i)}|x_{1:t-1})}{q(z_t^{(i)}|z_{1:t-1}^{(i)},x_{1:t})\cdot q(z_{1:t-1}^{(i)}|x_{1:t-1})}\\
&=\underbrace{\frac{p(x_{1:t-1})}{p(x_{1:t})}}_{\text{Constant}}\cdot \left[\frac{p(x_t|z_t^{(i)})\cdot p(z_t^{(i)}|z_{t-1}^{(i)})}{\underbrace{q(z_t^{(i)}|z_{1:t-1}^{(i)},x_{1:t})}_{q(z_t^{(i)}|z_{t-1}^{(i)},x_{1:t})}}\right]\cdot \underbrace{\frac{p(z_{1:t-1}^{(i)}|x_{1:t-1})}{q(z_{1:t-1}^{(i)}|x_{1:t-1})}}_{w_{t-1}^{(i)}}\\
&\propto \left[\frac{p(x_t|z_t^{(i)})\cdot p(z_t^{(i)}|z_{t-1}^{(i)})}{q(z_t^{(i)}|z_{t-1}^{(i)},x_{1:t})}\right]\cdot w_{t-1}^{(i)},\tag{15.4}
\end{aligned}
$$
This establishes the recursion, so the filtering problem can be solved by Monte Carlo importance sampling.
For the sample average to converge to the true expectation, the weights must be normalized so that $\sum_{i=1}^N w_t^{(i)}=1$; this is also why the constant can be dropped from the recursion: only the proportionality matters.
The algorithm above is the basic particle filter; each sampled state value $z_t^{(i)}\sim q(z_t^{(i)}|z_{t-1}^{(i)},x_{1:t})$ is called a particle.
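One SIS step for the 1-D LDS of Section 15 might look like the following sketch. The Gaussian proposal centred on the transition mean with an inflated variance `prop_var` is an illustrative choice of mine, as are the function names; working in log-weights avoids numerical underflow:

```python
import math
import random

def gauss_logpdf(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def sis_step(particles, log_weights, x_t, a, b, c, d, Q, R, prop_var, rng):
    """One SIS update for a 1-D LDS: propose each z_t^(i), update its log-weight
    by log p(x_t|z_t) + log p(z_t|z_{t-1}) - log q(z_t|...), then normalize."""
    new_particles, new_logw = [], []
    for z_prev, lw in zip(particles, log_weights):
        mean_q = a * z_prev + b
        z = rng.gauss(mean_q, prop_var ** 0.5)        # z_t^(i) ~ q
        lw = (lw
              + gauss_logpdf(x_t, c * z + d, R)       # emission p(x_t | z_t)
              + gauss_logpdf(z, mean_q, Q)            # transition p(z_t | z_{t-1})
              - gauss_logpdf(z, mean_q, prop_var))    # proposal q(z_t | ...)
        new_particles.append(z)
        new_logw.append(lw)
    m = max(new_logw)                                 # log-sum-exp normalization
    total = sum(math.exp(l - m) for l in new_logw)
    norm_w = [math.exp(l - m) / total for l in new_logw]
    return new_particles, new_logw, norm_w
```

The three log-density terms are exactly the three factors of the weight recursion (15.4), and the final normalization enforces $\sum_i w_t^{(i)} = 1$.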
16.2.2. Resampling
In practice the particle filter above often suffers from particle degeneracy: only a few states carry genuinely useful weight $w_t^{(i)}$, while the weights of most states are close to 0. Yet the algorithm still samples every $z_t^{(i)}$ from $q(z_t^{(i)}|z_{t-1}^{(i)},x_{1:t})$, and this proposal is not tied to the weight $w_t^{(i)}$. In many cases the proposal density is large where the weight is small, so the samples are dominated by unimportant information, or conversely, the useful information makes up only a tiny fraction of the samples. Either way, computation is wasted, the model becomes hard to converge, and performance degrades.
Resampling addresses this problem. Concretely, after sampling from $q(z_t^{(i)}|z_{t-1}^{(i)},x_{1:t})$, compute each particle's weight and the CDF over the weights, then draw $N$ splitting factors $\mu_t^{(i)}$ from the uniform distribution on $[0,1]$; whichever particle's CDF segment a $\mu_t^{(i)}$ falls into, that particle is copied once (this is sometimes called particle splitting).
For example, suppose the sampled states are $z_1, z_2, z_3$ (sorted in increasing order) with weights 0.1, 0.1, 0.8, so the CDF splits $[0,1]$ into the segments $[0,0.1]$, $(0.1,0.2]$, and $(0.2,1.0]$. Draw three splitting factors, say 0.15, 0.38, 0.54. The first falls into the $z_2$ segment $(0.1,0.2]$, so $z_2$ is copied once; the other two fall into the $z_3$ segment $(0.2,1.0]$, so $z_3$ is copied twice. This way each resampled particle ends up carrying a weight close to $1/N$ (equivalently, particles with high weight split more often and are used more effectively, while particles with low weight can be dropped).
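The CDF-plus-uniform-draws scheme in the example above is multinomial resampling; a minimal sketch (the function name is mine):

```python
import bisect
import itertools
import random

def resample(particles, weights, rng):
    """Multinomial resampling: draw N uniforms, find each one's segment in the
    weight CDF, and copy ("split") the particle owning that segment."""
    cdf = list(itertools.accumulate(weights))
    n = len(particles)
    new = [particles[bisect.bisect_left(cdf, rng.random() * cdf[-1])]
           for _ in range(n)]
    return new, [1.0 / n] * n   # resampled particles share equal weight

# With weights 0.1, 0.1, 0.8, most copies come from the third particle.
copies, new_w = resample(["z1", "z2", "z3"], [0.1, 0.1, 0.8], random.Random(0))
```

Scaling each uniform draw by `cdf[-1]` makes the code tolerant of weights that are only proportional (unnormalized), consistent with the proportionality remark earlier.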
Particle degeneracy becomes far worse when the state dimension is high, which makes resampling all the more necessary.
16.2.3. Sampling-Importance-Resampling (SIR)
Besides resampling, another way to mitigate the particle degeneracy described above is to choose a more sensible proposal $q(z_t^{(i)}|z_{t-1}^{(i)},x_{1:t})$. One option is to set $q(z_t^{(i)}|z_{t-1}^{(i)},x_{1:t})=p(z_t^{(i)}|z_{t-1}^{(i)})$, in which case the weight recursion becomes $w_t^{(i)}=p(x_t|z_t^{(i)})\cdot w_{t-1}^{(i)}$. This improved particle filter, combined with resampling, further boosts performance and is called SIR.
Why set $q(z_t^{(i)}|z_{t-1}^{(i)},x_{1:t})=p(z_t^{(i)}|z_{t-1}^{(i)})$? The reasoning is intuitive. The goal of filtering is to compute $p(z_t|x_{1:t})$ (or $p(z_{1:t}|x_{1:t})$), and the particle filter's approach is to sample $z_t^{(i)}$ and average. A simple intuition is that, given the observation $x_t$, states $z_t^{(i)}$ with larger emission likelihood $p(x_t|z_t^{(i)})$ are often more likely to be the true state and should therefore receive larger weights. That is exactly what this special choice of proposal achieves: it makes the recursion collapse to $w_t^{(i)}=p(x_t|z_t^{(i)})\cdot w_{t-1}^{(i)}$. This is, however, a naive choice; later work has studied the choice of proposal distribution extensively, but since this note covers only the basic particle filter, we do not go further.
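Putting the pieces together, here is a sketch of SIR for the 1-D LDS: propose from the transition prior $p(z_t|z_{t-1})$, weight by the emission likelihood $p(x_t|z_t)$, and resample every step (the function name and the parameter values in the usage below are illustrative):

```python
import bisect
import itertools
import math
import random

def sir_filter_1d(xs, a, b, c, d, Q, R, mu1, sigma1, n=300, seed=0):
    """SIR particle filter for a 1-D LDS. Returns the filtered posterior means."""
    rng = random.Random(seed)
    particles = [rng.gauss(mu1, sigma1) for _ in range(n)]
    means = []
    for t, x in enumerate(xs):
        if t > 0:   # propose from the transition prior p(z_t | z_{t-1})
            particles = [rng.gauss(a * z + b, Q ** 0.5) for z in particles]
        # after resampling, w_{t-1} is uniform, so the weight reduces to p(x_t | z_t)
        w = [math.exp(-0.5 * (x - (c * z + d)) ** 2 / R) for z in particles]
        s = sum(w)
        w = [wi / s for wi in w]
        means.append(sum(wi * z for wi, z in zip(w, particles)))
        cdf = list(itertools.accumulate(w))   # multinomial resampling step
        particles = [particles[bisect.bisect_left(cdf, rng.random() * cdf[-1])]
                     for _ in range(n)]
    return means
```

On a linear-Gaussian model like this one the Kalman filter is exact and preferable; the point of the sketch is only that nothing in it requires linearity or Gaussianity, since the transition sampler and emission likelihood could be replaced by arbitrary ones.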