Let $\varTheta = \{ \pi_k, \boldsymbol{\theta}_k \}_{k=1}^{K}$ denote the parameter vector and $\mathcal{X} = \{ \boldsymbol{x}_1, \cdots, \boldsymbol{x}_n \}$ the observed data. Given the independence of the data points, the likelihood function can be written as:

$$
L(\varTheta) = p(\mathcal{X} \mid \varTheta)
= p\bigl(\mathcal{X} \mid \{ \pi_k, \boldsymbol{\theta}_k \}_{k=1}^{K}\bigr)
= \prod_{i=1}^{n} p\bigl(\boldsymbol{x}_i \mid \{ \pi_k, \boldsymbol{\theta}_k \}_{k=1}^{K}\bigr)
= \prod_{i=1}^{n} \left( \sum_{k=1}^{K} \pi_k\, p(\boldsymbol{x}_i \mid \boldsymbol{\theta}_k) \right) \tag{10}
$$
Therefore, the log-likelihood function is:

$$
L(\varTheta; \mathcal{X}) = \ln p(\mathcal{X} \mid \varTheta)
= \ln \prod_{i=1}^{n} p\bigl(\boldsymbol{x}_i \mid \{ \pi_k, \boldsymbol{\theta}_k \}_{k=1}^{K}\bigr)
= \sum_{i=1}^{n} \ln \left( \sum_{k=1}^{K} \pi_k\, p(\boldsymbol{x}_i \mid \boldsymbol{\theta}_k) \right) \tag{11}
$$
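To make Eq. (11) concrete, here is a minimal NumPy/SciPy sketch that evaluates the log-likelihood of a Gaussian mixture; the function name `gmm_log_likelihood` and the toy parameters are illustrative only, and `logsumexp` is used purely for numerical stability.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal


def gmm_log_likelihood(X, pi, mus, covs):
    """Evaluate Eq. (11): sum_i ln( sum_k pi_k N(x_i | mu_k, Sigma_k) ).

    X    : (n, d)    observed data
    pi   : (K,)      mixing coefficients, non-negative and summing to 1
    mus  : (K, d)    component means
    covs : (K, d, d) component covariance matrices
    """
    # log_comp[i, k] = ln pi_k + ln N(x_i | mu_k, Sigma_k)
    log_comp = np.stack(
        [np.log(pi[k]) + multivariate_normal.logpdf(X, mus[k], covs[k])
         for k in range(len(pi))],
        axis=1,
    )
    # log-sum-exp over k gives ln p(x_i | Theta); summing over i gives L(Theta; X)
    return logsumexp(log_comp, axis=1).sum()


# Toy usage with made-up parameters
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
pi = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.0], [1.0, 1.0]])
covs = np.array([np.eye(2), np.eye(2)])
print(gmm_log_likelihood(X, pi, mus, covs))
```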
Taking the gradient with respect to $\boldsymbol{\theta}_k$:

$$
\nabla_{\boldsymbol{\theta}_k} L = \sum_{i=1}^{n} \frac{1}{p(\boldsymbol{x}_i \mid \varTheta)}\, \nabla_{\boldsymbol{\theta}_k} \left[ \sum_{j=1}^{K} \pi_j\, p(\boldsymbol{x}_i \mid \boldsymbol{\theta}_j) \right]
$$
where

$$
p(\boldsymbol{x}_i \mid \varTheta) = \sum_{k=1}^{K} \pi_k\, p(\boldsymbol{x}_i \mid \boldsymbol{\theta}_k). \tag{12}
$$
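Since only the $k$-th term of the inner sum depends on $\boldsymbol{\theta}_k$, the gradient can be written more explicitly (an intermediate step added here for clarity):

$$
\nabla_{\boldsymbol{\theta}_k} L = \sum_{i=1}^{n} \frac{\pi_k\, \nabla_{\boldsymbol{\theta}_k} p(\boldsymbol{x}_i \mid \boldsymbol{\theta}_k)}{p(\boldsymbol{x}_i \mid \varTheta)}
$$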
The maximum likelihood parameter estimates are determined by:

$$
\{ \hat{\pi}_k, \hat{\boldsymbol{\theta}}_k \}_{k=1}^{K} = \arg\max_{\{ \pi_k, \boldsymbol{\theta}_k \}_{k=1}^{K}} \sum_{i=1}^{n} \ln \left( \sum_{k=1}^{K} \pi_k\, p(\boldsymbol{x}_i \mid \boldsymbol{\theta}_k) \right) \tag{13}
$$
In the case of a single Gaussian ($K=1$), this maximization can be carried out in closed form, yielding the familiar sample mean and sample covariance matrix estimators (with $\pi_1 = 1$ and no mixing coefficient to estimate). For $K \geqslant 2$, however, no closed-form expression for the maximizing parameters is known, and the maximization must be performed numerically.
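As a concrete illustration of the $K=1$ case, here is a minimal NumPy sketch of the closed-form estimators (the variable names are illustrative):

```python
import numpy as np

# X: (n, d) observed data; single-Gaussian MLE (K = 1, pi_1 = 1)
rng = np.random.default_rng(1)
X = rng.normal(loc=2.0, scale=1.5, size=(500, 3))

mu_hat = X.mean(axis=0)                         # sample mean
Sigma_hat = np.cov(X, rowvar=False, bias=True)  # ML covariance (divides by n, not n-1)

print(mu_hat)
print(Sigma_hat)
```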
This is because the logarithm in Eq. (11) contains a sum rather than a product, so it cannot be pushed directly onto the (Gaussian) densities, which makes the maximization of $L(\varTheta; \mathcal{X})$ complicated and hard to solve analytically.
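As one illustration of such a numerical treatment, the sketch below maximizes Eq. (13) directly with a general-purpose optimizer for a one-dimensional, two-component mixture; the softmax parameterization of $\pi$, the log-scale parameterization of $\sigma$, and the use of `scipy.optimize.minimize` are choices made only for this example and are not the method introduced in the next section.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(2)
# Synthetic 1-D data drawn from a two-component mixture
X = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

def neg_log_likelihood(params):
    # params = [a1, a2, mu1, mu2, log_s1, log_s2]; pi = softmax(a) keeps the
    # mixing coefficients non-negative and summing to 1
    a, mu, log_s = params[:2], params[2:4], params[4:6]
    log_pi = a - logsumexp(a)
    # log_comp[i, k] = ln pi_k + ln N(x_i | mu_k, sigma_k^2)
    log_comp = log_pi + norm.logpdf(X[:, None], loc=mu, scale=np.exp(log_s))
    return -logsumexp(log_comp, axis=1).sum()   # negative of Eq. (11)

x0 = np.array([0.0, 0.0, -1.0, 1.0, 0.0, 0.0])  # crude initial guess
res = minimize(neg_log_likelihood, x0, method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-6})
a, mu, log_s = res.x[:2], res.x[2:4], res.x[4:6]
print("pi    =", np.exp(a - logsumexp(a)))
print("mu    =", mu)
print("sigma =", np.exp(log_s))
```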
The next section introduces a well-known numerical method for this maximum likelihood problem, the Expectation-Maximization (EM) algorithm.