A Detailed Derivation of the Sigmoid Function's Derivative
- In logistic regression, deriving the derivative of the sigmoid function is a key step: it is what allows gradient descent to compute gradients efficiently.
1. Definition of the Sigmoid Function
First, recall the definition of the sigmoid function:
$$g(z) = \frac{1}{1 + e^{-z}}$$
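For orientation, a few values follow directly from this definition: $g$ maps all of $\mathbb{R}$ into $(0, 1)$, with
$$g(0) = \frac{1}{1 + e^{0}} = \frac{1}{2}, \qquad \lim_{z \to +\infty} g(z) = 1, \qquad \lim_{z \to -\infty} g(z) = 0.$$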
2. Derivation of the Derivative
- Start from the sigmoid function:
$$g(z) = \frac{1}{1 + e^{-z}}$$
- Let $u = 1 + e^{-z}$, so that $g(z) = u^{-1}$.
- Apply the chain rule:
$$\frac{dg}{dz} = \frac{dg}{du} \cdot \frac{du}{dz} = -u^{-2} \cdot (-e^{-z}) = \frac{e^{-z}}{(1 + e^{-z})^2}$$
- Now express the result in terms of $g(z)$, using
$$\frac{e^{-z}}{1 + e^{-z}} = 1 - \frac{1}{1 + e^{-z}} = 1 - g(z)$$
- Therefore:
$$g'(z) = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = g(z)\,(1 - g(z))$$
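As a quick sanity check on the identity $g'(z) = g(z)(1 - g(z))$, one can compare the closed form against a centered finite difference. This is a minimal sketch, not part of the original post:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    # Closed form derived above: g'(z) = g(z) * (1 - g(z))
    return sigmoid(z) * (1 - sigmoid(z))

# Centered finite difference (g(z + h) - g(z - h)) / (2h) as an independent estimate
z = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

# Agreement to well below 1e-8 confirms the algebra above
assert np.allclose(numeric, sigmoid_derivative(z), atol=1e-8)
```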
3. Code Implementation
```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    # g'(z) = g(z) * (1 - g(z)), as derived above
    return sigmoid(z) * (1 - sigmoid(z))

# Plot the sigmoid and its derivative over [-10, 10]
z = np.linspace(-10, 10, 100)
plt.figure(figsize=(10, 6))
plt.plot(z, sigmoid(z), label="Sigmoid function")
plt.plot(z, sigmoid_derivative(z), label="Sigmoid derivative")
plt.xlabel("z")
plt.ylabel("g(z)")
plt.title("Sigmoid Function and its Derivative")
plt.legend()
plt.grid(True)
plt.show()
```
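One practical caveat the original code glosses over: for large negative `z` (roughly $z < -709$ in float64), the intermediate `np.exp(-z)` overflows to `inf` and NumPy emits a RuntimeWarning, even though the final value is still correct. A common remedy is `scipy.special.expit`, or the sign-branching trick sketched below; the name `sigmoid_stable` is illustrative, not from the original post:

```python
import numpy as np
from scipy.special import expit  # SciPy's numerically stable sigmoid

def sigmoid_stable(z):
    # Branch on the sign of z so the exponent passed to np.exp is always
    # non-positive, which avoids overflow for large |z|.
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1 / (1 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])      # safe: here z < 0, so e^z <= 1
    out[~pos] = ez / (1 + ez)
    return out

z = np.array([-1000.0, -10.0, 0.0, 10.0, 1000.0])
assert np.allclose(sigmoid_stable(z), expit(z))  # matches SciPy, no warnings
```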
4. Properties of the Derivative
- Maximum: the derivative attains its maximum value of $0.25$ when $g(z) = 0.5$, i.e. at $z = 0$ (since $t(1-t)$ on $(0,1)$ peaks at $t = \tfrac{1}{2}$ with value $\tfrac{1}{4}$).
- Symmetry: the derivative peaks at $z = 0$ and falls off rapidly as $|z|$ grows; it is an even function of $z$.
- Positivity: the derivative is strictly positive (hence non-negative), since $0 < g(z) < 1$ implies $g(z)(1 - g(z)) > 0$.
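All three properties can be confirmed numerically on a symmetric grid; a minimal sketch:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

z = np.linspace(-10, 10, 2001)   # symmetric grid containing z = 0
d = sigmoid_derivative(z)

assert np.isclose(d.max(), 0.25) and z[d.argmax()] == 0  # maximum 0.25, attained at z = 0
assert np.allclose(d, d[::-1])                           # even function: symmetric about z = 0
assert (d > 0).all()                                     # strictly positive everywhere
```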
5. Why This Form of the Derivative Matters
- In logistic regression's gradient descent, we need the derivative of the loss function with respect to the parameters. Because the loss contains the sigmoid function, this derivative form makes the computation remarkably simple:
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$
- where $h_\theta(x) = g(\theta^T x)$. Without this compact derivative form, the gradient computation would be far more complicated.
- Deriving the partial derivative with respect to $\theta_j$, starting from the cross-entropy loss $J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$:
$$\begin{align*}
\frac{\partial}{\partial \theta_j} J(\theta)
&= -\frac{1}{m}\sum_{i=1}^m \left( y^{(i)} \frac{1}{h_\theta(x^{(i)})} - (1 - y^{(i)}) \frac{1}{1 - h_\theta(x^{(i)})} \right) \frac{\partial}{\partial \theta_j} h_\theta(x^{(i)}) \\
&= -\frac{1}{m}\sum_{i=1}^m \left( y^{(i)} \frac{1}{g(\theta^T x^{(i)})} - (1 - y^{(i)}) \frac{1}{1 - g(\theta^T x^{(i)})} \right) g(\theta^T x^{(i)}) \left(1 - g(\theta^T x^{(i)})\right) x_j^{(i)} \\
&= -\frac{1}{m}\sum_{i=1}^m \left( y^{(i)} \left(1 - g(\theta^T x^{(i)})\right) - (1 - y^{(i)})\, g(\theta^T x^{(i)}) \right) x_j^{(i)} \\
&= \frac{1}{m}\sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\end{align*}$$
- The second line applies the sigmoid derivative $g' = g(1 - g)$ from Section 2 together with $\frac{\partial}{\partial \theta_j} \theta^T x^{(i)} = x_j^{(i)}$; a sketch of the resulting vectorized gradient follows.
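The final line of the derivation maps directly onto a vectorized implementation, $\nabla J(\theta) = \frac{1}{m} X^T (g(X\theta) - y)$. The sketch below is illustrative, assuming a design matrix `X` of shape `(m, n)` and labels `y` in $\{0, 1\}$; none of the names come from the original post:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient(theta, X, y):
    """Gradient of the logistic-regression loss J(theta).

    Computes (1/m) * X^T (g(X theta) - y), the vectorized form of the
    per-component result derived above.
    """
    m = X.shape[0]
    h = sigmoid(X @ theta)        # h_theta(x^(i)) for every example at once
    return (X.T @ (h - y)) / m

# Illustrative usage on random data (hypothetical shapes)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (rng.random(100) < 0.5).astype(float)
theta = np.zeros(3)
print(gradient(theta, X, y))      # one descent step would be theta -= alpha * gradient(...)
```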