Machine Learning Notes [Week 2]

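1. Multivariate Linear Regression

Why multiple variables?

In real problems a target usually depends on several factors. When predicting house prices, for example:

$x_1$: floor area
$x_2$: number of bedrooms
$x_3$: age of the house
$\dots$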

Hypothesis Function

Generalizing univariate linear regression gives:
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
The vector form is more compact:
$h_\theta(x) = \theta^T x$
where:

$\theta = [\theta_0, \theta_1, \cdots, \theta_n]^T$ (parameter vector)
$x = [x_0, x_1, x_2, \cdots, x_n]^T$ (feature vector, with $x_0 = 1$ so that the bias term is handled uniformly)
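To see that the vector form is just the sum written compactly, here is a minimal sketch. The parameter and feature values are made up purely for illustration.

```python
import numpy as np

# Hypothetical values for illustration: n = 3 features plus the bias entry x0 = 1
theta = np.array([50.0, 0.1, 20.0, -1.5])  # [theta_0, theta_1, theta_2, theta_3]
x = np.array([1.0, 120.0, 3.0, 10.0])      # [x_0 = 1, area, bedrooms, age]

# Term-by-term form: theta_0 + theta_1*x_1 + ... + theta_n*x_n
h_sum = theta[0] + sum(theta[j] * x[j] for j in range(1, len(x)))

# Vector form: theta^T x
h_vec = theta @ x

print(h_sum, h_vec)  # the two values agree up to floating-point rounding
```

Because $x_0 = 1$, the bias $\theta_0$ needs no special handling in the vector form.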

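Core idea of the model:

As in univariate regression, the goal is to minimize the cost function:
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
and then solve for $\theta$ using gradient descent or the normal equation.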

Python example code (simulated data)

import numpy as np

# Simulated data: features are area and number of bedrooms, target is price
X = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]])
y = np.array([399.9, 329.9, 369.0, 232.0, 539.9]).reshape(-1, 1)
m = len(y)

# Add the bias column x0 = 1
X = np.c_[np.ones((m, 1)), X]  # shape = (m, n+1)
theta = np.zeros((X.shape[1], 1))  # initial parameters

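2. Feature Scaling

When features have very different ranges (e.g. area in $[50, 200]$ vs. house age in $[1, 30]$), gradient descent can converge very slowly, so the inputs should be scaled.

Method: mean normalization

$x_i := \frac{x_i - \mu_i}{s_i}$

$\mu_i$: the mean of the $i$-th feature
$s_i$: the standard deviation, or the range (max minus min), of the $i$-th feature

This brings all features into roughly the range $[-1, 1]$.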

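Python implementation:

def feature_normalize(X):
    mu = np.mean(X, axis=0)        # per-feature mean
    sigma = np.std(X, axis=0)      # per-feature standard deviation
    X_norm = (X - mu) / sigma
    return X_norm, mu, sigma

# Normalize only x1..xn; the bias column x0 is left untouched
X[:, 1:], mu, sigma = feature_normalize(X[:, 1:])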
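The implementation above uses the standard deviation for $s_i$. The other option mentioned in the text, the max-min range, can be sketched as follows. The helper name `feature_normalize_range` and the sample `x_new` are hypothetical, introduced only for this illustration. The sketch also shows that a new sample should be scaled with the statistics computed on the training data.

```python
import numpy as np

def feature_normalize_range(X):
    """Mean normalization with s_i = max - min (hypothetical helper name)."""
    mu = np.mean(X, axis=0)
    span = np.max(X, axis=0) - np.min(X, axis=0)
    X_norm = (X - mu) / span
    return X_norm, mu, span

# Same raw features as the simulated data: area, number of bedrooms
X_raw = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]], dtype=float)
X_norm, mu, span = feature_normalize_range(X_raw)

# A new sample is scaled with the training mu and span, not with its own statistics
x_new = np.array([1800.0, 3.0])  # made-up house, for illustration only
x_new_norm = (x_new - mu) / span

print(X_norm.min(axis=0), X_norm.max(axis=0))
print(x_new_norm)
```

With this choice every training value lies within $[-1, 1]$ by construction, whereas dividing by the standard deviation only keeps values roughly in that range.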

3. Vectorized Gradient Descent

Cost function:

$J(\theta) = \frac{1}{2m}(X\theta - y)^T(X\theta - y)$

Gradient update (vectorized):

$\theta := \theta - \frac{\alpha}{m} X^T(X\theta - y)$

where:

$X$ is the $m \times (n+1)$ matrix of training examples
$y$ is the $m \times 1$ column vector of target values
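
The update rule follows from differentiating the vectorized cost. A brief derivation sketch, using only the symbols defined above:

$J(\theta) = \frac{1}{2m}(X\theta - y)^T(X\theta - y) = \frac{1}{2m}\left(\theta^T X^T X \theta - 2\,\theta^T X^T y + y^T y\right)$
$\nabla_\theta J(\theta) = \frac{1}{2m}\left(2 X^T X \theta - 2 X^T y\right) = \frac{1}{m} X^T (X\theta - y)$
$\theta := \theta - \alpha \nabla_\theta J(\theta) = \theta - \frac{\alpha}{m} X^T (X\theta - y)$

Setting the gradient to zero gives $X^T X \theta = X^T y$, which is the linear system behind the normal equation in section 5.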

Python implementation:

def compute_cost(X, y, theta):
    m = len(y)
    return (1 / (2 * m)) * np.sum((X @ theta - y) ** 2)

def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = []
    for _ in range(num_iters):
        error = X @ theta - y
        gradient = (1 / m) * X.T @ error
        # Build a new array so the caller's theta is not modified in place
        theta = theta - alpha * gradient
        J_history.append(compute_cost(X, y, theta))
    return theta, J_history

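4. Convergence Analysis of Gradient Descent

How to tell whether it has converged?

Plot $J(\theta)$ against the number of iterations
If the cost keeps decreasing → convergence is good
If it oscillates or increases → the learning rate $\alpha$ is too large and should be reduced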
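
Besides reading the plot, the same check can be automated by stopping once the decrease in cost per iteration falls below a threshold. This is a minimal sketch: the tolerance `tol`, the function name `gradient_descent_until_converged`, and the tiny dataset are assumptions made for illustration and are not part of the notes above.

```python
import numpy as np

def compute_cost(X, y, theta):
    m = len(y)
    return (1 / (2 * m)) * np.sum((X @ theta - y) ** 2)

def gradient_descent_until_converged(X, y, theta, alpha, max_iters=10000, tol=1e-9):
    """Run gradient descent until J decreases by less than tol (assumed threshold)."""
    m = len(y)
    J_history = [compute_cost(X, y, theta)]
    for _ in range(max_iters):
        theta = theta - (alpha / m) * X.T @ (X @ theta - y)
        J_history.append(compute_cost(X, y, theta))
        if J_history[-1] > J_history[-2]:
            raise ValueError("Cost increased: alpha is probably too large")
        if J_history[-2] - J_history[-1] < tol:
            break  # the cost curve has flattened out
    return theta, J_history

# Made-up data generated from y = 1 + 2*x, with the bias column x0 = 1
X = np.c_[np.ones(5), np.array([-2.0, -1.0, 0.0, 1.0, 2.0])]
y = (1 + 2 * X[:, 1]).reshape(-1, 1)

theta, J_history = gradient_descent_until_converged(X, y, np.zeros((2, 1)), alpha=0.1)
print(len(J_history) - 1, "iterations, final cost", J_history[-1])
print(theta.ravel())
```

A suitable `tol` depends on the scale of the cost, which is one reason the plot is usually the more reliable diagnostic.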

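Suggestions for tuning the learning rate:

Symptom	Cause	Fix
Converges very slowly	Learning rate too small	Increase $\alpha$
Oscillates or even diverges	Learning rate too large	Decrease $\alpha$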
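
Both rows of the table can be reproduced numerically. The sketch below compares three learning rates on a small made-up dataset. The specific values of `alpha`, the helper name `run_gradient_descent`, and the data are illustrative assumptions.

```python
import numpy as np

def run_gradient_descent(X, y, alpha, num_iters):
    """Return the cost after each iteration, starting from theta = 0."""
    m = len(y)
    theta = np.zeros((X.shape[1], 1))
    costs = []
    for _ in range(num_iters):
        theta = theta - (alpha / m) * X.T @ (X @ theta - y)
        costs.append(float(np.sum((X @ theta - y) ** 2) / (2 * m)))
    return costs

# Made-up data generated from y = 1 + 2*x, with the bias column x0 = 1
X = np.c_[np.ones(5), np.array([-2.0, -1.0, 0.0, 1.0, 2.0])]
y = (1 + 2 * X[:, 1]).reshape(-1, 1)

for alpha in (0.001, 0.1, 1.5):
    costs = run_gradient_descent(X, y, alpha, num_iters=50)
    print(f"alpha={alpha}: J after 1 iter = {costs[0]:.4g}, after 50 iters = {costs[-1]:.4g}")
```

On this data the smallest rate barely reduces the cost in 50 iterations, the middle one drives it close to zero, and the largest makes the cost grow from one iteration to the next.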

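5. Normal Equation

Instead of running gradient descent, solve for the closed-form solution directly:

Formula:

$\theta = (X^T X)^{-1} X^T y$

Advantages:

No need to choose $\alpha$
No iteration required

Disadvantages:

When the number of features $n$ is large (e.g. >10000), inverting the $(n+1) \times (n+1)$ matrix $X^T X$ costs roughly $O(n^3)$ and becomes very slow or even infeasible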

Python implementation:

def normal_equation(X, y):
    # Closed-form solution: theta = (X^T X)^{-1} X^T y
    return np.linalg.inv(X.T @ X) @ X.T @ y

theta_ne = normal_equation(X, y)

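Characteristics of the normal equation:

Advantages	Disadvantages
No learning rate to choose	Not suitable when there are very many features (matrix inversion is expensive)
No iteration; solved in one step	Less efficient when the dataset is large and the feature dimension is high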
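
As a sanity check, the closed-form solution and gradient descent should agree on the same data. The sketch below uses `np.linalg.solve` on the system $X^T X \theta = X^T y$ rather than forming the inverse explicitly; this is a swapped-in, commonly preferred way to evaluate the same formula, not the implementation shown above. The random dataset, `true_theta`, and the hyperparameters are made up for illustration.

```python
import numpy as np

# Made-up data: two roughly standardized features plus the bias column x0 = 1
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))
X = np.c_[np.ones(50), features]
true_theta = np.array([[3.0], [1.5], [-2.0]])
y = X @ true_theta + 0.1 * rng.normal(size=(50, 1))

# Normal equation, solved as a linear system instead of via an explicit inverse
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the same data (alpha = 0.1 and 2000 iterations are assumptions)
theta_gd = np.zeros((3, 1))
m = len(y)
for _ in range(2000):
    theta_gd = theta_gd - (0.1 / m) * X.T @ (X @ theta_gd - y)

print(theta_ne.ravel())
print(theta_gd.ravel())
print(np.allclose(theta_ne, theta_gd, atol=1e-6))
```

With only three parameters the closed-form route is effectively instantaneous; the trade-off described in the table only matters once $n$ grows large.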

6. Visualizing the Training Process (Cost Decrease)

import matplotlib.pyplot as plt

theta, J_history = gradient_descent(X, y, theta, alpha=0.1, num_iters=400)

plt.plot(J_history)
plt.xlabel("Iterations")
plt.ylabel("Cost J(θ)")
plt.title("Cost Reduction over Time")
plt.grid(True)
plt.show()
