文章目錄
- 01 LM 算法及 Cpp 實現
- 1.1 應用
- 1.2 阻尼法推導
- 1.3 Cpp 算法實現
01 LM 算法及 Cpp 實現
1.1 應用
LM 算法用于解決非線性最小二乘問題
min ? x F ( x ) = 1 2 ∥ f ( x ) ∥ 2 2 (1) \min _x F(x)=\frac{1}{2}\|f(\boldsymbol{x})\|_2^2 \tag{1} xmin?F(x)=21?∥f(x)∥22?(1)
其中, f ( x ) f(\boldsymbol{x}) f(x) 是指各項的殘差。
LM 算法有兩種推導方式,即 阻尼法 和 置信域 法(見《十四講》),這里用阻尼法進行推導。
1.2 阻尼法推導
(1)一階泰勒展開近似
將 f ( x ) f(\boldsymbol{x}) f(x) 在 x n \boldsymbol{x_n} xn? 處一階泰勒展開(把 Δ x \Delta \boldsymbol{x} Δx 看做未知數),
f ( x n + Δ x ) ≈ f ( x n ) + J ( x n ) Δ x (2) f(\boldsymbol{x_n}+\Delta \boldsymbol{x}) \approx f(\boldsymbol{x_n})+\boldsymbol{J}\left(\boldsymbol{x}_n\right) \Delta \boldsymbol{x} \tag{2} f(xn?+Δx)≈f(xn?)+J(xn?)Δx(2)
那么問題轉化為,對于每次迭代,求解最優的 Δ x \Delta \boldsymbol{x} Δx。
Δ x ? = arg ? min ? Δ x 1 2 ∥ f ( x n ) + J n ( x n ) Δ x ∥ 2 (3) \Delta \boldsymbol{x}^*=\arg \min _{\Delta \boldsymbol{x}} \frac{1}{2}\left\|f\left(\boldsymbol{x}_{\boldsymbol{n}}\right)+\boldsymbol{J}_{\boldsymbol{n}}\left(\boldsymbol{x}_{\boldsymbol{n}}\right) \Delta \boldsymbol{x}\right\|^2 \tag{3} Δx?=argΔxmin?21?∥f(xn?)+Jn?(xn?)Δx∥2(3)
(2)加入阻尼項
Δ x ? = arg ? min ? Δ x M ( Δ x ) = 1 2 ∥ f ( x n ) + J n ( x n ) Δ x ∥ 2 + 1 2 μ Δ x T Δ x (4) \Delta \boldsymbol{x}^*=\arg \min _{\Delta \boldsymbol{x}} M(\Delta \boldsymbol{x})=\frac{1}{2}\left\|f\left(\boldsymbol{x}_{\boldsymbol{n}}\right)+\boldsymbol{J}_{\boldsymbol{n}}\left(\boldsymbol{x}_{\boldsymbol{n}}\right) \Delta \boldsymbol{x}\right\|^2+\frac{1}{2} \mu \Delta \boldsymbol{x}^T \Delta \boldsymbol{x} \tag{4} Δx?=argΔxmin?M(Δx)=21?∥f(xn?)+Jn?(xn?)Δx∥2+21?μΔxTΔx(4)
其中, μ \mu μ 為阻尼系數,阻尼項 1 2 μ Δ x T Δ x \frac{1}{2} \mu \Delta \boldsymbol{x}^T \Delta \boldsymbol{x} 21?μΔxTΔx 可以看做是對于過大的 Δ x \Delta \boldsymbol{x} Δx 的懲罰。
(3)求極值
對 Δ x \Delta \boldsymbol{x} Δx 求導,并令其等于零,
J ( x n ) T f ( x n ) + J ( x n ) T J ( x n ) Δ x + μ Δ x = 0 (5) \boldsymbol{J}\left(\boldsymbol{x}_{\boldsymbol{n}}\right)^T f\left(\boldsymbol{x}_{\boldsymbol{n}}\right)+\boldsymbol{J}\left(\boldsymbol{x}_{\boldsymbol{n}}\right)^T \boldsymbol{J}\left(\boldsymbol{x}_{\boldsymbol{n}}\right) \Delta \boldsymbol{x}+\mu \Delta \boldsymbol{x}=\mathbf{0} \tag{5} J(xn?)Tf(xn?)+J(xn?)TJ(xn?)Δx+μΔx=0(5)
得
Δ x ? = ? ( J ( x n ) T J ( x n ) + μ I ) ? 1 J ( x n ) T f ( x n ) (6) \Delta \boldsymbol{x}^*=-\left(\boldsymbol{J}\left(\boldsymbol{x}_{\boldsymbol{n}}\right)^T \boldsymbol{J}\left(\boldsymbol{x}_{\boldsymbol{n}}\right)+\mu \boldsymbol{I}\right)^{-1} \boldsymbol{J}\left(\boldsymbol{x}_{\boldsymbol{n}}\right)^T f\left(\boldsymbol{x}_{\boldsymbol{n}}\right) \tag{6} Δx?=?(J(xn?)TJ(xn?)+μI)?1J(xn?)Tf(xn?)(6)
令
J ( x n ) T J ( x n ) = H J ( x n ) T f ( x n ) = g \begin{aligned} \boldsymbol{J}\left(\boldsymbol{x}_{\boldsymbol{n}}\right)^T \boldsymbol{J}\left(\boldsymbol{x}_{\boldsymbol{n}}\right) &=\boldsymbol{H} \\ \boldsymbol{J}\left(\boldsymbol{x}_{\boldsymbol{n}}\right)^T f\left(\boldsymbol{x}_{\boldsymbol{n}}\right) &=\boldsymbol{g} \end{aligned} J(xn?)TJ(xn?)J(xn?)Tf(xn?)?=H=g?
則式(6)可寫為
Δ x ? = ? ( H + μ I ) ? 1 g (7) \Delta \boldsymbol{x}^*=-(\boldsymbol{H}+\mu \boldsymbol{I})^{-1} \boldsymbol{g} \tag{7} Δx?=?(H+μI)?1g(7)
(4)阻尼系數 μ \mu μ 的調節
定義增益系數 ρ \rho ρ
ρ = f ( x + Δ x ) ? f ( x ) J ( x ) T Δ x (8) \rho=\frac{f(\boldsymbol{x}+\boldsymbol{\Delta x})-f(\boldsymbol{x})}{\boldsymbol{J}(\boldsymbol{x})^T\Delta \boldsymbol{x}} \tag{8} ρ=J(x)TΔxf(x+Δx)?f(x)?(8)
其中,分子是實際下降的值,分母是近似下降的值。若 ρ \rho ρ 接近于 1 ,則近似效果好;若 ρ \rho ρ 太小,則說明實際減小的值遠小于近似減小的值,即近似效果較差,需縮小近似范圍,即增大阻尼系數 μ \mu μ; 若 ρ \rho ρ 太大,則說明實際減小的值大于近似減小的值,則需放大近似范圍,即減小 μ \mu μ。
兩種調節方法:
if? ρ < 0.25 μ : = μ ? 2 else?if? ρ > 0.75 μ : = μ / 3 \begin{aligned} \text { if } & \rho<0.25 \\ & \mu:=\mu * 2 \\ \text { else if } & \rho>0.75 \\ & \mu:=\mu / 3 \end{aligned} ?if??else?if??ρ<0.25μ:=μ?2ρ>0.75μ:=μ/3?
也就是, ρ < 0.25 \rho<0.25 ρ<0.25 時增大阻尼系數; ρ > 0.75 \rho>0.75 ρ>0.75 時,減小阻尼系數。
或者:
i f ρ > 0 μ : = μ ? max ? { 1 3 , 1 ? ( 2 ρ ? 1 ) 3 } ; ν : = 2 e l s e μ : = μ ? ν ; ν : = 2 ? ν \begin{aligned} if \rho>0 \\ &\mu:=\mu * \max \left\{\frac{1}{3}, 1-(2 \rho-1)^3\right\} ; \quad \nu:=2\\ else\\ &\mu:=\mu * \nu ; \quad \nu:=2 * \nu \end{aligned} ifρ>0else?μ:=μ?max{31?,1?(2ρ?1)3};ν:=2μ:=μ?ν;ν:=2?ν?
1.3 Cpp 算法實現
兩種方法差別在于 ρ \rho ρ 的分母的計算(我的更快?)。
優化目標是待定系數,把他們看做 未知數,需要計算的 step 就是這些待定系數的 step。
(1)我的方法
/*********************************************************** *
* Time: 2022/11/3
* Author: xiaocong
* Function: LM 算法** @ 解決的是最小二乘問題,也就是找到最優的系數使得殘差最小* @ obj = A * sin(Bx) + C * cos(D*x) - F* A * sin(Bx) + C * cos(D*x) 是目標函數* F 是實際值* 目標是找到使 obj 最小的系數 A B C D
***********************************************************/#include <Eigen/Dense> // 稠密矩陣
#include <Eigen/Sparse> // 稀疏矩陣
#include <iostream>
#include <cmath>using namespace std;
using namespace Eigen;const double DERIV_STEP = 1e-5;
const int MAX_INTER = 100;#define max(a, b) (((a)>(b))?(a):(b))// 定義目標函數
double func(const VectorXd input, const VectorXd &output, const VectorXd ¶ms, int objIndex)
{// obj = A * sin(Bx) + C * cos(D*x) - Fdouble x1 = params(0); // params 中存儲的是系數double x2 = params(1);double x3 = params(2);double x4 = params(3);double t = input(objIndex); // input 中存儲的是 xdouble f = output(objIndex); // output 中存儲的是實際值return x1 * sin(x2 * t) + x3 * cos(x4 * t) - f; // 返回 objIndex 項的殘差
}// 計算殘差矩陣
VectorXd objF(const VectorXd &input, const VectorXd &output, const VectorXd ¶ms)
{VectorXd obj(input.rows()); // 存儲所有殘差,也就是殘差矩陣for (int i = 0; i < input.rows(); i++)obj(i) = func(input, output, params, i);return obj; // 返回殘差矩陣
}// 殘差平方和
double Func(const VectorXd &obj)
{return obj.squaredNorm() / 2.0;
}// 求某個系數在某點的導數
double Deriv(const VectorXd &input, const VectorXd &output, int objIndex, const VectorXd ¶ms, int paraIndex)
{VectorXd para1 = params;VectorXd para2 = params;para1(paraIndex) -= DERIV_STEP;para2(paraIndex) += DERIV_STEP;double obj1 = func(input, output, para1, objIndex);double obj2 = func(input, output, para2, objIndex);return (obj2 - obj1) / (2 * DERIV_STEP); // 該點處的導數,為求雅克比矩陣做準備
}// 計算雅克比矩陣
/***************************** 我們優化的是系數 params,把他們看做未知數,分別求導,得到雅克比矩陣* 維度:(input.rows() x params.rows())* [[df/dA df/dB df/dC df/dD] <--- x1* [df/dA df/dB df/dC df/dD] <--- x2* [.......................]* [df/dA df/dB df/dC df/dD]] <--- xn****************************/
MatrixXd Jacobian(const VectorXd &input, const VectorXd &output, const VectorXd ¶ms)
{int rowNum = input.rows();int colNum = params.rows();MatrixXd Jac(rowNum, colNum);for (int i = 0; i < rowNum; i++)for (int j = 0; j < colNum; j++)Jac(i, j) = Deriv(input, output, i, params, j);return Jac;
}//求 Hessian 矩陣對角線最大值
// Hessian 矩陣:二階導數
double maxMatrixDiagonale(const MatrixXd &Hessian)
{int max = 0;for (int i = 0; i < Hessian.rows(); i++){if(Hessian(i, i) > max)max = Hessian(i, i);}return max;
}// 近似下降值
double linerDeltaL(const VectorXd &step, const MatrixXd &Jac)
{VectorXd L = Jac * step;return L.norm();
}void levenMar(const VectorXd &input, const VectorXd &output, VectorXd ¶ms)
{double epsilon = 1e-12;int iterCnt = 0; // 迭代計數double tao = 1e-3;long long v = 2;// 求初始的 uMatrixXd Jac = Jacobian(input, output, params); // 得到雅克比矩陣MatrixXd H = Jac.transpose() * Jac; // 得到 Hessian 矩陣,4x4double u = tao * maxMatrixDiagonale(H); // Hessian 矩陣對角線最大值乘 taowhile (iterCnt < MAX_INTER) // double 類型不能直接作比較{VectorXd obj = objF(input, output, params); // 誤差矩陣MatrixXd Jac = Jacobian(input, output, params); // 得到雅克比矩陣MatrixXd H = Jac.transpose() * Jac; // 得到 Hessian 矩陣,4x4VectorXd g = Jac.transpose() * obj; // 也就是 g,4x1VectorXd step = (H + u * MatrixXd::Identity(H.rows(), H.cols())).inverse() * g;if(step.norm() < epsilon)break;cout << "H:" << endl << H << endl;cout << "step: " << endl << step << endl;VectorXd paramsNew(params.rows());paramsNew = params - step; // 更新 params// 計算 params 誤差obj = objF(input, output, params);// 計算 paramsNew 誤差VectorXd obj_new = objF(input, output, paramsNew);double deltaF = Func(obj) - Func(obj_new); // 求差double deltaL = linerDeltaL(step, Jac);// 計算增益系數 rhodouble rho = deltaF / deltaL; // 實際下降值 / 近似下降值cout << "rho is; " << rho <<endl;if(rho > 0){params = paramsNew;u *= max(1.0 / 3.0, 1 - pow(2 * rho - 1, 3));v = 2;} else{u = u * v;v = v * 2;}cout << "u= " << u << "\tv= " << v << endl;iterCnt ++;cout << "Iteration " << iterCnt << " times, result is :" << endl<< params << endl;}
}int main()
{int params_num = 4;int total_data = 100;VectorXd input(total_data);VectorXd output(total_data);double A = 5, B = 1, C = 10, D = 2; // 初始化// 生成數據for (int i = 0; i < total_data; i++){double x = 20.0 * ((rand() % 1000) / 1000.0) - 10.0; // [-10, 10]double deltaY = 2.0 * (rand() % 1000) / 1000.0; // 隨機噪聲,[0, 2]double y = A * sin(B*x) + C * cos(D*x) + deltaY;input(i) = x;output(i) = y;}VectorXd params_levenMar(params_num);params_levenMar << 3.6, 1.3, 7.2, 1.7;levenMar(input, output, params_levenMar);cout << "**********************************************" << endl;cout << "Levenberg-Marquardt parameter: " << endl << params_levenMar << endl;}
輸出
Levenberg-Marquardt parameter:4.85628
0.99790410.05242.003
(2)網絡方法
/*********************************************************** *
* Time: 2022/11/2
* Author: xiaocong
* Function: LM 算法** @ 解決的是最小二乘問題,也就是找到最優的系數使得殘差最小* @ obj = A * sin(Bx) + C * cos(D*x) - F* A * sin(Bx) + C * cos(D*x) 是目標函數* F 是實際值* 目標是找到使 obj 最小的系數 A B C D
***********************************************************/#include <Eigen/Dense> // 稠密矩陣
#include <Eigen/Sparse> // 稀疏矩陣
#include <iostream>
#include <iomanip> // 控制輸入輸出格式等
#include <cmath>using namespace std;
using namespace Eigen;const double DERIV_STEP = 1e-5;
const int MAX_INTER = 100;#define max(a, b) (((a)>(b))?(a):(b))// 定義目標函數
double func(const VectorXd input, const VectorXd &output, const VectorXd ¶ms, int objIndex)
{// obj = A * sin(Bx) + C * cos(D*x) - Fdouble x1 = params(0); // params 中存儲的是系數double x2 = params(1);double x3 = params(2);double x4 = params(3);double t = input(objIndex); // input 中存儲的是 xdouble f = output(objIndex); // output 中存儲的是實際值return x1 * sin(x2 * t) + x3 * cos(x4 * t) - f; // 返回 objIndex 項的殘差
}// 計算殘差矩陣
VectorXd objF(const VectorXd &input, const VectorXd &output, const VectorXd ¶ms)
{VectorXd obj(input.rows()); // 存儲所有殘差,也就是殘差矩陣for (int i = 0; i < input.rows(); i++)obj(i) = func(input, output, params, i);return obj; // 返回殘差矩陣
}// 殘差平方和
double Func(const VectorXd &obj)
{return obj.squaredNorm() / 2.0;
}// 求某個系數在某點的導數
double Deriv(const VectorXd &input, const VectorXd &output, int objIndex, const VectorXd ¶ms, int paraIndex)
{VectorXd para1 = params;VectorXd para2 = params;para1(paraIndex) -= DERIV_STEP;para2(paraIndex) += DERIV_STEP;double obj1 = func(input, output, para1, objIndex);double obj2 = func(input, output, para2, objIndex);return (obj2 - obj1) / (2 * DERIV_STEP); // 該點處的導數,為求雅克比矩陣做準備
}// 計算雅克比矩陣
/***************************** 我們優化的是系數 params,把他們看做未知數,分別求導,得到雅克比矩陣* 維度:(input.rows() x output.rows())* [[df/dA df/dB df/dC df/dD] <--- x1* [df/dA df/dB df/dC df/dD] <--- x2* [.......................]* [df/dA df/dB df/dC df/dD]] <--- xn****************************/
MatrixXd Jacobian(const VectorXd &input, const VectorXd &output, const VectorXd ¶ms)
{int rowNum = input.rows();int colNum = params.rows();MatrixXd Jac(rowNum, colNum);for (int i = 0; i < rowNum; i++)for (int j = 0; j < colNum; j++)Jac(i, j) = Deriv(input, output, i, params, j);return Jac;
}//求 Hessian 矩陣對角線最大值
// Hessian 矩陣:二階導數
double maxMatrixDiagonale(const MatrixXd &Hessian)
{int max = 0;for (int i = 0; i < Hessian.rows(); i++){if(Hessian(i, i) > max)max = Hessian(i, i);}return max;
}double linerDeltaL(const VectorXd &step, const VectorXd &gradient, const double u)
{double L = step.transpose() * (u * step - gradient);return L;
}void levenMar(const VectorXd &input, const VectorXd &output, VectorXd ¶ms)
{int errNum = input.rows();int paraNum = params.rows();// initial parametersVectorXd obj = objF(input, output, params); // 得到誤差矩陣MatrixXd Jac = Jacobian(input, output, params); // 得到雅克比矩陣MatrixXd H = Jac.transpose() * Jac; // 得到 Hessian 矩陣,4x4VectorXd gradient = Jac.transpose() * obj; // 也就是 g,4x1double tao = 1e-3;long long v = 2;double epsilon1 = 1e-12, epsilon2 = 1e-12;double u = tao * maxMatrixDiagonale(H); // Hessian 矩陣對角線最大值乘 taobool found = gradient.norm() <= epsilon1;if (found) return; // 直接退出程序,不再執行后面的程序double last_sum = 0;int iterCnt = 0; // 迭代計數while (iterCnt < MAX_INTER){VectorXd obj = objF(input, output, params); // 誤差矩陣MatrixXd Jac = Jacobian(input, output, params); // 得到雅克比矩陣MatrixXd H = Jac.transpose() * Jac; // 得到 Hessian 矩陣,4x4VectorXd gradient = Jac.transpose() * obj; // 也就是 g,4x1if(gradient.norm() < epsilon1){cout << "stop g(x) = 0 for a local minimizer optimizer." << endl;break;}cout << "H:" << endl << H << endl;VectorXd step = (H + u * MatrixXd::Identity(paraNum, paraNum)).inverse() * gradient;// 求 Delta x = (H + uI)^{-1}g 注意:step 維度(4x1)cout << "step: " << endl << step << endl;if(step.norm() <= epsilon2 * (params.norm()) + epsilon2){cout << "stop because change in x is small" << endl;break;}VectorXd paramsNew(params.rows());paramsNew = params - step; // 更新 params// 計算 params 誤差obj = objF(input, output, params);// 計算 paramsNew 誤差VectorXd obj_new = objF(input, output, paramsNew);double deltaF = Func(obj) - Func(obj_new); // 求差double deltaL = linerDeltaL(-1 * step, gradient, u);// 計算增益系數 rhodouble rho = deltaF / deltaL; // 實際下降值 / 近似下降值cout << "rho is; " << rho <<endl;if(rho > 0){params = paramsNew;u *= max(1.0 / 3.0, 1 - pow(2 * rho - 1, 3));v = 2;} else{u = u * v;v = v * 2;}cout << "u= " << u << "\tv= " << v << endl;iterCnt ++;cout << "Iteration " << iterCnt << " times, result is :" << endl<< params << endl;}
}int main()
{int params_num = 4;int total_data = 100;VectorXd input(total_data);VectorXd output(total_data);double A = 5, B = 1, C = 10, D = 2; // 初始化// 生成數據for (int i = 0; i < total_data; i++){double x = 20.0 * ((rand() % 1000) / 1000.0) - 10.0; // [-10, 10]double deltaY = 2.0 * (rand() % 1000) / 1000.0; // 隨機噪聲,[0, 2]double y = A * sin(B*x) + C * cos(D*x) + deltaY;input(i) = x;output(i) = y;}VectorXd params_levenMar(params_num);params_levenMar << 3.6, 1.3, 7.2, 1.7;levenMar(input, output, params_levenMar);cout << "Levenberg-Marquardt parameter: " << endl << params_levenMar << endl << endl << endl;cout << "**********************************************" << endl;}