C++ 雙峰高斯函數擬合
- 一維高斯函數
- 二維高斯函數
- 多維高斯函數
- 一維雙峰高斯函數
- 代碼實現
- 二維雙峰高斯函數
- 代碼實現
- 多維多峰高斯函數
在數據分析與清洗中經常遇到這樣的數據:數據不僅僅向單個中心靠攏,而是類似分段的向兩個甚至多個中心靠攏。數據向單個中心靠攏時,可以簡單地用高斯函數去擬合,得到相關參數,那么可以通過3sigma準則對數據進行去噪以為后續處理做準備。當數據向兩個中心靠攏時,單峰高斯函數就不再適用了,這時候就需要雙峰高斯函數出馬了。
高斯函數(Gaussian Function)是一種常見的數學函數,廣泛應用于概率論、統計學、信號處理、圖像處理等領域。
一維高斯函數
一維高斯函數的公式為:
f ( x ) = A ? exp ? ( ? ( x ? μ ) 2 2 σ 2 ) f(x)=A\cdot\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) f(x)=A?exp(?2σ2(x?μ)2?)
參數說明:
- A A A:振幅(Amplitude),控制函數的最大值。
- μ \mu μ:均值(Mean),表示函數的中心位置(即峰值所在的位置)。
- σ \sigma σ:標準差(Standard Deviation),控制函數的寬度(越小越尖銳,越大越平緩)。
- x x x:自變量。
歸一化形式:
如果需要將高斯函數歸一化(使其在 ( ? ∞ -\infty ?∞) 到 ( + ∞ +\infty +∞) 的積分等于 1),則公式變為:
f ( x ) = 1 2 π σ ? exp ? ( ? ( x ? μ ) 2 2 σ 2 ) f(x)=\frac{1}{\sqrt{2\pi}\sigma}\cdot\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) f(x)=2π?σ1??exp(?2σ2(x?μ)2?)
此時,( A = \frac{1}{\sqrt{2\pi}\sigma} )。
二維高斯函數
二維高斯函數通常用于描述二維空間中的分布,例如圖像處理中的模糊濾波器。其公式為:
f ( x , y ) = A ? exp ? ( ? ( x ? x 0 ) 2 2 σ x 2 ? ( y ? y 0 ) 2 2 σ y 2 ) f(x,y)=A\cdot\exp\left(-\frac{(x-x_0)^2}{2\sigma_x^2}-\frac{(y-y_0)^2}{2\sigma_y^2}\right) f(x,y)=A?exp(?2σx2?(x?x0?)2??2σy2?(y?y0?)2?)
各向同性高斯函數
當 ( \sigma_x = \sigma_y = \sigma ) 時,二維高斯函數變為:
f ( x , y ) = A ? exp ? ( ? ( x ? x 0 ) 2 + ( y ? y 0 ) 2 2 σ 2 ) f(x,y)=A\cdot\exp\left(-\frac{(x-x_0)^2+(y-y_0)^2}{2\sigma^2}\right) f(x,y)=A?exp(?2σ2(x?x0?)2+(y?y0?)2?)
多維高斯函數
對于 n 維高斯函數,其通用形式為:
f ( x ) = A ? exp ? ( ? 1 2 ( x ? μ ) ? Σ ? 1 ( x ? μ ) ) f(\mathbf{x}) = A \cdot \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right) f(x)=A?exp(?21?(x?μ)?Σ?1(x?μ))
參數說明:
- x \mathbf{x} x:( n )-維向量,表示自變量。
- μ \boldsymbol{\mu} μ:( n )-維向量,表示均值。
- Σ \Sigma Σ:協方差矩陣,控制分布的形狀和方向。
- Σ ? 1 \Sigma^{-1} Σ?1:協方差矩陣的逆矩陣。
各向同性多維高斯函數
當協方差矩陣為對角矩陣且所有對角元素相等時(即 Σ = σ 2 I \Sigma = \sigma^2 I Σ=σ2I,其中 I I I 是單位矩陣),公式簡化為:
f ( x ) = A ? exp ? ( ? ∥ x ? μ ∥ 2 2 σ 2 ) f(\mathbf{x}) = A \cdot \exp\left(-\frac{\|\mathbf{x} - \boldsymbol{\mu}\|^2}{2\sigma^2}\right) f(x)=A?exp(?2σ2∥x?μ∥2?)
其中 ∥ x ? μ ∥ \|\mathbf{x} - \boldsymbol{\mu}\| ∥x?μ∥ 表示歐幾里得距離。
一維雙峰高斯函數
對于雙峰高斯函數,我們可以將其視為兩個這樣的高斯函數的疊加,每個都有自己的振幅,均值及標準差。因此,雙峰高斯函數的形式可以表示為:
f ( x ) = A 1 exp ? ( ? ( x ? μ 1 ) 2 2 σ 1 2 ) + A 2 exp ? ( ? ( x ? μ 2 ) 2 2 σ 2 2 ) f(x) = A_1 \exp\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right) + A_2 \exp\left(-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right) f(x)=A1?exp(?2σ12?(x?μ1?)2?)+A2?exp(?2σ22?(x?μ2?)2?)
可以用于擬合或描述具有雙峰特性的數據集。
代碼實現
這里基于ceres,在C++中對雙峰高斯擬合進行實現。
思路很簡單,輸入一維的數據和直方圖的寬度,構建直方圖,然后通過直方圖的頂端坐標來擬合雙峰高斯函數,稍微麻煩的是找到合適的初始值:
- 建直方圖
- 確定初值
- 優化求解
struct BimodalGaussianResidual
{BimodalGaussianResidual(double x, double y) : x_(x), y_(y){}template <typename T>bool operator()(const T* const params, T* residual) const{const T A1 = params[0];const T mu1 = params[1];const T sigma1 = params[2];const T A2 = params[3];const T mu2 = params[4];const T sigma2 = params[5];T f1 = A1 * ceres::exp(-(x_ - mu1) * (x_ - mu1) / (T(2) * sigma1 * sigma1));T f2 = A2 * ceres::exp(-(x_ - mu2) * (x_ - mu2) / (T(2) * sigma2 * sigma2));residual[0] = y_ - (f1 + f2);return true;}private:const double x_;const double y_;
};bool FitBimodalGaussian(std::vector<float>& data, double* params, const double bin_width)
{//==========================建直方圖==========================double min_val = *std::min_element(data.begin(), data.end());double max_val = *std::max_element(data.begin(), data.end());std::map<double, int> histogram;for (double value : data){double bin = std::floor((value - min_val) / bin_width) * bin_width; // 找到所屬的binhistogram[bin]++;}std::vector<double> x_data;std::vector<double> heights;for (const auto& entry : histogram){x_data.push_back(entry.first + min_val + 0.5 * bin_width);heights.push_back(entry.second);}//==========================確定初值==========================// 找到直方圖中的兩個峰值double max_height1 = 0, max_height2 = 0;double mu1 = 0, mu2 = 0;size_t peak1_index = 0; // 第一個峰值的索引// 找第一個峰值for (size_t i = 0; i < heights.size(); ++i){if (heights[i] > max_height1){max_height1 = heights[i];mu1 = x_data[i];peak1_index = i;}}// 找第二個峰值(避開第一個峰值附近的區域)const double separation_threshold = 5; // 峰值之間的最小距離 // 根據數據確定for (size_t i = 0; i < heights.size(); ++i){if (std::abs(x_data[i] - mu1) > separation_threshold && heights[i] > max_height2){max_height2 = heights[i];mu2 = x_data[i];}}// 估計標準差 sigma1 和 sigma2double sigma1 = 0.5, sigma2 = 0.5; // 初始猜測值const double half_max_height1 = max_height1 / 2.0;const double half_max_height2 = max_height2 / 2.0;// 找第一個峰值的半高寬double left_edge1 = mu1, right_edge1 = mu1;for (size_t i = 0; i < x_data.size(); ++i){if (x_data[i] < mu1 && heights[i] >= half_max_height1){left_edge1 = x_data[i];}if (x_data[i] > mu1 && heights[i] >= half_max_height1){right_edge1 = x_data[i];break;}}sigma1 = (right_edge1 - left_edge1) / (2 * std::sqrt(2 * std::log(2)));// 找第二個峰值的半高寬double left_edge2 = mu2, right_edge2 = mu2;for (size_t i = 0; i < x_data.size(); ++i){if (x_data[i] < mu2 && heights[i] >= half_max_height2){left_edge2 = x_data[i];}if (x_data[i] > mu2 && heights[i] >= half_max_height2){right_edge2 = x_data[i];break;}}sigma2 = (right_edge2 - left_edge2) / (2 * std::sqrt(2 * std::log(2)));// 設置初始參數 [A1, mu1, sigma1, A2, mu2, sigma2]params[0] = max_height1;params[1] = mu1;params[2] = sigma1;params[3] = max_height2;params[4] = mu2;params[5] = sigma2;#if MY_LOGstd::cout << std::fixed << std::setprecision(6);std::cout << "\nInitial Parameters:" << std::endl;std::cout << "A1 = " << params[0] << ", mu1 = " << params[1] << ", sigma1 = " << params[2] << std::endl;std::cout << "A2 = " << params[3] << ", mu2 = " << params[4] << ", sigma2 = " << params[5] << std::endl;
#endif//==========================優化求解==========================ceres::Problem problem;for (size_t i = 0; i < x_data.size(); ++i){problem.AddResidualBlock(new ceres::AutoDiffCostFunction<BimodalGaussianResidual, 1, 6>(new BimodalGaussianResidual(x_data[i], heights[i])), nullptr, params);}ceres::Solver::Options options;options.linear_solver_type = ceres::DENSE_QR;options.minimizer_progress_to_stdout = false;ceres::Solver::Summary summary;ceres::Solve(options, &problem, &summary);#if MY_LOGstd::cout << "\nFinal Parameters:\n";std::cout << "A1: " << params[0] << ", mu1: " << params[1] << ", sigma1: " << params[2] << "\n";std::cout << "A2: " << params[3] << ", mu2: " << params[4] << ", sigma2: " << params[5] << "\n";
#endifreturn true;
}
用一組真實數據測試,可得:
Initial Parameters:
A1 = 16223.000000, mu1 = -1.018652, sigma1 = 0.084932
A2 = 2868.000000, mu2 = 7.181348, sigma2 = 0.084932Final Parameters:
A1: 15804.267874, mu1: -1.138264, sigma1: 0.744859
A2: 2530.132897, mu2: 6.820187, sigma2: 0.759935
二維雙峰高斯函數
類似的,二維雙峰高斯函數可表達式如下:
f ( x , y ) = A 1 exp ? ( ? ( x ? μ 1 x ) 2 2 σ 1 x 2 ? ( y ? μ 1 y ) 2 2 σ 1 y 2 ) + A 2 exp ? ( ? ( x ? μ 2 x ) 2 2 σ 2 x 2 ? ( y ? μ 2 y ) 2 2 σ 2 y 2 ) f(x, y) = A_1 \exp\left(-\frac{(x-\mu_{1x})^2}{2\sigma_{1x}^2} - \frac{(y-\mu_{1y})^2}{2\sigma_{1y}^2}\right) + A_2 \exp\left(-\frac{(x-\mu_{2x})^2}{2\sigma_{2x}^2} - \frac{(y-\mu_{2y})^2}{2\sigma_{2y}^2}\right) f(x,y)=A1?exp(?2σ1x2?(x?μ1x?)2??2σ1y2?(y?μ1y?)2?)+A2?exp(?2σ2x2?(x?μ2x?)2??2σ2y2?(y?μ2y?)2?)
代碼實現
#include <iostream>
#include <vector>
#include <cmath>
#include <random>
#include <map>
#include <iomanip>
#include "ceres/ceres.h"
#include "gnuplot-iostream.h"// 二維雙峰高斯殘差函數
struct BimodalGaussianResidual2D
{BimodalGaussianResidual2D(double x, double y, double z) : x_(x), y_(y), z_(z){}template <typename T>bool operator()(const T* const params, T* residual) const{const T A1 = params[0];const T mu1_x = params[1];const T mu1_y = params[2];const T sigma1_x = params[3];const T sigma1_y = params[4];const T A2 = params[5];const T mu2_x = params[6];const T mu2_y = params[7];const T sigma2_x = params[8];const T sigma2_y = params[9];T f1 = A1 * ceres::exp(-(T(x_ - mu1_x) * T(x_ - mu1_x) / (T(2) * sigma1_x * sigma1_x) +T(y_ - mu1_y) * T(y_ - mu1_y) / (T(2) * sigma1_y * sigma1_y)));T f2 = A2 * ceres::exp(-(T(x_ - mu2_x) * T(x_ - mu2_x) / (T(2) * sigma2_x * sigma2_x) +T(y_ - mu2_y) * T(y_ - mu2_y) / (T(2) * sigma2_y * sigma2_y)));residual[0] = z_ - (f1 + f2);return true;}private:const double x_;const double y_;const double z_;
};// 二維雙峰高斯數據生成
void GenerateBimodalGaussianData(std::vector<double>& x_data, std::vector<double>& y_data, std::vector<double>& z_data,double A1, double mu1_x, double mu1_y, double sigma1_x, double sigma1_y,double A2, double mu2_x, double mu2_y, double sigma2_x, double sigma2_y,int num_points, double noise_level)
{std::default_random_engine generator;std::normal_distribution<double> distribution(0.0, noise_level);for (int i = 0; i < num_points; ++i){double x = static_cast<double>(rand()) / RAND_MAX * 10.0 - 5.0; // 范圍 [-5, 5]double y = static_cast<double>(rand()) / RAND_MAX * 10.0 - 5.0; // 范圍 [-5, 5]double f1 = A1 * exp(-(pow(x - mu1_x, 2) / (2 * pow(sigma1_x, 2)) +pow(y - mu1_y, 2) / (2 * pow(sigma1_y, 2))));double f2 = A2 * exp(-(pow(x - mu2_x, 2) / (2 * pow(sigma2_x, 2)) +pow(y - mu2_y, 2) / (2 * pow(sigma2_y, 2))));double z = f1 + f2 + distribution(generator); // 添加噪聲x_data.push_back(x);y_data.push_back(y);z_data.push_back(z);}
}// 二維雙峰高斯函數擬合
bool FitBimodalGaussian2D(const std::vector<double>& x_data, const std::vector<double>& y_data, const std::vector<double>& z_data, double* params)
{ceres::Problem problem;for (size_t i = 0; i < x_data.size(); ++i){problem.AddResidualBlock(new ceres::AutoDiffCostFunction<BimodalGaussianResidual2D, 1, 10>(new BimodalGaussianResidual2D(x_data[i], y_data[i], z_data[i])),nullptr, params);}ceres::Solver::Options options;options.linear_solver_type = ceres::DENSE_QR;options.minimizer_progress_to_stdout = false;ceres::Solver::Summary summary;ceres::Solve(options, &problem, &summary);return true;
}int main()
{//==========================數據生成==========================std::vector<double> x_data, y_data, z_data;int num_points = 1000;double noise_level = 0.2;double A1_true = -3.0, mu1_x_true = -2.0, mu1_y_true = 2.0, sigma1_x_true = 1.0, sigma1_y_true = 1.0;double A2_true = 5.0, mu2_x_true = 2.0, mu2_y_true = -2.0, sigma2_x_true = 1.5, sigma2_y_true = 1.5;GenerateBimodalGaussianData(x_data, y_data, z_data,A1_true, mu1_x_true, mu1_y_true, sigma1_x_true, sigma1_y_true,A2_true, mu2_x_true, mu2_y_true, sigma2_x_true, sigma2_y_true,num_points, noise_level);//==========================確定初值==========================double initial_params[] = { A1_true, mu1_x_true, mu1_y_true, sigma1_x_true, sigma1_y_true, A2_true , mu2_x_true , mu2_y_true, sigma2_x_true, sigma2_y_true };// 添加隨機噪聲std::random_device rd;std::mt19937 gen(rd());std::normal_distribution<> d(0, 2*noise_level);for (int i = 0; i < 10;++i)initial_params[i]+= d(gen);//==========================優化求解==========================std::cout << "\nInitial Parameters:\n";std::cout << "A1: " << initial_params[0] << ", mu1_x: " << initial_params[1] << ", mu1_y: " << initial_params[2] << ", sigma1_x: " << initial_params[3] << ", sigma1_y: " << initial_params[4] << "\n";std::cout << "A2: " << initial_params[5] << ", mu2_x: " << initial_params[6] << ", mu2_y: " << initial_params[7] << ", sigma2_x: " << initial_params[8] << ", sigma2_y: " << initial_params[9] << "\n";FitBimodalGaussian2D(x_data, y_data, z_data, initial_params);std::cout << "\nFinal Fitted Parameters:\n";std::cout << "A1: " << initial_params[0] << ", mu1_x: " << initial_params[1] << ", mu1_y: " << initial_params[2] << ", sigma1_x: " << initial_params[3] << ", sigma1_y: " << initial_params[4] << "\n";std::cout << "A2: " << initial_params[5] << ", mu2_x: " << initial_params[6] << ", mu2_y: " << initial_params[7] << ", sigma2_x: " << initial_params[8] << ", sigma2_y: " << initial_params[9] << "\n";//==========================圖形繪制==========================// 將數據傳遞給 Gnuplot 并繪制std::vector<std::tuple<double, double, double>> data;for (size_t i = 0; i < x_data.size(); ++i){data.emplace_back(x_data[i], y_data[i], z_data[i]);}auto format_param = [](double value){std::ostringstream oss;oss << std::fixed << std::setprecision(3) << value;return oss.str();};Gnuplot gp;// 設置繪圖標題和坐標軸標簽gp << "set terminal wxt enhanced\n"; // 使用支持交互的終端gp << "set mouse\n"; // 啟用鼠標交互gp << "set title 'BimodalGaussian2D Fit'\n";gp << "set xlabel 'X'\n";gp << "set ylabel 'Y'\n";gp << "set zlabel 'Z'\n";// 設置繪圖為三維模式gp << "set style data points\n"; // 使用點樣式gp << "set ticslevel 0\n"; // 調整 Z 軸刻度位置gp << "set isosamples 100,100\n"; // 設置采樣數量gp << "set hidden3d\n"; // 隱藏被遮擋區域gp << "splot '-' with points pointtype 7 pointsize 0.5 title 'Scatter Points', "<< format_param(initial_params[0]) << "*exp(-((x - " << format_param(initial_params[1])<< ")**2/(2*" << format_param(initial_params[3]) << "**2) + (y - " << format_param(initial_params[2])<< ")**2/(2*" << format_param(initial_params[4]) << "**2))) "<< "+ " << format_param(initial_params[5]) << "*exp(-((x - " << format_param(initial_params[6])<< ")**2/(2*" << format_param(initial_params[8]) << "**2) + (y - " << format_param(initial_params[7])<< ")**2/(2*" << format_param(initial_params[9]) << "**2)))\n";gp.send(data);gp.flush();return 0;
}
在真值基礎上添加2倍隨機噪聲的初值及擬合結果如下所示,可以看到其擬合得相當準確的。
Initial Parameters:
A1: -3.11395, mu1_x: -2.43491, mu1_y: 1.9397, sigma1_x: 1.21568, sigma1_y: 1.3008
A2: 4.93584, mu2_x: 2.1473, mu2_y: -1.74558, sigma2_x: 1.88884, sigma2_y: 1.73732Final Fitted Parameters:
A1: -3.00556, mu1_x: -2.01663, mu1_y: 1.98635, sigma1_x: 0.96315, sigma1_y: 1.01506
A2: 4.95913, mu2_x: 2.01769, mu2_y: -2.01035, sigma2_x: 1.49838, sigma2_y: 1.49798
多維多峰高斯函數
同樣,理論上也可以發散到多維多峰高斯函數的情形,不過一方面這種情況比較少見,另一方面高維的初值一般不好自動確定,可能應用得很少,這里不再作說明和演示。
補充
對ceres擬合感興趣可移步C++ 帶約束的Ceres形狀擬合。
對gnuplot繪制感興趣可移步如何在C++中優雅地繪制圖表。
打完收工。