Youth Programming and Mathematics 02-018 C++ Data Structures and Algorithms, Topic 21: Machine Learning and Artificial Intelligence Algorithms

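1. Linear Regression
2. Logistic Regression
3. K-Nearest Neighbors (KNN)
4. Decision Trees
5. Support Vector Machines (SVM)
6. Neural Networks
7. Clustering
8. Dimensionality Reduction
- Principal Component Analysis (PCA)
9. Summary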

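Topic Summary
Machine learning and artificial intelligence are very active areas of computer science, covering algorithms that range from simple data fitting to the design of complex intelligent systems.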

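1. Linear Regression

Linear regression is a supervised learning algorithm for predicting continuous values. It fits a linear relationship between the input features and the target.

The goal is to find a linear function that minimizes the error between the predicted and true values. The usual approach is least squares, which minimizes the sum of squared errors.

Sample code: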

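#include <iostream>
#include <Eigen/Dense> // Eigen is used for the matrix operations

using namespace std;
using namespace Eigen;

// Fit y ≈ X_b * theta by ordinary least squares (normal equations).
VectorXd linear_regression(const MatrixXd& X, const VectorXd& y) {
    // Prepend a column of ones so that theta(0) acts as the bias term.
    // X_b needs X.cols() + 1 columns: one for the bias plus the features.
    MatrixXd X_b = MatrixXd::Ones(X.rows(), X.cols() + 1);
    X_b.rightCols(X.cols()) = X;
    // Solve (X_b^T X_b) theta = X_b^T y without forming an explicit inverse.
    VectorXd theta = (X_b.transpose() * X_b).ldlt().solve(X_b.transpose() * y);
    return theta;
}

int main() {
    // Sample data: the points lie exactly on y = 2x
    MatrixXd X(5, 1);
    X << 1, 2, 3, 4, 5;
    VectorXd y(5);
    y << 2, 4, 6, 8, 10;
    VectorXd theta = linear_regression(X, y);
    cout << "Parameters: " << endl << theta << endl;
    return 0;
}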
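For a single feature, the least-squares solution has a simple closed form that needs no matrix library. The following is a minimal sketch of that special case using only the standard library. The helper name fit_line is hypothetical and not part of the original listing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Fit y ≈ slope * x + intercept by least squares for one feature.
// Returns {slope, intercept}. fit_line is a hypothetical helper name.
std::pair<double, double> fit_line(const std::vector<double>& x,
                                   const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 2) {
        throw std::invalid_argument("need at least two (x, y) pairs");
    }
    const double n = static_cast<double>(x.size());

    // Means of x and y
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= n;
    mean_y /= n;

    // slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mean_x) * (y[i] - mean_y);
        sxx += (x[i] - mean_x) * (x[i] - mean_x);
    }
    if (sxx == 0.0) {
        throw std::invalid_argument("all x values are identical");
    }
    const double slope = sxy / sxx;
    return {slope, mean_y - slope * mean_x};
}
```

Called with x = {1, 2, 3, 4, 5} and y = {2, 4, 6, 8, 10}, the sketch returns a slope of 2 and an intercept of 0, because those points lie exactly on y = 2x.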

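2. Logistic Regression

Logistic regression is a classification algorithm for predicting discrete labels. It passes the output of a linear model through the sigmoid function, which maps it to a value between 0 and 1 that can be read as a class probability.

The goal is to find the parameters of the linear model that minimize the error between the predicted probabilities and the true labels, usually measured by the cross-entropy (log) loss. The parameters are typically found by gradient descent.

Sample code: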

#include <iostream>
#include <cmath>
#include <Eigen/Dense>

using namespace std;
using namespace Eigen;

// Apply the sigmoid function 1 / (1 + e^(-x)) to every element of z.
VectorXd sigmoid(const VectorXd& z) {
    return z.unaryExpr([](double x) { return 1.0 / (1.0 + exp(-x)); });
}

// Batch gradient descent on the cross-entropy loss.
// Each row of X is one sample; y holds the 0/1 labels.
VectorXd logistic_regression(const MatrixXd& X, const VectorXd& y,
                             double learning_rate = 0.01, int num_iterations = 1000) {
    const double m = static_cast<double>(X.rows()); // number of samples
    VectorXd theta = VectorXd::Zero(X.cols());
    for (int i = 0; i < num_iterations; ++i) {
        VectorXd h = sigmoid(X * theta);                 // predicted probabilities
        VectorXd gradient = X.transpose() * (h - y) / m; // gradient of the loss
        theta -= learning_rate * gradient;
    }
    return theta;
}

int main() {
    // Sample data
    MatrixXd X(4, 2);
    X << 1, 2,
         2, 3,
         3, 4,
         4, 5;
    VectorXd y(4);
    y << 0, 0, 1, 1;
    VectorXd theta = logistic_regression(X, y);
    cout << "Parameters: " << endl << theta << endl;
    return 0;
}
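The listing above has no explicit intercept term and does not report the loss it minimizes. As an illustration only, here is a standard-library sketch for a single feature that adds an intercept and computes the cross-entropy loss. The names LogisticModel, predict_probability, log_loss and train_logistic are hypothetical, and the default learning rate and iteration count are arbitrary choices.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for a one-feature model: p = sigmoid(w * x + b).
struct LogisticModel {
    double w = 0.0;
    double b = 0.0;
};

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

double predict_probability(const LogisticModel& m, double x) {
    return sigmoid(m.w * x + m.b);
}

// Average cross-entropy loss over the data set (labels must be 0 or 1).
double log_loss(const LogisticModel& m, const std::vector<double>& x,
                const std::vector<int>& y) {
    const double eps = 1e-12; // avoids log(0)
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double p = predict_probability(m, x[i]);
        total -= (y[i] == 1) ? std::log(p + eps) : std::log(1.0 - p + eps);
    }
    return x.empty() ? 0.0 : total / static_cast<double>(x.size());
}

// Batch gradient descent on the cross-entropy loss.
LogisticModel train_logistic(const std::vector<double>& x, const std::vector<int>& y,
                             double learning_rate = 0.1, int num_iterations = 1000) {
    LogisticModel m;
    if (x.empty() || x.size() != y.size()) return m;
    const double n = static_cast<double>(x.size());
    for (int it = 0; it < num_iterations; ++it) {
        double grad_w = 0.0, grad_b = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double error = predict_probability(m, x[i]) - y[i];
            grad_w += error * x[i];
            grad_b += error;
        }
        m.w -= learning_rate * grad_w / n;
        m.b -= learning_rate * grad_b / n;
    }
    return m;
}
```

After training, predict_probability(m, x) > 0.5 can be read as a prediction of class 1. Comparing log_loss before and after training is a quick check that gradient descent is actually reducing the loss.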

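3. K-Nearest Neighbors (KNN)

K-nearest neighbors is a simple algorithm for classification and regression. It predicts the class or value of a new data point from the K training points closest to it.

The algorithm finds the K training points nearest to the new point. For classification it takes a majority vote over the neighbors' labels, and for regression it averages their values.

Sample code: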

#include <iostream>
#include <vector>
#include <cmath>
#include <algorithm>
#include <unordered_map>

using namespace std;

// Classify X_test by majority vote among its k nearest training points.
int knn(const vector<vector<double>>& X_train, const vector<int>& y_train,
        const vector<double>& X_test, int k = 3) {
    vector<pair<double, int>> distances; // (Euclidean distance, label)
    for (size_t i = 0; i < X_train.size(); ++i) {
        double distance = 0.0;
        for (size_t j = 0; j < X_train[i].size(); ++j) {
            distance += pow(X_train[i][j] - X_test[j], 2);
        }
        distance = sqrt(distance);
        distances.push_back({distance, y_train[i]});
    }
    sort(distances.begin(), distances.end());

    // Never look at more neighbors than there are training points.
    k = min(k, static_cast<int>(distances.size()));

    // Count the labels of the k nearest neighbors.
    unordered_map<int, int> label_count;
    for (int i = 0; i < k; ++i) {
        ++label_count[distances[i].second];
    }

    // Pick the most frequent label (-1 if there is no training data).
    int most_common_label = -1;
    int max_count = 0;
    for (const auto& entry : label_count) {
        if (entry.second > max_count) {
            max_count = entry.second;
            most_common_label = entry.first;
        }
    }
    return most_common_label;
}

int main() {
    // Sample data
    vector<vector<double>> X_train = {{1, 2}, {2, 3}, {3, 4}, {4, 5}};
    vector<int> y_train = {0, 0, 1, 1};
    vector<double> X_test = {2.5, 3.5};
    int prediction = knn(X_train, y_train, X_test);
    cout << "Predicted class: " << prediction << endl;
    return 0;
}
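The listing above covers classification only. Since KNN can also be used for regression, the following sketch shows one common variant that predicts the mean target value of the k nearest neighbors. The function name knn_regress is hypothetical. The sketch assumes that X_train and y_train have the same length and that every point has the same number of features as the query.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Predict a continuous value as the mean target of the k nearest training points.
// Assumes X_train.size() == y_train.size() and matching feature counts.
double knn_regress(const std::vector<std::vector<double>>& X_train,
                   const std::vector<double>& y_train,
                   const std::vector<double>& query, std::size_t k = 3) {
    std::vector<std::pair<double, double>> neighbors; // (squared distance, target)
    for (std::size_t i = 0; i < X_train.size(); ++i) {
        double d2 = 0.0;
        for (std::size_t j = 0; j < query.size(); ++j) {
            const double diff = X_train[i][j] - query[j];
            d2 += diff * diff;
        }
        neighbors.emplace_back(d2, y_train[i]);
    }

    k = std::min(k, neighbors.size());
    if (k == 0) return 0.0; // no training data: nothing to average

    // Only the k smallest distances need to be ordered. Squared distances
    // give the same ordering as Euclidean distances, so sqrt is skipped.
    std::partial_sort(neighbors.begin(),
                      neighbors.begin() + static_cast<std::ptrdiff_t>(k),
                      neighbors.end());

    double sum = 0.0;
    for (std::size_t i = 0; i < k; ++i) sum += neighbors[i].second;
    return sum / static_cast<double>(k);
}
```

A plain average treats all k neighbors equally. Weighting each neighbor by the inverse of its distance is a common refinement, but it is not shown here.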

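4. Decision Trees

A decision tree is a tree-structured algorithm for classification and regression. It predicts the class or value of a new data point by applying a sequence of decision rules.

The tree is built by repeatedly splitting the data set until each leaf node represents a class or a value. Common splitting criteria are information gain and Gini impurity.

Sample code: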

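// Decision trees in C++ usually rely on a library such as mlpack; this is only a simple outline.
// It assumes the mlpack 3.x headers and namespaces. mlpack 4 removed the
// sub-namespaces, so tree::DecisionTree becomes mlpack::DecisionTree there.
#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/decision_tree/decision_tree.hpp>

using namespace std;
using namespace mlpack;

int main() {
    // Sample data. mlpack stores one data point per column, so the four
    // 2-D points (1,2), (2,3), (3,4), (4,5) form a 2 x 4 matrix.
    arma::mat X = {{1, 2, 3, 4},
                   {2, 3, 4, 5}};
    arma::Row<size_t> y = {0, 0, 1, 1};

    // Build the decision tree for 2 classes. A minimum leaf size of 1 is
    // passed because the library default is larger than this tiny data set.
    tree::DecisionTree<> clf(X, y, 2, 1);

    // Predict a new data point
    arma::vec x_test = {2.5, 3.5};
    size_t prediction = clf.Classify(x_test);
    cout << "Predicted class: " << prediction << endl;
    return 0;
}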
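The two splitting criteria mentioned above can be computed directly from class labels. The following standard-library sketch is only an illustration of the formulas. The function names are hypothetical, and it is not how mlpack implements them.

```cpp
#include <cmath>
#include <map>
#include <vector>

// Fraction of samples belonging to each class label.
std::map<int, double> class_proportions(const std::vector<int>& labels) {
    std::map<int, double> proportions;
    if (labels.empty()) return proportions;
    for (int label : labels) proportions[label] += 1.0;
    for (auto& entry : proportions) entry.second /= static_cast<double>(labels.size());
    return proportions;
}

// Gini impurity: 1 - sum(p_c^2). It is 0 for a pure node.
double gini_impurity(const std::vector<int>& labels) {
    double sum_squares = 0.0;
    for (const auto& entry : class_proportions(labels)) {
        sum_squares += entry.second * entry.second;
    }
    return labels.empty() ? 0.0 : 1.0 - sum_squares;
}

// Shannon entropy in bits: -sum(p_c * log2(p_c)).
double entropy(const std::vector<int>& labels) {
    double h = 0.0;
    for (const auto& entry : class_proportions(labels)) {
        h -= entry.second * std::log2(entry.second);
    }
    return h;
}

// Information gain of splitting `parent` into `left` and `right`:
// parent entropy minus the size-weighted entropy of the two children.
double information_gain(const std::vector<int>& parent,
                        const std::vector<int>& left,
                        const std::vector<int>& right) {
    if (parent.empty()) return 0.0;
    const double n = static_cast<double>(parent.size());
    return entropy(parent)
         - (static_cast<double>(left.size()) / n) * entropy(left)
         - (static_cast<double>(right.size()) / n) * entropy(right);
}
```

For the labels {0, 0, 1, 1}, the Gini impurity is 0.5 and the entropy is 1 bit. Splitting them into {0, 0} and {1, 1} yields two pure children, so the information gain is the full 1 bit.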

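5. Support Vector Machines (SVM)

A support vector machine is a powerful classification algorithm that separates data points of different classes with an optimal hyperplane.

The goal is to find the hyperplane that maximizes the margin between the classes. Kernel functions allow an SVM to learn non-linear boundaries. Common choices are the linear kernel, the polynomial kernel, and the radial basis function (RBF) kernel.

Sample code: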

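// Outline only. It assumes the mlpack 3.x headers and namespaces. mlpack
// provides a linear SVM (LinearSVM) rather than a kernel SVM class, so this
// outline uses LinearSVM.
#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/linear_svm/linear_svm.hpp>

using namespace std;
using namespace mlpack;

int main() {
    // Sample data: one data point per column (2 features x 4 points)
    arma::mat X = {{1, 2, 3, 4},
                   {2, 3, 4, 5}};
    arma::Row<size_t> y = {0, 0, 1, 1};

    // Train a linear SVM for 2 classes
    svm::LinearSVM<> clf(X, y, 2);

    // Predict a new data point (a 2 x 1 matrix)
    arma::mat X_test = {{2.5}, {3.5}};
    arma::Row<size_t> predictions;
    clf.Classify(X_test, predictions);
    cout << "Predicted class: " << predictions(0) << endl;
    return 0;
}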
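To make the three kernels named above concrete, here is a standard-library sketch of their usual definitions. The function names and the default values for degree, coef0 and gamma are illustrative assumptions, and both input vectors are assumed to have the same length.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Dot product of two equally sized vectors.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Linear kernel: K(a, b) = a . b
double linear_kernel(const std::vector<double>& a, const std::vector<double>& b) {
    return dot(a, b);
}

// Polynomial kernel: K(a, b) = (a . b + coef0)^degree
double polynomial_kernel(const std::vector<double>& a, const std::vector<double>& b,
                         int degree = 2, double coef0 = 1.0) {
    return std::pow(dot(a, b) + coef0, degree);
}

// RBF (Gaussian) kernel: K(a, b) = exp(-gamma * ||a - b||^2)
double rbf_kernel(const std::vector<double>& a, const std::vector<double>& b,
                  double gamma = 0.5) {
    double squared_distance = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        squared_distance += diff * diff;
    }
    return std::exp(-gamma * squared_distance);
}
```

For a = (1, 2) and b = (3, 4), the linear kernel gives 11 and the polynomial kernel with the defaults above gives (11 + 1)^2 = 144. The RBF kernel of any point with itself is 1 and decays toward 0 as the points move apart.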

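6. Neural Networks

A neural network is a computational model loosely inspired by the neurons of the human brain. It learns complex patterns in data through multiple layers of neurons.

Training adjusts the weights between neurons so that the error between the network output and the true values is minimized. It typically uses backpropagation to compute the gradients and gradient descent to update the weights.

Sample code: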

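// Outline only. It assumes the mlpack 3.x ANN API. The headers, the ann
// namespace, and the layer names differ in mlpack 4.
#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/loss_functions/mean_squared_error.hpp>
#include <mlpack/methods/ann/init_rules/random_init.hpp>

using namespace std;
using namespace mlpack;

int main() {
    // Sample data: one data point per column (2 features x 4 points)
    arma::mat X = {{1, 2, 3, 4},
                   {2, 3, 4, 5}};
    // With a mean squared error loss, the targets are a 1 x 4 matrix.
    arma::mat y = {{0, 0, 1, 1}};

    // Build a 2-5-1 feed-forward network with sigmoid activations
    ann::FFN<ann::MeanSquaredError<>, ann::RandomInitialization> clf;
    clf.Add<ann::Linear<>>(2, 5);
    clf.Add<ann::SigmoidLayer<>>();
    clf.Add<ann::Linear<>>(5, 1);
    clf.Add<ann::SigmoidLayer<>>();
    clf.Train(X, y);

    // Predict a new data point (a 2 x 1 matrix)
    arma::mat X_test = {{2.5}, {3.5}};
    arma::mat prediction;
    clf.Predict(X_test, prediction);
    // The output lies between 0 and 1; threshold it at 0.5 to get a class.
    cout << "Predicted class: " << (prediction(0) > 0.5 ? 1 : 0) << endl;
    return 0;
}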
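The library call hides what backpropagation and gradient descent actually do. As a rough illustration, the sketch below trains a single sigmoid neuron with a squared-error loss, which is the smallest case of the chain rule used in backpropagation. The names Neuron, forward, squared_error and gradient_step are hypothetical, and the input is assumed to have as many entries as the neuron has weights.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single neuron: output = sigmoid(weights . x + bias).
struct Neuron {
    std::vector<double> weights;
    double bias = 0.0;
};

double sigmoid_activation(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Forward pass: weighted sum followed by the sigmoid activation.
double forward(const Neuron& n, const std::vector<double>& x) {
    double z = n.bias;
    for (std::size_t j = 0; j < n.weights.size(); ++j) z += n.weights[j] * x[j];
    return sigmoid_activation(z);
}

// Loss for one sample: 0.5 * (output - target)^2
double squared_error(const Neuron& n, const std::vector<double>& x, double target) {
    const double diff = forward(n, x) - target;
    return 0.5 * diff * diff;
}

// One gradient descent step. By the chain rule,
//   dLoss/dz = (a - target) * a * (1 - a),  where a = sigmoid(z),
//   dLoss/dw_j = dLoss/dz * x_j,  and  dLoss/dbias = dLoss/dz.
void gradient_step(Neuron& n, const std::vector<double>& x, double target,
                   double learning_rate) {
    const double a = forward(n, x);
    const double delta = (a - target) * a * (1.0 - a);
    for (std::size_t j = 0; j < n.weights.size(); ++j) {
        n.weights[j] -= learning_rate * delta * x[j];
    }
    n.bias -= learning_rate * delta;
}
```

In a multi-layer network the same delta is propagated backward from layer to layer, which is where the name backpropagation comes from. Repeating gradient_step over many samples is the gradient descent part.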

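7. Clustering

Clustering is a form of unsupervised learning. It groups data points into clusters so that points in the same cluster are highly similar and points in different clusters are not.

K-means clustering partitions the data into K clusters so that the sum of squared distances from each point to its cluster center is minimized.

Sample code: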

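// Outline only. It assumes the mlpack 3.x headers and namespaces. In
// mlpack 4 the class is mlpack::KMeans.
#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/kmeans/kmeans.hpp>

using namespace std;
using namespace mlpack;

int main() {
    // Sample data: one data point per column (2 features x 5 points)
    arma::mat X = {{1, 2, 3, 4, 5},
                   {2, 3, 4, 5, 6}};

    // Run K-means with k clusters. The data is passed to Cluster(),
    // not to the constructor.
    size_t k = 2;
    arma::Row<size_t> assignments;
    kmeans::KMeans<> kmeans;
    kmeans.Cluster(X, k, assignments);

    cout << "Cluster labels: " << assignments << endl;
    return 0;
}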
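To show what the library does internally, here is a compact sketch of Lloyd's algorithm, the usual iteration behind K-means, using only the standard library. The function name kmeans_assign is hypothetical. Using the first k points as initial centroids is a simplifying assumption made for determinism, and real implementations typically use random or k-means++ initialization.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

double squared_distance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}

// Lloyd's algorithm. Returns the cluster index of every point.
// Assumption: the first k points serve as the initial centroids.
std::vector<std::size_t> kmeans_assign(const std::vector<Point>& points, std::size_t k,
                                       int max_iterations = 100) {
    std::vector<std::size_t> assignment(points.size(), 0);
    if (points.empty() || k == 0 || k > points.size()) return assignment;

    const std::size_t dim = points[0].size();
    std::vector<Point> centroids(points.begin(),
                                 points.begin() + static_cast<std::ptrdiff_t>(k));

    for (int iter = 0; iter < max_iterations; ++iter) {
        // Assignment step: attach each point to its nearest centroid.
        bool changed = false;
        for (std::size_t i = 0; i < points.size(); ++i) {
            std::size_t best = 0;
            double best_distance = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                const double d = squared_distance(points[i], centroids[c]);
                if (d < best_distance) { best_distance = d; best = c; }
            }
            if (assignment[i] != best) { assignment[i] = best; changed = true; }
        }
        if (!changed && iter > 0) break; // converged

        // Update step: move each centroid to the mean of its points.
        std::vector<Point> sums(k, Point(dim, 0.0));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < points.size(); ++i) {
            for (std::size_t d = 0; d < dim; ++d) sums[assignment[i]][d] += points[i][d];
            ++counts[assignment[i]];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue; // empty cluster keeps its old centroid
            for (std::size_t d = 0; d < dim; ++d) {
                centroids[c][d] = sums[c][d] / static_cast<double>(counts[c]);
            }
        }
    }
    return assignment;
}
```

With the points (1,2), (2,3), (10,11), (11,12) and k = 2, the first two points end up in one cluster and the last two in the other. Each iteration can only lower the sum of squared distances, but the result may be a local optimum that depends on the initial centroids.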

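8. Dimensionality Reduction

Dimensionality reduction algorithms reduce the number of features in a data set. They keep the most informative features and so lower the cost of later computation.

Principal Component Analysis (PCA)

Principal component analysis is a widely used dimensionality reduction algorithm. It applies a linear transformation that projects the data onto a new coordinate system whose first axes capture as much of the variance in the data as possible.

Sample code: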

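// Outline only. It assumes the mlpack 3.x headers and namespaces.
#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/pca/pca.hpp>

using namespace std;
using namespace mlpack;

int main() {
    // Sample data: one data point per column (3 features x 4 points)
    arma::mat X = {{1, 2, 3, 4},
                   {2, 3, 4, 5},
                   {3, 4, 5, 6}};

    // Apply PCA in place on a copy and keep only the first component.
    pca::PCA<> pca;
    arma::mat X_pca = X;
    const size_t new_dimension = 1;
    pca.Apply(X_pca, new_dimension);

    cout << "Data after dimensionality reduction: " << endl << X_pca << endl;
    return 0;
}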
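To make the idea of a maximum-variance direction concrete, the sketch below estimates the first principal component with power iteration on the sample covariance matrix, using only the standard library. This is one simple technique chosen for illustration and is not how mlpack computes PCA. The function names are hypothetical. Unlike mlpack, this sketch stores one observation per row.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Sample covariance matrix. Rows are observations, columns are features.
// Assumes at least two rows, all with the same number of columns.
Matrix covariance(const Matrix& data) {
    const std::size_t n = data.size();
    const std::size_t d = data[0].size();
    std::vector<double> mean(d, 0.0);
    for (const auto& row : data)
        for (std::size_t j = 0; j < d; ++j) mean[j] += row[j] / static_cast<double>(n);

    Matrix cov(d, std::vector<double>(d, 0.0));
    for (const auto& row : data)
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b)
                cov[a][b] += (row[a] - mean[a]) * (row[b] - mean[b])
                             / static_cast<double>(n - 1);
    return cov;
}

// Power iteration: repeatedly multiply a vector by the covariance matrix and
// normalize it. The vector converges to the direction of largest variance,
// provided the start vector is not orthogonal to that direction.
std::vector<double> first_principal_component(const Matrix& data, int iterations = 100) {
    if (data.size() < 2 || data[0].empty()) return {};
    const Matrix cov = covariance(data);
    const std::size_t d = cov.size();

    std::vector<double> v(d, 0.0);
    v[0] = 1.0; // arbitrary start vector (an assumption of this sketch)

    for (int it = 0; it < iterations; ++it) {
        std::vector<double> w(d, 0.0);
        for (std::size_t a = 0; a < d; ++a)
            for (std::size_t b = 0; b < d; ++b) w[a] += cov[a][b] * v[b];

        double norm = 0.0;
        for (double value : w) norm += value * value;
        norm = std::sqrt(norm);
        if (norm < 1e-12) break; // no variance along the current direction
        for (std::size_t a = 0; a < d; ++a) v[a] = w[a] / norm;
    }
    return v;
}
```

For points that lie on the line y = x, such as (1,1), (2,2), (3,3), the returned unit vector points along that line, with both entries close to 0.7071. Projecting each mean-centered point onto this vector gives its one-dimensional PCA coordinate.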