羅劍鋒的C++實戰筆記學習（二）：容器、算法庫、多線程

4、容器

1）、容器的通用特性

所有容器都具有的一個基本特性：它保存元素采用的是值（value）語義，也就是說，容器里存儲的是元素的拷貝、副本，而不是引用

容器操作元素的很大一塊成本就是值的拷貝。所以，如果元素比較大，或者非常多，那么操作時的拷貝開銷就會很高，性能也就不會太好

一個解決辦法是，盡量為元素實現轉移構造和轉移賦值函數，在加入容器的時候使用std::move()來轉移，減少元素復制的成本：

#include <iostream>
#include <vector>class Point {
public:Point(int x, int y) : x_(x), y_(y), data("Expensive resource") {std::cout << "Point constructor called." << std::endl;}Point(const Point &other) : x_(other.x_), y_(other.y_), data(other.data) {std::cout << "Copy constructor called." << std::endl;}Point(Point &&other) noexcept: x_(other.x_), y_(other.y_), data(std::move(other.data)) {std::cout << "Move constructor called." << std::endl;}private:int x_, y_;std::string data; // 假設這是一個昂貴的資源
};int main() {// 不使用移動語義,直接拷貝{Point p(1, 2);std::vector<Point> v;v.push_back(p); // 這里會調用拷貝構造函數std::cout << "---- After normal push_back ----" << std::endl;}// 使用移動語義{Point p(3, 4);std::vector<Point> v;v.push_back(std::move(p)); // 這里會調用移動構造函數std::cout << "---- After move semantics push_back ----" << std::endl;}return 0;
}

輸出：

Point constructor called.
Copy constructor called.
---- After normal push_back ----
Point constructor called.
Move constructor called.
---- After move semantics push_back ----

也可以使用C++11為容器新增加的emplace操作函數，它可以就地構造元素，免去了構造后再拷貝、轉移的成本，不但高效，而且用起來也很方便：

#include <iostream>
#include <vector>
#include <string>class Point {
public:Point(int x, int y) : x_(x), y_(y), data("Expensive resource") {std::cout << "Point constructor called." << std::endl;}Point(const Point &other) : x_(other.x_), y_(other.y_), data(other.data) {std::cout << "Copy constructor called." << std::endl;}Point(Point &&other) noexcept: x_(other.x_), y_(other.y_), data(std::move(other.data)) {std::cout << "Move constructor called." << std::endl;}private:int x_, y_;std::string data; // 假設這是一個昂貴的資源
};int main() {std::vector<Point> v;v.reserve(2);  // 預分配足夠的空間,避免內部擴展// 使用emplace_back直接在容器內構造對象v.emplace_back(1, 2); // 這里不會調用拷貝或移動構造函數v.emplace_back(3, 4);// 當std::vector需要擴容時,會觸發之前對象的移動構造函數v.emplace_back(5, 6);return 0;
}

輸出：

Point constructor called.
Point constructor called.
Point constructor called.
Move constructor called.
Move constructor called.

還可以在容器里存放元素的指針，來間接保存元素。這里建議使用智能指針unique_ptr/shared_ptr，讓它們幫你自動管理元素，一般情況下，shared_ptr是一個更好的選擇，它的共享語義與容器的值語義基本一致

2）、順序容器

順序容器就是數據結構里的線性表，一共有5種：array、vector、deque、list、forward_list

按照存儲結構，這5種容器又可以再細分成兩組

連續存儲的數組：array、vector和deque
指針結構的鏈表：list和forward_list

數組：

array和vector直接對應C的內置數組，內存布局與C完全兼容，所以是開銷最低、速度最快的容器。它們兩個的區別在于容量能否動態增長。array是靜態數組，大小在初始化的時候就固定了，不能再容納更多的元素。而vector是動態數組，雖然初始化的時候設定了大小，但可以在后面隨需增長，容納任意數量的元素

#include <array>
#include <vector>int main() {std::array<int, 2> arr;assert(arr.size() == 2);std::vector<int> v(2);for (int i = 0; i < 10; i++) {v.emplace_back(i);}assert(v.size() == 12);return 0;
}

deque也是一種可以動態增長的數組，它和vector的區別是，它可以在兩端高效地插入刪除元素，而vector則只能用push_back在末端追加元素

#include <deque>int main() {std::deque<int> d;d.emplace_back(9); // 末端添加一個元素d.emplace_front(1); // 前端添加一個元素assert(d.size() == 2);return 0;
}

鏈表：

vector和deque里的元素因為是連續存儲的，所以在中間的插入刪除效率就很低，而list和forward_list是鏈表結構，插入刪除操作只需要調整指針，所以在任意位置的操作都很高效

鏈表的缺點是查找效率低，只能沿著指針順序訪問，這方面不如vector隨機訪問的效率高。list是雙向鏈表，可以向前或者向后遍歷，而forward_list是單向鏈表，只能向前遍歷，查找效率就更低了

鏈表結構比起數組結構還有一個缺點，就是存儲成本略高，因為必須要為每個元素附加一個或者兩個的指針，指向鏈表的前后節點

擴容機制：

vector/deque和list/forward_list都可以動態增長來容納更多的元素，但它們的內部擴容機制卻是不一樣的

當vector的容量到達上限的時候（capacity），它會再分配一塊兩倍大小的新內存，然后把舊元素拷貝或者移動過去。這個操作的成本是非常大的，所以，在使用vector的時候最好能夠預估容量，使用reserve提前分配足夠的空間，減少動態擴容的拷貝代價

deque、list會按照固定的步長（例如N個字節、一個節點）去增加容量。但在短時間內插入大量數據的時候就會頻繁分配內存，效果反而不如vector一次分配來得好

如何選擇：

如果沒有什么特殊需求，首選的容器就是array和vector，它們的速度最快、開銷最低，數組的形式也令它們最容易使用，搭配算法也可以實現快速的排序和查找

剩下的deque、list和forward_list則適合對插入刪除性能比較敏感的場合，如果還很在意空間開銷，那就只能選擇非鏈表的deque了

3）、有序容器

順序容器的特點是，元素的次序是由它插入的次序而決定的，訪問元素也就按照最初插入的順序。而有序容器則不同，它的元素在插入容器后就被按照某種規則自動排序，所以是有序的

標準庫里一共有四種有序容器：set/multiset和map/multimap（底層是通過紅黑樹實現）。有multi前綴的容器表示可以容納重復的key

在定義有序容器的時候必須要指定key的比較函數。只不過這個函數通常是默認的less，表示小于關系，不用特意寫出來

C++里的int、string等基本類型都支持比較排序，但很多自定義類型沒有默認的比較函數，需要重載<或者自定義模板參數

比如說有一個Point類，它是沒有大小概念的，但只要給它重載<操作符，就可以放進有序容器里了：

#include <iostream>
#include <set>class Point {
public:Point(int x, int y) : x_(x), y_(y) {}bool operator<(const Point &other) const {if (x_ != other.x_) {return x_ < other.x_;} else {return y_ < other.y_;}}friend std::ostream &operator<<(std::ostream &os, const Point &p) {os << "(" << p.x_ << ", " << p.y_ << ")";return os;}private:int x_;int y_;
};int main() {std::set<Point> points;points.emplace(7, 2);points.emplace(3, 5);for (const auto &point: points) {std::cout << point << std::endl;}return 0;
}

另一種方式是編寫專門的函數對象或者lambda表達式，然后在容器的模板參數里指定。這種方式更靈活，而且可以實現任意的排序準則：

#include <iostream>
#include <set>template<typename Iter>
void printRangeWithCommas(Iter begin, Iter end) {if (begin == end) return;for (Iter it = begin; it != end; ++it) {std::cout << *it;if (std::next(it) != end) {std::cout << ",";}}std::cout << "\n";
}int main() {std::set<int> s = {7, 3, 9};printRangeWithCommas(s.begin(), s.end()); // 調用函數打印集合,輸出: 3,7,9auto comp = [](auto a, auto b) {return a > b;};std::set<int, decltype(comp)> gs(comp);std::copy(s.begin(), s.end(), std::inserter(gs, gs.end()));printRangeWithCommas(gs.begin(), gs.end()); // 再次調用函數打印另一個集合,輸出: 9,7,3return 0;
}

4）、無序容器

無序容器也有四種，名字里也有set和map，只是加上了unordered（無序）前綴，分別是unordered_set/unordered_multiset、unordered_map/unordered_multimap

無序容器用法上與有序容器幾乎是一樣的，區別在于內部數據結構：它不是紅黑樹，而是散列表（也叫哈希表，hash table）

因為它采用散列表存儲數據，元素的位置取決于計算的散列值，沒有規律可言，所以就是無序的

#include <iostream>
#include <unordered_map>int main() {using map_type =std::unordered_map<int, std::string>;map_type dict;dict[1] = "one";dict.emplace(2, "two");dict[10] = "ten";for (auto &x: dict) { // 遍歷順序不確定,既不是插入順序，也不是大小序std::cout << x.first << "=>"<< x.second << ",";}return 0;
}

無序容器要求key具備兩個條件，一是可以計算hash值，二是能夠執行相等比較操作。第一個是因為散列表的要求，只有計算hash值才能放入散列表，第二個則是因為hash值可能會沖突，所以當hash值相同時，就要比較真正的key值

#include <iostream>
#include <unordered_set>class Point {
public:Point(int x, int y) : x_(x), y_(y) {}int getX() const { return x_; }int getY() const { return y_; }// 重載operator==用于比較bool operator==(const Point &other) const {return x_ == other.x_ && y_ == other.y_;}// 為了使用Point作為std::unordered_set或std::unordered_map的鍵,需要定義哈希函數struct Hash {std::size_t operator()(const Point &p) const noexcept {std::size_t h1 = std::hash<int>{}(p.getX());std::size_t h2 = std::hash<int>{}(p.getY());return h1 ^ (h2 << 1);}};// 重載operator<<以便可以直接輸出Point對象friend std::ostream &operator<<(std::ostream &os, const Point &p) {return os << "(" << p.x_ << ", " << p.y_ << ")";}private:int x_, y_;
};int main() {std::unordered_set<Point, Point::Hash> pointSet;pointSet.insert(Point(1, 2));pointSet.insert(Point(3, 4));// 嘗試插入重復的點,不會成功pointSet.insert(Point(1, 2));for (const auto &point: pointSet) {std::cout << point << std::endl;}return 0;
}

5）、小結

標準容器可以分為三大類，即順序容器、有序容器和無序容器
所有容器中最優先選擇的應該是array和vector，它們的速度最快，開銷最低
list是鏈表結構，插入刪除的效率高，但查找效率低
有序容器是紅黑樹結構，對key自動排序，查找效率高，但有插入成本
無序容器是散列表結構，由hash值計算存儲位置，查找和插入的成本都很低
有序容器和無序容器都屬于關聯容器，元素有key的概念，操作元素實際上是在操作key，所以要定義對key的比較函數或者散列函數

5、算法庫

1）、迭代器

迭代器常用的函數如下：

begin()、end()：得到表示兩個端點的迭代器
distance()：計算兩個迭代器之間的距離
advance()：前進或者后退 N 步
next()、prev()：計算迭代器前后的某個位置

#include <array>int main() {std::array<int, 5> arr = {0, 1, 2, 3, 4}; // array靜態數組容器auto b = begin(arr); // 全局函數獲取迭代器,首端auto e = end(arr); // 全局函數獲取迭代器,末端assert(std::distance(b, e) == 5); // 迭代器的距離auto p = std::next(b); // 獲取下一個位置assert(std::distance(b, p) == 1); // 迭代器的距離assert(std::distance(p, b) == -1); // 反向計算迭代器的距離std::advance(p, 2); // 迭代器前進兩個位置,指向元素3assert(*p == 3);assert(p == std::prev(e, 2)); // 是末端迭代器的前兩個位置return 0;
}

2）、for_each

#include <iostream>
#include <vector>int main() {std::vector<int> v = {3, 5, 1, 7, 10};for (const auto &x: v) { // range for循環std::cout << x << ",";}std::cout << "\n";auto print = [](const auto &x) // 定義一個lambda表達式{std::cout << x << ",";};for_each(cbegin(v), cend(v), print); // for_each算法std::cout << "\n";for_each( // for_each算法,內部定義lambda表達式cbegin(v), cend(v), // 獲取常量迭代器[](const auto &x) // 匿名lambda表達式{std::cout << x << ",";});std::cout << "\n";return 0;
}

3）、排序算法

sort()是經典的快排算法，示例如下：

#include <iostream>
#include <vector>int main() {std::vector<int> v = {3, 5, 1, 7, 10};auto print = [](const auto &x) // lambda表達式輸出元素{std::cout << x << ",";};std::sort(begin(v), end(v)); // 快速排序for_each(cbegin(v), cend(v), print);return 0;
}

一些常見問題對應的算法：

要求排序后仍然保持元素的相對順序，應該用stable_sort，它是穩定的
選出前幾名（TopN），應該用partial_sort
選出前幾名，但不要求再排出名次（BestN），應該用nth_element
中位數（Median）、百分位數（Percentile），還是用nth_element
按照某種規則把元素劃分成兩組，用partition
第一名和最后一名，用minmax_element

#include <iostream>
#include <vector>template<typename Iter>
void printRangeWithCommas(const std::string &prefix, Iter begin, Iter end) {if (begin == end) return;std::cout << prefix;for (auto it = begin; it != end; ++it) {std::cout << (it == begin ? "" : ",") << *it;}std::cout << '\n';
}int main() {std::vector<int> v = {3, 5, 1, 7, 10};// top3std::partial_sort(begin(v), next(begin(v), 3), end(v)); // 取前3名printRangeWithCommas("top3: ", v.begin(), next(begin(v), 3));// best3std::nth_element(begin(v), next(begin(v), 3), end(v)); // 最好的3個printRangeWithCommas("best3: ", v.begin(), next(begin(v), 3));// medianauto mid_iter = // 中位數的位置next(begin(v), v.size() / 2);std::nth_element(begin(v), mid_iter, end(v)); // 排序得到中位數std::cout << "median: " << *mid_iter << std::endl;// partitionauto pos = std::partition( // 找出所有大于9的數begin(v), end(v), [](const auto &x) // 定義一個lambda表達式{return x > 9;});printRangeWithCommas("values > 9: ", v.begin(), pos);// min/maxauto [minIt, maxIt] = std::minmax_element( // 找出第一名和倒數第一cbegin(v), cend(v));std::cout << "min value: " << *minIt << ", max value: " << *maxIt << std::endl;return 0;
}

在使用這些排序算法時，對迭代器要求比較高，通常都是隨機訪問迭代器（minmax_element除外），所以最好在順序容器array/vector上調用

4）、查找算法

binary_search：在已經排好序的區間里執行二分查找，但它只返回一個bool值，告知元素是否存在

#include <vector>int main() {std::vector<int> v = {3, 5, 1, 7, 10, 99, 42};std::sort(begin(v), end(v)); // 快速排序// 二分查找,只能確定元素在不在bool found = binary_search(cbegin(v), cend(v), 7);assert(found);return 0;
}

想要在已序容器上執行二分查找，要用算法lower_bound，它返回第一個大于或等于值的位置

#include <vector>int main() {std::vector<int> v = {3, 5, 1, 7, 10, 99, 42};std::sort(begin(v), end(v));// 找到第一個>=7的位置auto pos = std::lower_bound(cbegin(v), cend(v), 7);bool found = (pos != cend(v)) && (*pos == 7); // 可能找不到,所以必須要判斷assert(found); // 7在容器里// 找到第一個>=9的位置pos = std::lower_bound(cbegin(v), cend(v), 9);found = (pos != cend(v)) && (*pos == 9); // 可能找不到,所以必須要判斷assert(!found); // 9不在容器里return 0;
}

lower_bound的返回值是一個迭代器，需要做判斷才能知道是否真的找到了。判斷的條件有兩個，一個是迭代器是否有效，另一個是迭代器的值是不是要找的值

upper_bound算法：返回第一個大于值的元素（也是在已序容器上執行二分查找）

對于有序容器set/map，就不需要調用這三個算法了，它們有等價的成員函數find/lower_bound/upper_bound，效果是一樣的

#include <iostream>
#include <set>int main() {std::multiset<int> s = {3, 5, 1, 7, 7, 7, 10, 99, 42};auto pos = s.find(7); // 二分查找,返回迭代器assert(pos != s.end());auto lower_pos = s.lower_bound(7); // 獲取區間的左端點auto upper_pos = s.upper_bound(7); // 獲取區間的右端點auto print = [](const auto &x) {std::cout << x << ",";};// 輸出7,7,7std::for_each(lower_pos, upper_pos, print);return 0;
}

標準庫里還有一些查找算法可以用于未排序的容器，這些算法以find和search命名，其中用于查找區間的find_first_of/find_end

#include <vector>
#include <array>int main() {std::vector<int> v = {1, 9, 11, 3, 5, 7};// 查找算法,找到第一個出現的位置auto pos = std::find(begin(v), end(v), 3);assert(pos != end(v));// 查找算法,用lambda判斷條件pos = std::find_if(begin(v), end(v),[](auto x) {return x % 2 == 0;});assert(pos == end(v));std::array<int, 2> arr = {3, 5};// 查找一個子區間pos = std::find_first_of(begin(v), end(v),begin(arr), end(arr));assert(pos != end(v));return 0;
}

5）、小結

算法是專門操作容器的函數，是一種智能for循環，它的最佳搭檔是lambda表達式
算法通過迭代器來間接操作容器，使用兩個端點指定操作范圍，迭代器決定了算法的能力
for_each算法是for的替代品，以函數式編程替代了面向過程編程
有多種排序算法，最基本的是sort，但應該根據實際情況選擇其他更合適的算法，避免浪費
在已序容器上可以執行二分查找，應該使用的算法是lower_bound
list/set/map提供了等價的排序、查找函數，更適應自己的數據結構
find/search是通用的查找算法，效率不高，但不必排序也能使用

6、多線程

1）、僅調用一次

要先聲明一個once_flag類型的變量，最好是靜態、全局的（線程可見），作為初始化的標志。然后調用專門的call_once()函數，以函數式編程的方式，傳遞這個標志和初始化函數。這樣C++就會保證，即使多個線程重入call_once()，也只能有一個線程會成功運行初始化

#include <iostream>
#include <thread>int main() {static std::once_flag flag; // 全局的初始化標志auto f = []() {std::call_once(flag, // 僅一次調用,注意要傳flag[]() { // 匿名lambda,初始化函數,只會執行一次std::cout << "only once" << std::endl;});};// 使用vector管理線程,確保所有線程執行完畢后再退出mainstd::vector<std::thread> threads;for (int i = 0; i < 2; ++i) {threads.emplace_back(f);}// 等待所有線程完成for (std::thread &t: threads) {t.join();}return 0;
}

2）、線程局部存儲

有thread_local標記的變量在每個線程里都會有一個獨立的副本，是線程獨占的

#include <iostream>
#include <thread>int main() {thread_local int n = 0; // 線程局部存儲變量auto f = [&](int x) {n += x; // 使用線程局部變量,互不影響std::cout << n << std::endl;};// 使用vector管理線程,確保所有線程執行完畢后再退出mainstd::vector<std::thread> threads;threads.emplace_back(f, 10);threads.emplace_back(f, 20);// 等待所有線程完成for (std::thread &t: threads) {t.join();}return 0;
}

在程序執行后，可以看到兩個線程分別輸出了10和20，互不干擾

3）、原子變量

目前，C++只能讓一些最基本的類型原子化，比如atomic_int、atomic_long等。這些原子變量都是模板類atomic的特化形式，包裝了原始的類型，具有相同的接口，用起來和bool、int幾乎一模一樣，但卻是原子化的，多線程讀寫不會出錯

原子變量和原始的類型一個重要的區別是：原子變量禁用了拷貝構造函數，所以在初始化的時候不能用=的賦值形式，只能用圓括號或者花括

#include <iostream>
#include <thread>void incrementAtomicInt(std::atomic_int &counter, int numIterations) {for (int i = 0; i < numIterations; ++i) {++counter;}
}void incrementAtomicLong(std::atomic_long &counter, long numIterations) {for (long i = 0; i < numIterations; ++i) {counter += 10;}
}int main() {const int numThreads = 4;const int iterationsPerThread = 100;std::atomic_int x{0};std::atomic_long y{1000L};std::vector<std::thread> threads;// 對x進行遞增操作的線程for (int i = 0; i < numThreads; ++i) {threads.emplace_back(incrementAtomicInt, std::ref(x), iterationsPerThread);}// 對y進行遞增操作的線程for (int i = 0; i < numThreads; ++i) {threads.emplace_back(incrementAtomicLong, std::ref(y), iterationsPerThread / 10); // 注意調整迭代次數以匹配期望的增量}// 等待所有線程完成for (std::thread &t: threads) {t.join();}std::cout << "final value of x: " << x << std::endl;std::cout << "final value of y: " << y << std::endl;assert(x == numThreads * iterationsPerThread);assert(y == 1000L + numThreads * (iterationsPerThread / 10) * 10);return 0;
}

除了模擬整數運算，原子變量還有一些特殊的原子操作，比如store、load、fetch_add、fetch_sub、exchange、compare_exchange_weak/compare_exchange_strong以及CAS（Compare And Swap）操作

4）、線程

C++標準庫里有專門的線程類thread，使用它就可以簡單地創建線程，在名字空間std::this_thread里，還有yield()、get_id()、sleep_for()、sleep_until()等幾個方便的管理函數

#include <iostream>
#include <chrono>
#include <thread>int main() {static std::atomic_flag flag{false};static std::atomic_int n;auto f = [&]() {auto value = flag.test_and_set(); // TAS檢查原子標志量if (value) {std::cout << "flag has been set." << std::endl;} else {std::cout << "set flag by " << std::this_thread::get_id() << std::endl; // 輸出線程id}n += 100; // 原子變量加法運算std::this_thread::sleep_for(std::chrono::milliseconds(n.load() * 10));std::cout << n << std::endl;};std::vector<std::thread> threads;for (int i = 0; i < 2; ++i) {threads.emplace_back(f);}for (std::thread &t: threads) {t.join();}return 0;
}

函數async()含義是異步運行一個任務，隱含的動作是啟動一個線程去執行，但不絕對保證立即啟動（也可以在第一個參數傳遞 std::launch::async，要求立即啟動線程）

#include <iostream>
#include <chrono>
#include <thread>
#include <future>int main() {auto task = [](auto x) {std::this_thread::sleep_for(std::chrono::milliseconds(x));std::cout << "sleep for " << x << std::endl;return x;};auto f = std::async(task, 10);// 啟動一個異步任務f.wait(); // 等待任務完成assert(f.valid());// 確實已經完成了任務std::cout << f.get() << std::endl; // 獲取任務的執行結果return 0;
}

async()會返回一個future變量，如果任務有返回值，就可以用成員函數get()獲取。不過要特別注意，get()只能調一次，再次獲取結果會發生錯誤，拋出異常 std::future_error

這里還有一個很隱蔽的坑，如果不顯式獲取async()的返回值（即future對象），它就會同步阻塞直至任務完成（由于臨時對象的析構函數）。所以，即使不關心返回值，也總要用auto來配合async()，避免同步阻塞