On harmful overuse of std::move

Translation, November 24, 2023: On harmful overuse of std::move

cppreference std::move

On harmful overuse of std::move - The Old New Thing

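C++'s std::move function casts its argument to an rvalue reference, which allows its contents to be "consumed" (moved from) by another operation. But in your excitement about this new expressive power, take care not to overuse it.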

std::string get_name(int id) {
    std::string name = std::to_string(id);
    /* assume other calculations happen here */
    return std::move(name); // bad: overuse of std::move
}

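You might think you are helping the compiler by saying, "Hey, I won't be using my local variable name after this point, so you can just move the string into the return value." Unfortunately, your "help" actually hurts.

Adding std::move causes the return statement to no longer satisfy the conditions for copy elision (commonly known as Named Return Value Optimization, or NRVO): the thing being returned must be the name of a local variable with the same type as the function's return type. The added std::move prevents NRVO, so the return value is move-constructed from the name variable.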

std::string get_name(int id) {
    std::string name = std::to_string(id);
    /* assume other calculations happen here */
    return name; // good: allows NRVO
}

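This time we return name directly, so the compiler can elide the copy and construct the name variable directly in the return value slot. (Compilers are permitted but not required to perform this optimization; in practice, all compilers do it when every code path returns the same local variable.)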
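One rough way to observe this is to count constructor calls. The sketch below uses a hypothetical instrumented type, Counted (not part of the article), whose copy and move constructors bump counters. It also illustrates why the explicit std::move cannot win: even when a compiler skips NRVO, a plain return of a local is treated as an rvalue first, so it falls back to a move and not a copy.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type (an assumption for illustration only).
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

Counted make_with_move() {
    Counted c;
    return std::move(c);  // operand is an xvalue: NRVO is not permitted here
}

Counted make_plain() {
    Counted c;
    return c;             // eligible for NRVO; otherwise an implicit move
}

// Returns the number of move constructions observed while calling factory.
template <typename Factory>
int moves_during(Factory factory) {
    Counted::moves = 0;
    Counted result = factory();
    (void)result;
    return Counted::moves;
}

// Self-check that can be called from main().
void check_return_counts() {
    assert(moves_during(make_with_move) == 1);  // the move is always paid
    assert(moves_during(make_plain) <= 1);      // 0 when NRVO is performed
    assert(Counted::copies == 0);               // neither form ever copies
}
```

Under C++17 rules the std::move version always costs exactly one move, while the plain version costs none on compilers that perform NRVO. Building this with GCC or Clang and -Wpessimizing-move should flag the return std::move(c); line.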

The other half of overzealous std::move use is on the receiving end.

extern void report_name(std::string name);

void sample1() {
    std::string name = std::move(get_name()); // bad: overuse of std::move
}

void sample2() {
    report_name(std::move(get_name())); // bad: overuse of std::move
}

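In these two sample functions, we take the return value of get_name() and explicitly std::move it into a new local variable or a function parameter. This is another case of trying to help and ending up hurting.

Constructing a value (whether a local variable or a function parameter) from a value of the matching type is elided: the value is stored directly into the local variable or parameter, with no copy. Adding std::move prevents this optimization, and the value is move-constructed instead.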

extern void report_name(std::string name);

void sample1() {
    std::string name = get_name(); // good: allows initialization elision
}

void sample2() {
    report_name(get_name()); // good: allows elision into the parameter
}
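The receiving side can be checked with the same counting trick. This is a minimal sketch that assumes a hypothetical Tracked type, with make() and report() standing in for get_name and report_name. Since C++17, initializing an object from a prvalue of the same type is guaranteed to be elided, so the direct forms perform no move; wrapping the call in std::move turns the prvalue into an xvalue and forces one move construction.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type (an assumption for illustration only).
struct Tracked {
    static int moves;
    Tracked() {}
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::moves = 0;

Tracked make() { return Tracked{}; }  // stand-in for get_name: returns a prvalue
void report(Tracked) {}               // stand-in for report_name: by-value parameter

int moves_for_direct_init() {
    Tracked::moves = 0;
    Tracked local = make();             // constructed in place
    (void)local;
    return Tracked::moves;
}

int moves_for_moved_init() {
    Tracked::moves = 0;
    Tracked local = std::move(make());  // temporary materialized, then moved
    (void)local;
    return Tracked::moves;
}

int moves_for_direct_arg() {
    Tracked::moves = 0;
    report(make());                     // constructed directly in the parameter
    return Tracked::moves;
}

int moves_for_moved_arg() {
    Tracked::moves = 0;
    report(std::move(make()));          // temporary materialized, then moved
    return Tracked::moves;
}

// Self-check that can be called from main().
void check_receiving_counts() {
    assert(moves_for_direct_init() == 0);
    assert(moves_for_moved_init() == 1);
    assert(moves_for_direct_arg() == 0);
    assert(moves_for_moved_arg() == 1);
}
```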

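Things get particularly "exciting" when you combine both mistakes. You then take a sequence that would have had no copy or move operations at all and turn it into one that creates two extra temporaries, performs two extra moves, and runs two extra destructors.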

#include <utility> // std::move
struct S {
    S();
    S(S const&);
    S(S &&);
    ~S();
};
extern void consume(S s);

// Bad version
S __declspec(noinline) f1() {
    S s;
    return std::move(s); // mistake 1: prevents NRVO
}

void g1() {
    consume(std::move(f1())); // mistake 2: prevents initialization elision
}

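(The MSVC assembly for the bad version, f1/g1, and the good version, f2/g2, shows that the bad version performs extra move constructions and temporary-object operations, while the good version uses NRVO and initialization elision to achieve zero copies and moves.)

Here's the compiler output for MSVC: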

; on entry, rcx says where to put the return value
f1:
    mov     qword ptr [rsp+8], rcx
    push    rbx
    sub     rsp, 48
    mov     rbx, rcx

    ; construct local variable s on stack
    lea     rcx, qword ptr [rsp+64]
    call    S::S()

    ; copy local variable to return value
    lea     rdx, qword ptr [rsp+64]
    mov     rcx, rbx
    call    S::S(S &&)

    ; destruct the local variable s
    lea     rcx, qword ptr [rsp+64]
    call    S::~S()

    ; return the result
    mov     rax, rbx
    add     rsp, 48
    pop     rbx
    ret

g1:
    sub     rsp, 40

    ; call f1 and store into temporary variable
    lea     rcx, qword ptr [rsp+56]
    call    f1()

    ; copy temporary to outbound parameter
    mov     rdx, rax
    lea     rcx, qword ptr [rsp+48]
    call    S::S(S &&)

    ; call consume with the outbound parameter
    mov     rcx, rax
    call    consume(S)

    ; clean up the temporary
    lea     rcx, qword ptr [rsp+56]
    call    S::~S()

    ; return
    add     rsp, 40
    ret

Notice that calling g1 results in a total of two extra copies of S: one in f1, and another to hold the return value of f1.

By comparison, if we use copy elision:

// Good version
S __declspec(noinline) f2()
{
    S s;
    return s;
}

void g2()
{
    consume(f2());
}

then the MSVC code generation is

; on entry, rcx says where to put the return value
f2:
    push    rbx
    sub     rsp, 48
    mov     rbx, rcx

    ; construct directly into return value (still in rcx)
    call    S::S()

    ; and return it
    mov     rax, rbx
    add     rsp, 48
    pop     rbx
    ret

g2:
    sub     rsp, 40

    ; put return value of f2 directly into outbound parameter
    lea     rcx, qword ptr [rsp+48]
    call    f2()

    ; call consume with the outbound parameter
    mov     rcx, rax
    call    consume(S)

    ; return
    add     rsp, 40
    ret

Other compilers (GCC, Clang, and ICC/ICX) produce similar results. In GCC, Clang, and ICX, you can enable the -Wpessimizing-move warning to tell you when you make these mistakes.
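The same difference can be observed on any compiler without reading assembly, by giving S bodies that count what happens. The following is a sketch under that assumption: the Stats counters and the measure helper are hypothetical additions, and __declspec(noinline) is dropped so the code is portable.

```cpp
#include <cassert>
#include <utility>

// Hypothetical counters, standing in for reading the assembly.
struct Stats {
    int copies;
    int moves;
    int destructions;
};
static Stats stats = {0, 0, 0};

struct S {
    S() {}
    S(S const&) { ++stats.copies; }
    S(S&&) noexcept { ++stats.moves; }
    ~S() { ++stats.destructions; }
};

void consume(S) {}

// Bad version: both mistakes combined.
S f1() { S s; return std::move(s); }
void g1() { consume(std::move(f1())); }

// Good version: plain return and plain argument.
S f2() { S s; return s; }
void g2() { consume(f2()); }

// Runs g and reports what happened while it ran.
Stats measure(void (*g)()) {
    stats = Stats{0, 0, 0};
    g();
    return stats;
}

// Self-check that can be called from main().
void check_combined_counts() {
    Stats bad = measure(g1);
    assert(bad.copies == 0 && bad.moves == 2 && bad.destructions == 3);

    Stats good = measure(g2);
    assert(good.copies == 0 && good.moves <= 1);  // 0 when NRVO is performed
    assert(good.destructions == 1 + good.moves);
}
```

When NRVO is performed, the good version constructs and destroys a single S, so the bad version's two moves and three destructions correspond to the two extra temporaries seen in the MSVC output.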

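(Selected comments from the article, translated)

紅樓鍮鍮: It gets worse when std::move abuse is combined with auto&& abuse: auto&& name = get_name(); needlessly creates a reference that inhibits NRVO; auto&& name = std::move(get_name()); actually creates a dangling reference, because C++ does not extend the lifetime of a temporary when a function call sits between it and the local reference being declared. Interestingly, auto&& name = static_cast<std::string &&>(get_name()); produces a valid reference! I suspect that replacing std::move with static_cast might restore NRVO.
Neil Rashbrook: (in reply) Assuming I did it right, you don't get the optimization if you either move or cast the value.
Kevin Norris: (in reply) Interestingly, MSVC seems able to optimize away the move with return static_cast<T&&>(...), but GCC and Clang cannot. And although GCC and Clang warn about return std::move(...);, they do not warn about return static_cast<T&&>(...).
Solomon Ucko: What I take from this: std::move is a cast, and it should be treated with exactly the same caution as static_cast. Use it only when you can clearly articulate what the cast formally does (in particular: why do you expect the compiler to treat the value differently once it is cast to an rvalue reference?), why the use case does not qualify for copy elision (including but not limited to NRVO), and what happens to the moved-from object afterward (in particular: you should be reasonably sure the moved-from object is never used again).
Simon Farnsworth: Note that, unlike the copy elision that C++17 and later make mandatory, Rust's NRVO is entirely optional. It has even been unsound in some cases, and so has been disabled in some versions.
Kevin Norris: Is there some side effect or other reason I'm not seeing that keeps the return std::move(name); case from being optimized away? Or is this just a missed opportunity in the standard, with compilers bound by it?
(Another reply to Kevin): The standard requires copy elision when the returned value is a prvalue, but std::move(foo) is an xvalue. The standard permits elision in the NRVO case Raymond describes and in a few other special cases involving exceptions and coroutines. In all other cases, elision is allowed only under the "as-if" rule, which requires the compiler to prove there is no difference in observable behavior, and that is usually hard. Elision in the general xvalue case could be invalid, because the object might already exist before the function call and be visible to other code or accessed concurrently, so skipping its destruction would cause problems. The standard could special-case "NRVO with a misapplied std::move()", but it is simpler to tell people not to do that.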
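The prvalue versus xvalue distinction in that last reply can be checked at compile time. In this small sketch, decltype applied to a parenthesized expression yields T for a prvalue, T& for an lvalue, and T&& for an xvalue. The function category_of_local is a hypothetical helper, and get_name is only declared, since it appears solely in unevaluated contexts.

```cpp
#include <string>
#include <type_traits>
#include <utility>

std::string get_name(int id);  // declaration only; never called at run time

// A call that returns by value is a prvalue: mandatory elision applies.
static_assert(std::is_same<decltype((get_name(0))), std::string>::value,
              "get_name(0) is a prvalue");

// Wrapping the call in std::move yields an xvalue: no mandatory elision.
static_assert(std::is_same<decltype((std::move(get_name(0)))), std::string&&>::value,
              "std::move(get_name(0)) is an xvalue");

// Hypothetical helper: the same check for a named local variable.
inline void category_of_local() {
    std::string name;
    // The bare name is an lvalue naming a local, the form NRVO requires.
    static_assert(std::is_same<decltype((name)), std::string&>::value,
                  "name is an lvalue");
    // After std::move it is an xvalue and no longer "the name of a local".
    static_assert(std::is_same<decltype((std::move(name))), std::string&&>::value,
                  "std::move(name) is an xvalue");
}
```

If the file compiles, every assertion holds; nothing needs to run.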

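Key takeaways:

Raymond Chen's article warns that excessive and unnecessary use of std::move in C++ can hurt performance, particularly when returning values from functions and when initializing new objects, because it blocks key compiler optimizations:

Overusing std::move on a function's return value:
- Wrong: return std::move(local_variable);
- Harm: prevents Named Return Value Optimization (NRVO). NRVO lets the compiler construct the local variable directly in the function's return value slot, eliminating the copy or move construction entirely.
- Right: return the local variable directly: return local_variable;. This satisfies the conditions for NRVO, and compilers (in practice, usually) perform the optimization, giving zero copies and moves.
Overusing std::move when initializing a new object from a function's return value:
- Wrong (initializing a variable): T var = std::move(func());
- Wrong (passing an argument): some_func(std::move(func()));
- Harm: prevents initialization elision. When an object (a variable or a parameter) is initialized from a value of the same type, the compiler can use the source value directly as the target object, eliminating the intermediate temporary and the copy or move.
- Right: initialize directly from the return value: T var = func(); or some_func(func());. This allows the compiler to elide the initialization.
Both mistakes together (the worst case): if std::move in the returning function blocks NRVO and std::move in the calling function blocks initialization elision, then a chain that could have involved zero copies and moves (constructed inside func() -> used directly as the return value -> used directly as the argument or variable) becomes:
- In the returning function: one move construction (because NRVO is blocked).
- In the calling function: a temporary is created to hold func()'s return value, and the target variable or parameter is then move-constructed from it (because initialization elision is blocked).
- Result: two extra temporaries, two extra moves, and two extra destructions, all of which were avoidable.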

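Lessons and recommendations:

Trust the compiler's optimizations first: when simply returning a local variable, or initializing an object directly from a return value of the same type, do not add std::move. Let the compiler apply NRVO and initialization elision to eliminate the copies.
Treat std::move as a cast: use std::move as carefully as you would static_cast. Use it only when you need to enable move semantics explicitly (for example, to transfer ownership of an object to a function, or when you know the source object is no longer needed and moving is cheaper than copying).
Understand the consequences of std::move: once an object has been moved from, it is in a valid but unspecified state, and you should no longer rely on its contents.
Enable compiler warnings: with GCC, Clang, or ICX, turn on -Wpessimizing-move (or an equivalent warning) to catch this kind of pessimization.
Avoid the dangerous combination of auto&& and std::move: auto&& name = std::move(func()); creates a dangling reference, because the lifetime of the temporary returned by func() is not extended through std::move.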
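The lifetime difference behind that last point can be observed without ever touching the dangling reference. The sketch below assumes a hypothetical Probe type whose destructor bumps a global counter, and reads the counter while the reference is still in scope.

```cpp
#include <cassert>
#include <utility>

static int destroyed = 0;  // hypothetical global counter for illustration

struct Probe {
    ~Probe() { ++destroyed; }
};

Probe make_probe() { return Probe{}; }

// Binds the returned temporary directly: its lifetime is extended to the
// lifetime of the reference, so nothing has been destroyed yet.
int destructions_seen_with_plain_binding() {
    destroyed = 0;
    [[maybe_unused]] auto&& ref = make_probe();
    return destroyed;
}

// Binds the result of std::move: the temporary dies at the end of the
// declaration, so ref dangles immediately. The reference is never read.
int destructions_seen_with_moved_binding() {
    destroyed = 0;
    [[maybe_unused]] auto&& ref = std::move(make_probe());
    return destroyed;
}

// Self-check that can be called from main().
void check_lifetimes() {
    assert(destructions_seen_with_plain_binding() == 0);
    assert(destructions_seen_with_moved_binding() == 1);
}
```

Storing the result by value (Probe p = make_probe();) avoids the question entirely and costs nothing extra, since the initialization is elided.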

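In short: std::move is a useful tool, but misusing it backfires by blocking a more efficient compiler optimization (copy elision), which costs performance and can introduce bugs. When simply returning a local variable or initializing directly from a return value, prefer the plain form and trust the compiler.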
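For contrast, here is a sketch of a case where std::move is the right tool: handing a named object to a function that takes ownership. The names store, make_value, and transfer_demo are hypothetical. std::unique_ptr is used because its moved-from state is specified: it is guaranteed to be null.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sink: takes ownership of item and stores it.
void store(std::vector<std::unique_ptr<int>>& out, std::unique_ptr<int> item) {
    // item is a named lvalue and unique_ptr cannot be copied, so the
    // explicit std::move is required here.
    out.push_back(std::move(item));
}

std::unique_ptr<int> make_value(int v) {
    std::unique_ptr<int> p(new int(v));
    return p;  // no std::move: NRVO or the implicit move applies
}

bool transfer_demo() {
    std::vector<std::unique_ptr<int>> storage;
    std::unique_ptr<int> owned = make_value(42);  // no std::move: elided

    store(storage, std::move(owned));  // explicit transfer of a named object

    // A moved-from unique_ptr is guaranteed to be null.
    return owned == nullptr && storage.size() == 1 && *storage[0] == 42;
}
```

The pattern is that std::move appears only where a named object that is still in scope gives up its contents, never on a function's return value or on the result of a call.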

Original text

The C++ std::move function casts its parameter to an rvalue reference, which enables its contents to be consumed by another operation. But in your excitement about this new expressive capability, take care not to overuse it.

std::string get_name(int id)
{
    std::string name = std::to_string(id);
    /* assume other calculations happen here */
    return std::move(name);
}

You think you are giving the compiler some help by saying “Hey, like, I’m not using my local variable name after this point, so you can just move the string into the return value.”

Unfortunately, your help is actually hurting. Adding a std::move causes the return statement to fail to satisfy the conditions for copy elision (commonly known as Named Return Value Optimization, or NRVO): The thing being returned must be the name of a local variable with the same type as the function return value.

The added std::move prevents NRVO, and the return value is move-constructed from the name variable.

std::string get_name(int id)
{
    std::string name = std::to_string(id);
    /* assume other calculations happen here */
    return name;
}

This time, we return name directly, and the compiler can now elide the copy and put the name variable directly in the return value slot with no copy. (Compilers are permitted but not required to perform this optimization, but in practice, all compilers will do it if all code paths return the same local variable.)

The other half of the overzealous std::move is on the receiving end.

extern void report_name(std::string name);

void sample1()
{
    std::string name = std::move(get_name());
}

void sample2()
{
    report_name(std::move(get_name()));
}

In these two sample functions, we take the return value from get_name and explicitly std::move it into a new local variable or into a function parameter. This is another case of trying to be helpful and ending up hurting.

Constructing a value (either a local variable or a function parameter) from a matching value of the same type will be elided: The matching value is stored directly into the local variable or parameter without a copy. But adding a std::move prevents this optimization from occurring, and the value will instead be move-constructed.

extern void report_name(std::string name);

void sample1()
{
    std::string name = get_name();
}

void sample2()
{
    report_name(get_name());
}

What’s particularly exciting is when you combine both mistakes. In that case, you took what would have been a sequence that had no copy or move operations at all and converted it into a sequence that creates two extra temporaries, two extra move operations, and two extra destructions.

#include <utility> // std::move
struct S
{
    S();
    S(S const&);
    S(S &&);
    ~S();
};

extern void consume(S s);

// Bad version
S __declspec(noinline) f1()
{
    S s;
    return std::move(s);
}

void g1()
{
    consume(std::move(f1()));
}

Here’s the compiler output for msvc:

; on entry, rcx says where to put the return value
f1:
    mov     qword ptr [rsp+8], rcx
    push    rbx
    sub     rsp, 48
    mov     rbx, rcx

    ; construct local variable s on stack
    lea     rcx, qword ptr [rsp+64]
    call    S::S()

    ; copy local variable to return value
    lea     rdx, qword ptr [rsp+64]
    mov     rcx, rbx
    call    S::S(S &&)

    ; destruct the local variable s
    lea     rcx, qword ptr [rsp+64]
    call    S::~S()

    ; return the result
    mov     rax, rbx
    add     rsp, 48
    pop     rbx
    ret

g1:
    sub     rsp, 40

    ; call f1 and store into temporary variable
    lea     rcx, qword ptr [rsp+56]
    call    f1()

    ; copy temporary to outbound parameter
    mov     rdx, rax
    lea     rcx, qword ptr [rsp+48]
    call    S::S(S &&)

    ; call consume with the outbound parameter
    mov     rcx, rax
    call    consume(S)

    ; clean up the temporary
    lea     rcx, qword ptr [rsp+56]
    call    S::~S()

    ; return
    add     rsp, 40
    ret

Notice that calling g1 resulted in the creation of a total of two extra copies of S, one in f1 and another to hold the return value of f1.

By comparison, if we use copy elision:

// Good version
S __declspec(noinline) f2()
{
    S s;
    return s;
}

void g2()
{
    consume(f2());
}

then the msvc code generation is

; on entry, rcx says where to put the return value
f2:
    push    rbx
    sub     rsp, 48
    mov     rbx, rcx

    ; construct directly into return value (still in rcx)
    call    S::S()

    ; and return it
    mov     rax, rbx
    add     rsp, 48
    pop     rbx
    ret

g2:
    sub     rsp, 40

    ; put return value of f2 directly into outbound parameter
    lea     rcx, qword ptr [rsp+48]
    call    f2()

    ; call consume with the outbound parameter
    mov     rcx, rax
    call    consume(S)

    ; return
    add     rsp, 40
    ret