If you have slow loops in Python, you can fix it… until you can’t

by Maxim Mamaev

Let’s take a computational problem as an example, write some code, and see how we can improve the running time. Here we go.

Setting the scene: the knapsack problem

This is the computational problem we’ll use as the example:

The knapsack problem is a well-known problem in combinatorial optimization. In this section, we will review its most common flavor, the 0–1 knapsack problem, and its solution by means of dynamic programming. If you are familiar with the subject, you can skip this part.

You are given a knapsack of capacity C and a collection of N items. Each item has weight w[i] and value v[i]. Your task is to pack the knapsack with the most valuable items. In other words, you are to maximize the total value of the items that you put into the knapsack, subject to a constraint: the total weight of the taken items cannot exceed the capacity of the knapsack.

Once you’ve got a solution, the total weight of the items in the knapsack is called “solution weight,” and their total value is the “solution value”.

The problem has many practical applications. For example, you’ve decided to invest $1600 into the famed FAANG stock (the collective name for the shares of Facebook, Amazon, Apple, Netflix, and Google aka Alphabet). Each share has a current market price and the one-year price estimate. As of one day in 2018, they are as follows:

========= ======= ======= =========
Company   Ticker  Price   Estimate
========= ======= ======= =========
Alphabet  GOOG    1030    1330
Amazon    AMZN    1573    1675
Apple     AAPL    162     193
Facebook  FB      174     216
Netflix   NFLX    312     327
========= ======= ======= =========

For the simplicity of the example, we’ll assume that you’d never put all your eggs in one basket. You are willing to buy no more than one share of each stock. What shares do you buy to maximize your profit?

This is a knapsack problem. Your budget ($1600) is the sack’s capacity (C). The shares are the items to be packed. The current prices are the weights (w). The price estimates are the values (v). The problem looks trivial. However, the solution is not evident at first glance: should you buy one share of Amazon, or one share of Google plus some combination of Apple, Facebook, and Netflix?

Of course, in this case, you may do quick calculations by hand and arrive at the solution: you should buy Google, Netflix, and Facebook. This way you spend $1516 and expect to gain $1873.

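With only five stocks there are just 2^5 = 32 possible portfolios, so the hand calculation can be double-checked by brute force. The snippet below is a minimal sketch, not part of the original code; the names `stocks`, `budget`, and `brute_force` are made up for illustration, and the numbers come from the table above.

```python
from itertools import combinations

# (price, one-year estimate) in whole dollars, copied from the table above.
stocks = {
    "GOOG": (1030, 1330),
    "AMZN": (1573, 1675),
    "AAPL": (162, 193),
    "FB": (174, 216),
    "NFLX": (312, 327),
}
budget = 1600


def brute_force(stocks, budget):
    """Try every subset of tickers and keep the most valuable one that fits."""
    best = (0, 0, ())  # (value, weight, tickers)
    for size in range(len(stocks) + 1):
        for picked in combinations(stocks, size):
            weight = sum(stocks[t][0] for t in picked)
            value = sum(stocks[t][1] for t in picked)
            if weight <= budget and value > best[0]:
                best = (value, weight, picked)
    return best


best_value, best_weight, best_tickers = brute_force(stocks, budget)
print(best_tickers, best_weight, best_value)
```

The search confirms the answer given above: Google, Facebook, and Netflix for $1516, with an expected value of $1873.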

Now you believe that you’ve discovered a Klondike. You shatter your piggy bank and collect $10,000. Despite your excitement, you stay adamant with the rule “one stock — one buy”. Therefore, with that larger budget, you have to broaden your options. You decide to consider all stocks from the NASDAQ 100 list as candidates for buying.

The future has never been brighter, but suddenly you realize that, in order to identify your ideal investment portfolio, you will have to check around 2^100 (about 1.3 × 10^30) combinations. Even if you are super optimistic about the imminence and the ubiquity of the digital economy, any economy requires, at the least, a universe where it runs. Unfortunately, in a few trillion years when your computation ends, our universe probably won’t exist.

Dynamic programming algorithm

We have to drop the brute force approach and program some clever solution. Small knapsack problems (and ours is a small one, believe it or not) are solved by dynamic programming. The basic idea is to start from a trivial problem whose solution we know and then add complexity step-by-step.

If you find the following explanations too abstract, here is an annotated illustration of the solution to a very small knapsack problem. This will help you visualize what is happening.

Assume that, given the first i items of the collection, we know the solution values s(i, k) for all knapsack capacities k in the range from 0 to C.

In other words, we sewed C+1 “auxiliary” knapsacks of all sizes from 0 to C. Then we sorted our collection, took the first i items and temporarily put aside all the rest. And now we assume that, by some magic, we know how to optimally pack each of the sacks from this working set of i items. The items that we pick from the working set may be different for different sacks, but at the moment we are not interested in which items we take or skip. It is only the solution value s(i, k) that we record for each of our newly sewn sacks.

Now we fetch the next, (i+1)th, item from the collection and add it to the working set. Let’s find solution values for all auxiliary knapsacks with this new working set. In other words, we find s(i+1, k) for all k=0..C given s(i, k).

If k is less than the weight of the new item w[i+1], we cannot take this item. Indeed, even if we took only this item, it alone would not fit into the knapsack. Therefore, s(i+1, k) = s(i, k) for all k < w[i+1].

For the values k >= w[i+1] we have to make a choice: either we take the new item into the knapsack of capacity k or we skip it. We need to evaluate these two options to determine which one gives us more value packed into the sack.

If we take the (i+1)th item, we acquire the value v[i+1] and consume the part of the knapsack’s capacity to accommodate the weight w[i+1]. That leaves us with the capacity k–w[i+1] which we have to optimally fill using (some of) the first i items. This optimal filling has the solution value s(i, k–w[i+1]). This number is already known to us because, by assumption, we know all solution values for the working set of i items. Hence, the candidate solution value for the knapsack k with the item i+1 taken would be s(i+1, k | i+1 taken) = v[i+1] + s(i, k–w[i+1]).

The other option is to skip the item i+1. In this case, nothing changes in our knapsack, and the candidate solution value would be the same as s(i, k).

To decide on the best choice we compare the two candidates for the solution value:

s(i+1, k | i+1 taken) = v[i+1] + s(i, k–w[i+1])
s(i+1, k | i+1 skipped) = s(i, k)

The maximum of these becomes the solution s(i+1, k).

In summary:

if k < w[i+1]:
    s(i+1, k) = s(i, k)
else:
    s(i+1, k) = max( v[i+1] + s(i, k-w[i+1]), s(i, k) )

Now we can solve the knapsack problem step-by-step. We start with the empty working set (i=0). Obviously, s(0, k) = 0 for any k. Then we take steps by adding items to the working set and finding solution values s(i, k) until we arrive at s(i+1=N, k=C) which is the solution value of the original problem.

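To make the step from s(i, k) to s(i+1, k) concrete, here is a small sketch that applies the rule above to a toy problem. The function name `next_row` and the toy items are hypothetical, chosen only for illustration; the sketch keeps a single row at a time rather than the whole grid.

```python
def next_row(prev_row, weight, value):
    """Given s(i, k) for k = 0..C, return s(i+1, k) for a new item (weight, value)."""
    row = []
    for k, skipped in enumerate(prev_row):
        if k < weight:
            # The new item does not fit into a knapsack of capacity k.
            row.append(skipped)
        else:
            # Either take the item and fill the rest optimally, or skip it.
            row.append(max(value + prev_row[k - weight], skipped))
    return row


# Toy problem (made-up numbers): capacity 5, items given as (weight, value).
capacity = 5
items = [(2, 3), (3, 4), (4, 6)]

row = [0] * (capacity + 1)  # s(0, k) = 0 for every k
for weight, value in items:
    row = next_row(row, weight, value)
    print(row)
```

The last element of the final row is s(N, C). For the toy data it is 7, reached by taking the first two items.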

Note that, in doing this, we have built a grid of NxC solution values.

Yet, despite having learned the solution value, we do not know exactly what items have been taken into the knapsack. To find this out, we backtrack the grid. Starting from s(i=N, k=C), we compare s(i, k) with s(i–1, k).

If s(i, k) = s(i–1, k), the ith item has not been taken. We repeat with i=i–1, keeping the value of k unchanged. Otherwise, the ith item has been taken, and for the next examination step we shrink the knapsack by w[i]: we set k=k–w[i] and then i=i–1.

This way we examine all items from the Nth to the first, and determine which of them have been put into the knapsack. This gives us the solution to the knapsack problem.

Code and analysis

Now that we have the algorithm, we will compare several implementations, starting from a straightforward one. The code is available on GitHub.

The data is the Nasdaq 100 list, containing current prices and price estimates for one hundred stock equities (as of one day in 2018). Our investment budget is $10,000.

Recall that share prices are not round dollar numbers, but come with cents. Therefore, to get the accurate solution, we have to count everything in cents; we definitely want to avoid float numbers. Hence the capacity of our knapsack is $10,000 x 100 = 1,000,000 cents, and the total size of our problem is N x C = 100 x 1,000,000 = 100,000,000.

With an integer taking 4 bytes of memory, we expect that the algorithm will consume roughly 400 MB of RAM. So, the memory is not going to be a limitation. It is the execution time we should care about.

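The arithmetic behind these estimates can be spelled out in a few lines. This is only a sketch: `to_cents` is a hypothetical helper, and the 4-bytes-per-integer figure is the assumption stated in the text, not a measurement.

```python
from decimal import Decimal


def to_cents(dollars: str) -> int:
    """Convert a price written as a decimal string into an integer number of cents."""
    return int(Decimal(dollars) * 100)


capacity = to_cents("10000")   # the budget, in cents
n_items = 100                  # stocks in the Nasdaq 100 list
cells = n_items * capacity     # approximate number of solution values in the grid
approx_bytes = cells * 4       # assuming 4 bytes per stored integer

print(capacity, cells, approx_bytes / 1e6, "MB")
```

Parsing prices with `Decimal` rather than `float` keeps the conversion to cents exact, which is the point of counting in integers in the first place.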

Of course, all our implementations will yield the same solution. For your reference, the investment (the solution weight) is 999930 ($9999.30) and the expected return (the solution value) is 1219475 ($12194.75). The list of stocks to buy is rather long (80 of 100 items). You can obtain it by running the code.

And, please, remember that this is a programming exercise, not investment advice. By the time you read this article, the prices and the estimates will have changed from what is used here as an example.

Plain old “for” loops

The straightforward implementation of the algorithm is given below.

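The author's listing is not reproduced here, so what follows is a minimal sketch of how such a solver might look, assuming `items` is a list of `(weight, value)` pairs. The function name and the return format are illustrative choices, and the line numbers cited in the surrounding text refer to the author's listing, not to this sketch.

```python
def solve_for_loops(items, capacity):
    """Return (solution value, solution weight, indices of the taken items)."""
    n = len(items)

    # Part 1: build the grid of solution values s(i, k) with two nested loops.
    grid = [[0] * (capacity + 1)]
    for i in range(n):
        weight, value = items[i]
        grid.append(grid[i].copy())  # initialize with the previous working set
        for k in range(weight, capacity + 1):
            grid[i + 1][k] = max(value + grid[i][k - weight], grid[i][k])

    # Part 2: backtrack the grid to find out which items were taken.
    solution_value = grid[n][capacity]
    solution_weight = 0
    taken = []
    k = capacity
    for i in range(n, 0, -1):
        if grid[i][k] != grid[i - 1][k]:
            taken.append(i - 1)  # zero-based index of the ith item
            k -= items[i - 1][0]
            solution_weight += items[i - 1][0]
    return solution_value, solution_weight, taken


# FAANG data from the table earlier in the article: (price, estimate) in dollars.
faang = [(1030, 1330), (1573, 1675), (162, 193), (174, 216), (312, 327)]
print(solve_for_loops(faang, 1600))
```

On the small FAANG example this reproduces the hand-computed result: value 1873, weight 1516, with Google, Facebook, and Netflix taken.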

There are two parts.

In the first part (lines 3–7 above), two nested for loops are used to build the solution grid.

The outer loop adds items to the working set until we reach N (the value of N is passed in the parameter items). The row of solution values for each new working set is initialized with the values computed for the previous working set.

The inner loop for each working set iterates the values of k from the weight of the newly added item to C (the value of C is passed in the parameter capacity).

Note that we do not need to start the loop from k=0. When k is less than the weight of item, the solution values are always the same as those computed for the previous working set, and these numbers have been already copied to the current row by initialisation.

When the loops are completed, we have the solution grid and the solution value.

The second part (lines 9–17) is a single for loop of N iterations. It backtracks the grid to find what items have been taken into the knapsack.

Further on, we will focus exclusively on the first part of the algorithm as it has O(N*C) time and space complexity. The backtracking part requires just O(N) time and does not spend any additional memory — its resource consumption is relatively negligible.

It takes 180 seconds for the straightforward implementation to solve the Nasdaq 100 knapsack problem on my computer.

How bad is it? On the one hand, with the speeds of the modern age, we are not used to spending three minutes waiting for a computer to do stuff. On the other hand, the size of the problem — a hundred million — looks indeed intimidating, so, maybe, three minutes are ok?

To obtain some benchmark, let’s program the same algorithm in another language. We need a statically-typed compiled language to ensure the speed of computation. No, not C. It is not fancy. We’ll stick to fashion and write in Go:

As you can see, the Go code is quite similar to that in Python. I even copy-pasted one line, the longest, as is.

What is the running time? 400 milliseconds! In other words, Python came out roughly 450 times slower than Go (180 seconds against 0.4). The gap would probably be even bigger if we tried it in C. This is definitely a disaster for Python.

To find out what slows down the Python code, let’s run it with the line profiler. You can find the profiler’s output for this and subsequent implementations of the algorithm on GitHub.

In the straightforward solver, 99.7% of the running time is spent in two lines. These two lines comprise the inner loop, which is executed 98 million times:

I apologize for the excessively long lines, but the line profiler cannot properly handle line breaks within the same statement.

I’ve heard that Python’s for operator is slow but, interestingly, most of the time is spent not in the for line but in the loop’s body.

We can break down the loop’s body into individual operations to see if any particular operation is too slow:

It appears that no particular operation stands out. The running times of individual operations within the inner loop are pretty much the same as the running times of analogous operations elsewhere in the code.

Note how breaking the code down increased the total running time. The inner loop now takes 99.9% of the running time. The dumber your Python code, the slower it gets. Interesting, isn’t it?

Built-in map function

Let’s make the code more optimised and replace the inner for loop with a built-in map() function:

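A possible shape for the map()-based grid builder is sketched below. It is not the author's listing: the function name is made up, `items` is assumed to be a list of `(weight, value)` pairs, and only the solution value is returned to keep the example short.

```python
def solve_map(items, capacity):
    """Build the solution grid with map() instead of the inner for loop."""
    grid = [[0] * (capacity + 1)]
    for i, (weight, value) in enumerate(items):
        prev = grid[i]
        # Knapsacks smaller than the new item keep the previous solution values.
        row = prev[:weight]
        # For the rest, map() applies the take-or-skip rule to every capacity k.
        row.extend(map(lambda k: max(value + prev[k - weight], prev[k]),
                       range(weight, capacity + 1)))
        grid.append(row)
    return grid[-1][capacity]


# FAANG data from the table earlier in the article: (price, estimate) in dollars.
faang = [(1030, 1330), (1573, 1675), (162, 193), (174, 216), (312, 327)]
print(solve_map(faang, 1600))
```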

The execution time of this code is 102 seconds, being 78 seconds off the straightforward implementation’s score. Indeed, map() runs noticeably, but not overwhelmingly, faster.

List comprehension

You may have noticed that each run of the inner loop produces a list (which is added to the solution grid as a new row). The Pythonic way of creating lists is, of course, list comprehension. Let’s try it instead of map().

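The same row can be produced by a list comprehension. Again, this is a sketch under the same assumptions as before (a made-up function name, `items` as `(weight, value)` pairs), not the code that was timed in the article.

```python
def solve_list_comp(items, capacity):
    """Build each new row of the solution grid with a list comprehension."""
    grid = [[0] * (capacity + 1)]
    for i, (weight, value) in enumerate(items):
        prev = grid[i]
        # Copy the values for the small knapsacks, then compute the rest.
        grid.append(prev[:weight] +
                    [max(value + prev[k - weight], prev[k])
                     for k in range(weight, capacity + 1)])
    return grid[-1][capacity]


# FAANG data from the table earlier in the article: (price, estimate) in dollars.
faang = [(1030, 1330), (1573, 1675), (162, 193), (174, 216), (312, 327)]
print(solve_list_comp(faang, 1600))
```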

This finished in 81 seconds. We’ve achieved another improvement and cut the running time by more than half in comparison to the straightforward implementation (180 sec). Out of context, this would be praised as significant progress. Alas, we are still light years away from our benchmark of 0.4 sec.

NumPy arrays

At last, we have exhausted built-in Python tools. Yes, I can hear the roar of the audience chanting “NumPy! NumPy!” But to appreciate NumPy’s efficiency, we had to put it into context by trying for, map(), and list comprehension first.

Ok, now it is NumPy time. So, we abandon lists and put our data into numpy arrays:

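A naive port might look like the sketch below: the grid becomes a 2D NumPy array, but the values are still computed by a list comprehension. The function name and the `int32` dtype (chosen to match the 4-byte estimate made earlier) are assumptions for illustration.

```python
import numpy as np


def solve_numpy_naive(items, capacity):
    """Store the grid in a NumPy array but still iterate in pure Python."""
    n = len(items)
    grid = np.zeros((n + 1, capacity + 1), dtype=np.int32)
    for i, (weight, value) in enumerate(items):
        grid[i + 1] = grid[i]  # initialize with the previous working set
        # The comprehension builds a Python list, which NumPy then has to
        # convert element by element when it is assigned into the array.
        grid[i + 1, weight:] = [max(value + grid[i, k - weight], grid[i, k])
                                for k in range(weight, capacity + 1)]
    return int(grid[n, capacity])


# FAANG data from the table earlier in the article: (price, estimate) in dollars.
faang = [(1030, 1330), (1573, 1675), (162, 193), (174, 216), (312, 327)]
print(solve_numpy_naive(faang, 1600))
```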

Surprisingly, the result is discouraging. This code runs 1.5 times slower than the vanilla list comprehension solver (123 sec versus 81 sec). How can that be?

Let’s examine the line profiles for both solvers.

Initialization of grid[0] as a numpy array (line 274) is three times faster than when it is a Python list (line 245). Inside the outer loop, initialization of grid[item+1] is 4.5 times faster for a NumPy array (line 276) than for a list (line 248). So far, so good.

However, the execution of line 279 is 1.5 times slower than its numpy-less analog in line 252. The problem is that list comprehension creates a list of values, but we store these values in a NumPy array which is found on the left side of the expression. Hence, this line implicitly adds an overhead of converting a list into a NumPy array. With line 279 accounting for 99.9% of the running time, all the previously noted advantages of numpy become negligible.

But we still need a means to iterate through arrays in order to do the calculations. We have already learned that list comprehension is the fastest iteration tool. (By the way, if you try to build NumPy arrays within a plain old for loop, avoiding the list-to-NumPy-array conversion, you’ll get a whopping 295 sec running time.) So, are we stuck, and is NumPy of no use? Of course not.

Proper use of NumPy

Just storing data in NumPy arrays does not do the trick. The real power of NumPy comes with the functions that run calculations over NumPy arrays. They take arrays as parameters and return arrays as results.

For example, there is the function where(), which takes three arrays as parameters: condition, x, and y, and returns an array built by picking elements either from x or from y. The first parameter, condition, is an array of booleans. It tells where to pick from: if an element of condition evaluates to True, the corresponding element of x is sent to the output, otherwise the element from y is taken.
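A tiny illustration of where() on made-up arrays (the array contents are arbitrary and serve only to show the element-wise selection):

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Take the element from x where condition is True, otherwise from y.
picked = np.where(condition, x, y)
print(picked.tolist())  # [1, 20, 3, 40]
```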

Note that the NumPy function does all this in a single call. Looping through the arrays is put away under the hood.

This is how we use where() as a substitute for the inner for loop of the first solver or, equivalently, for the list comprehension of the latest one:
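The author's listing did not survive, so here is a sketch reassembled from the fragments quoted in the following paragraphs. The function name, the `(weight, value)` input format, and the `int32` dtype are assumptions. The block is laid out so that, counting from the import line, lines 8, 9, and 10–13 hold the three pieces discussed next.

```python
import numpy as np

def solve_numpy(items, capacity):
    n = len(items)  # items: list of (weight, value) pairs, weights assumed positive
    grid = np.zeros((n + 1, capacity + 1), dtype=np.int32)  # row 0 is s(0, k) = 0
    for item in range(n):
        this_weight, this_value = items[item]
        grid[item + 1, :this_weight] = grid[item, :this_weight]
        temp = grid[item, :-this_weight] + this_value
        grid[item + 1, this_weight:] = \
            np.where(temp > grid[item, this_weight:],
                     temp,
                     grid[item, this_weight:])
    return int(grid[n, capacity])  # the solution value s(N, C)


# FAANG data from the table earlier in the article: (price, estimate) in dollars.
faang = [(1030, 1330), (1573, 1675), (162, 193), (174, 216), (312, 327)]
print(solve_numpy(faang, 1600))
```

Note that the slice `:-this_weight` relies on every weight being at least 1; a zero-weight item would need separate handling.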

There are three pieces of code that are interesting: line 8, line 9 and lines 10–13 as numbered above. Together, they substitute for the inner loop which would iterate through all possible sizes of knapsacks to find the solution values.

Until the knapsack’s capacity reaches the weight of the item newly added to the working set (this_weight), we have to ignore this item and set solution values to those of the previous working set. This is pretty straightforward (line 8):

grid[item+1, :this_weight] = grid[item, :this_weight]

Then we build an auxiliary array temp (line 9):

temp = grid[item, :-this_weight] + this_value

This code is analogous to, but much faster than:

[grid[item, k - this_weight] + this_value for k in range(this_weight, capacity+1)]

It calculates would-be solution values if the new item were taken into each of the knapsacks that can accommodate this item.

Note how the temp array is built by adding a scalar to an array. This is another powerful feature of NumPy called “broadcasting”. When NumPy sees operands with different dimensions, it tries to expand (that is, to “broadcast”) the low-dimensional operand to match the dimensions of the other. In our case, the scalar is expanded to an array of the same size as grid[item, :-this_weight] and these two arrays are added together. As a result, the value of this_value is added to each element of grid[item, :-this_weight]; no loop is needed.
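The equivalence between the broadcast expression and the explicit loop can be checked on a small made-up row (the numbers below are arbitrary and stand in for one row of the grid):

```python
import numpy as np

row = np.array([0, 0, 3, 3, 3, 3])  # stand-in for grid[item]
this_weight, this_value = 3, 4

# Broadcasting: the scalar is added to every element of the slice at once.
temp = row[:-this_weight] + this_value

# The same values computed one by one in a Python-level loop.
looped = [int(row[k - this_weight]) + this_value
          for k in range(this_weight, len(row))]

print(temp.tolist(), looped)
```

Both expressions give the same three candidate values, one per knapsack that can accommodate the new item.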

In the next piece (lines 10–13) we use the function where() which does exactly what is required by the algorithm: it compares two would-be solution values for each size of knapsack and selects the one which is larger.

grid[item + 1, this_weight:] = np.where(temp > grid[item, this_weight:],
                                        temp,
                                        grid[item, this_weight:])

The comparison is done by the condition parameter, which is calculated as temp > grid[item, this_weight:]. This is an element-wise operation that produces an array of boolean values, one for each size of an auxiliary knapsack. A True value means that the corresponding item is to be packed into the knapsack. In that case, the solution value is taken from the second argument of the function, temp. Otherwise, the item is to be skipped, and the solution value is copied from the previous row of the grid, which is the third argument of the where() function.

At last, the warp drive engaged! This solver executes in 0.55 sec. This is 145 times faster than the list comprehension-based solver and 329 times faster than the code using the for loop. Although we did not outrun the solver written in Go (0.4 sec), we came quite close to it.

Some loops are to stay

Wait, but what about the outer for loop?

In our example, the outer loop code, which is not part of the inner loop, is run only 100 times, so we can get away without tinkering with it. However, other times the outer loop can turn out to be as long as the inner.

Can we rewrite the outer loop using a NumPy function in a similar manner to what we did to the inner loop? The answer is no.

Despite both being for loops, the outer and inner loops are quite different in what they do.

The inner loop produces a 1D-array based on another 1D-array whose elements are all known when the loop starts. It is this prior availability of the input data that allowed us to substitute the inner loop with either map(), list comprehension, or a NumPy function.

The outer loop produces a 2D-array from 1D-arrays whose elements are not known when the loop starts. Moreover, these component arrays are computed by a recursive algorithm: we can find the elements of the (i+1)th array only after we have found the ith.

Suppose the outer loop could be presented as a function:

grid = g(row0, row1, … rowN)

All function parameters must be evaluated before the function is called, yet only row0 is known beforehand. Since the computation of the (i+1)th row depends on the availability of the ith, we need a loop going from 1 to N to compute all the row parameters. Therefore, to substitute the outer loop with a function, we need another loop which evaluates the parameters of this function. This other loop is exactly the loop we are trying to replace.

The other way to avoid the outer for loop is to use recursion. One can easily write the recursive function calculate(i) that produces the ith row of the grid. In order to do the job, the function needs to know the (i-1)th row, thus it calls itself as calculate(i-1) and then computes the ith row using the NumPy functions as we did before. The entire outer loop can then be replaced with calculate(N). To make the picture complete, a recursive knapsack solver can be found in the source code accompanying this article on GitHub.
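A rough idea of what such a recursive solver could look like is given below. This is a sketch, not the solver from the accompanying repository: the wrapper `solve_recursive` is a made-up name, weights are assumed positive, and only the last row is returned rather than the whole grid.

```python
import numpy as np


def solve_recursive(items, capacity):
    """Replace the outer loop with recursion; the depth equals the number of items."""

    def calculate(i):
        # Row 0: with an empty working set every solution value is zero.
        if i == 0:
            return np.zeros(capacity + 1, dtype=np.int32)
        prev = calculate(i - 1)  # the (i-1)th row must be known first
        this_weight, this_value = items[i - 1]
        row = prev.copy()
        temp = prev[:-this_weight] + this_value
        row[this_weight:] = np.where(temp > prev[this_weight:],
                                     temp,
                                     prev[this_weight:])
        return row

    return int(calculate(len(items))[capacity])


# FAANG data from the table earlier in the article: (price, estimate) in dollars.
faang = [(1030, 1330), (1573, 1675), (162, 193), (174, 216), (312, 327)]
print(solve_recursive(faang, 1600))
```

Each call waits for the one below it to return, so the recursion merely hides the sequential dependence between rows; it does not remove it.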

However, the recursive approach is clearly not scalable. Python does not optimize tail calls. The depth of the recursion stack is, by default, limited to the order of one thousand. This limit is surely conservative but, when we require a depth of millions, stack overflow is highly likely. Moreover, the experiment shows that recursion does not even provide a performance advantage over a NumPy-based solver with the outer for loop.

This is where we run out of the tools provided by Python and its libraries (to the best of my knowledge). If you absolutely need to speed up the loop that implements a recursive algorithm, you will have to resort to Cython, or to a JIT-compiled version of Python, or to another language.

Takeaways

Do numerical calculations with NumPy functions. They are two orders of magnitude faster than Python’s built-in tools.
Of Python’s built-in tools, list comprehension is faster than map(), which is significantly faster than for.
For deeply recursive algorithms, loops are more efficient than recursive function calls.
You cannot replace recursive loops with map(), list comprehension, or a NumPy function.
“Dumb” code (broken down into elementary operations) is the slowest. Use built-in functions and tools.