二進制搜索算法

by Divya Godayal

通過Divya Godayal

二進制搜索的一個扭曲故事 (A twisted tale of Binary Search)

Awesome. That’s how I feel right now. Writing my first solo tech article.

太棒了那就是我現在的感覺。寫我的第一篇個人技術文章。

I must say I have a lot to share with you guys, and have a lot more to learn as well. So without any further ado, lets get to it. And yes, hold on tight — ‘cause there is a twist in the tale. ?

我必須說，我有很多可以與你們分享的東西，并且還有很多東西要學習。因此，事不宜遲，讓我們開始吧。是的，請緊緊握住-因為故事有個轉折。？

二元搜尋 (Binary Search)

All of us have heard of the classic 2 Eggs and 100 Stories problem. I have something similar for you.

我們所有人都聽說過經典的2個雞蛋和100個故事問題。我有類似的東西給你。

You have a 100 story building with a rule:

您有一個具有規則的100層建筑：

People with pets can only occupy top floors

Your friend wishes to buy an apartment in this building. She is too scared of pets to live near them, but you love them. She asked if you can help her find out where exactly the pet friendly floors start. She wants to explore all of the different options available, and so you need to find out which floors starting from the ground up are the ones that don’t allow pets.

您的朋友希望在此建筑物中購買公寓。她太怕寵物了，不能住在附近，但你愛它們。她問您是否可以幫助她找出寵物友好地板的確切位置。她想探索所有可用的不同選項，因此您需要找出從頭開始的哪些樓層不允許寵物入內。

The building management folks are on a holiday. On every floor, there is a sign board next to the elevator telling you if the floor is pet friendly or not. But you are too lazy to stop at every floor to check the pet sign board since the lift is so slow.

建筑管理人員正在度假。在每層樓上，電梯旁邊都有一個告示牌，告訴您該樓層是否可養寵物。但是，由于電梯太慢，您懶得在每個樓層停下來檢查寵物指示牌。

What do you do?

你是做什么？

The lift takes almost a minute at every floor to stop and then start again. Yes that’s how bad it is. But between the floors, navigation is pretty smooth. You have to get this done quickly.

電梯在每層都需要花費近一分鐘的時間才能停止，然后重新開始。是的，那真是太糟糕了。但是在樓層之間，導航非常流暢。您必須快速完成此任務。

How do you go about it ?

你如何去做？

迭代法 (Iterative approach)

One na?ve approach to this would be to start at the very bottom of the building (the ground floor) and keep stopping the lift at every single floor to check the sign that floor has posted. You stop when you find the pet friendly sign.

一種簡單的方法是從建筑物的最底端(底層)開始，并在每一層都停下電梯，以檢查該層已張貼的跡象。找到寵物友好標志時，您將停下來。

Best case is that the ground floor has the pet sign. Meaning the entire building has pets. No way your friend would buy an apartment here.

最好的情況是一樓有寵物標志。意思是整個建筑都養寵物。您的朋友絕對不會在這里購買公寓。

Average case is that you go to the 50th floor, stopping at every floor in between, and finally find a pet sign board. So your friend can buy one from 1–49.

一般情況是，您去了50樓，停在中間的每一層，最后找到一個寵物告示板。這樣您的朋友可以從1到49購買一個。

Worst case scenario would be you reaching the 100th floor, stopping at every floor on the way up, only to find out that there are no pet sign boards in the entire building. So your friend can buy any apartment from 1–100, but who cares, it took you almost two hours to find that out. ? ?.

最糟糕的情況是，您到達100樓，在上升的每一層都停下來，卻發現整個建筑物中沒有寵物告示板。因此，您的朋友可以購買1100至100歐元之間的任何公寓，但誰在乎，您花了將近兩個小時才找到答案。？？

Algorithmically, given an array of 100 boolean values, the index of the array represents building floors and a 0 represents a floor where no pets are allowed while a 1 represents a floor where pets would be allowed. From the rule of the building, the array would be of the form

從算法上講，給定一個包含100個布爾值的數組，該數組的索引表示建筑物樓層，0表示不允許寵物進入的樓層，而1表示允許寵物進入的樓層。根據建筑物的規則，數組的形式為

000... 1111...

that is, all 0s followed by all 1s, because only the top floors can be the ones where pets are allowed.

也就是說，全0后面全為1，因為只有頂層可以帶寵物。

Given this array, we need to find the first index where there is a 1 . A linear search algorithm for this problem would be as simple as iterating over the array and looking for a 1 and returning when we find one.

給定此數組，我們需要找到第一個索引為1 。解決這個問題的線性搜索算法就像在數組上迭代并尋找1并在找到一個返回時一樣簡單。

As expected, the complexity of this algorithm would be O(n) where n = 100 for our specific building example. You need to come up with something faster than this. Stopping at every floor is not feasible, as it would take you a lot of time to cover the entire building in the worst case.

不出所料，此算法的復雜度為O(n) ，其中對于我們的特定建筑示例，n = 100。您需要提出比這更快的方法。在每一層都停下來是不可行的，因為在最壞的情況下，要花費很多時間才能覆蓋整個建筑物。

二進制搜索法 (Binary Search Approach)

Let’s say you start from ground floor and got to the 50th floor with no stops. At the 50th floor, you stopped and got out of the lift and checked for the sign. The board sign said “No Pets”. This would mean that, until the 50th floor, there are definitely no pets.

假設您是從一樓開始，一直到50樓而已。在50樓，您停下來，走出電梯，檢查招牌。板子上寫著“No Pets” 。這意味著直到50樓，絕對沒有寵物。

So now knowing that you reduce your search space to the other half, which is floors 51–100. This means that with a single stop, you were able to cover half of the building knowing for sure that the first half doesn’t have any pets. That’s amazing!

因此，現在您知道將搜索空間減小到另一半，即第51-100層。這意味著您只需停下來，就可以覆蓋整個建筑物的一半，同時要確保前半部分沒有寵物。棒極了！

Moving on, you again divide your remaining set of floors into half and take the lift and go directly to the 75th floor. And you see a “Pets” sign board there. This means the floor where it started showing up must be between 50–75. You can keep following a similar approach of diving the remaining floors into half and checking until you find the first floor with the “Pets” sign board.

繼續前進，您再次將剩余的樓層分成兩半，乘坐電梯，直接進入第75層。然后您會看到一個“Pets”標志牌。這意味著開始顯示的地板必須在50-75之間。您可以繼續采用類似的方法，將剩余的樓層分成兩半，然后進行檢查，直到找到帶有“Pets”標志牌的第一層為止。

You see, every time you make a decision, you divide your search space into two halves and go ahead with one half of the search space. That’s how we narrow down our search. Since we always divide the search space in two and choose one over the other, that is why this type of search strategy is called a Binary search strategy.

您會看到，每次做出決定時，您都會將搜索空間分為兩半，然后繼續進行一半的搜索空間。這就是我們縮小搜索范圍的方式。由于我們總是將搜索空間一分為二，然后選擇一個，因此這就是為什么這種搜索策略稱為Binary搜索策略。

Isn’t that way faster?

這樣不是更快嗎？

Let’s look into the algorithm for this.

讓我們研究一下算法。

If you’ve been following along closely and have a grasp of the algorithm, you would have realized a hard and fast condition for the binary search algorithm to work. The condition is that the array needs to be sorted beforehand. In our example, the building floors were sorted from 1–100 and we could easily divide the search space in two.

如果您一直密切關注并掌握了該算法，那么您將認識到二進制搜索算法工作的艱巨條件。條件是該數組需要事先排序。在我們的示例中，建筑物樓層的分類范圍為1–100，我們可以輕松地將搜索空間分為兩部分。

Let’s look at an example array which is sorted and try and search for an element in it.

讓我們看一個示例數組，該數組已排序并嘗試在其中搜索元素。

In the above example, the element to be searched is 8. The given array is a sorted array in increasing order. Once we find the middle element (which is 5), we see that the element to be searched is greater than the current index element. Since the array is sorted in increasing order, 8 would lie on the right of the array and can never be on the left side.

在上面的示例中，要搜索的元素為8。給定數組是按升序排序的數組。找到中間元素(為5)后，我們看到要搜索的元素大于當前索引元素。由于數組是按升序排序的，因此8將位于數組的右側，并且永遠不能位于左側。

So we ignore the elements to the left of 5 and continue our search with the remaining elements, eventually finding out 8.

因此，我們忽略5左邊的元素，并繼續搜索其余元素，最終找到8。

On the other hand, what if the array is not sorted? Even though we know the current element is 5 and we know we need to search for 8, we are not sure which direction is the right way to go. If we end up thinking the array is sorted and apply binary search and go to the right part, we will never find 8.

另一方面，如果數組未排序怎么辦？即使我們知道當前元素是5并且知道需要搜索8，我們也不確定哪個方向是正確的方向。如果我們最終認為數組已排序并應用二進制搜索然后轉到正確的部分，我們將永遠找不到8。

So binary search essentially wants your array to be sorted.

因此，二進制搜索本質上是希望對數組進行排序。

That was the standard binary search algorithm that we just looked at. But, as the title of the article suggests, there is a twist in the tale!

那就是我們剛剛研究過的標準二進制搜索算法。但是，正如文章標題所暗示的，這個故事有一個轉折！

I am an avid competitive programmer, and there was an interesting variant of the binary search algorithm in the CodeChef May Long Challenge.

我是一個狂熱的競爭程序員，在CodeChef May Long Challenge中有一個有趣的二進制搜索算法變體。

Essentially, the Chef wrote the classic binary search, assuming the input array would be sorted. All the other children in the class copied the code from him, as Chef is the best programmer in the class. His assumption could’ve cost the entire class their assignment marks, as the input array was not sorted beforehand.

本質上，Chef編寫了經典的二進制搜索，并假設輸入數組將被排序。班上所有其他孩子都從他那里復制了代碼，因為Chef是班上最好的程序員。他的假設可能使整個類失去了分配標記，因為輸入數組未事先排序。

The only thing the Chef can do is to preprocess the array by swapping some pair of numbers here and there so that the binary search procedure still returns the right index.

Chef唯一能做的就是通過在這里和那里交換一些數字對預處理數組，以便二進制搜索過程仍然返回正確的索引。

Note: The preprocessor above should ideally return the modified array for the binary search to work correctly. However, as the problem statement asks, we are just trying to determine the number of swaps needed for binary search to work correctly on the unsorted array given an input. The algorithm would also return a -1 if such a modification is not possible for the given array and element.

注意：理想情況下，上述預處理器應返回修改后的數組，以使二進制搜索正常工作。但是，正如問題陳述所要求的，我們只是試圖確定在給定輸入的情況下，二進制搜索在未排序數組上正確運行所需的交換次數。如果對于給定的數組和元素不可能進行這種修改，則該算法還將返回-1。

The idea here is very simple.

這里的想法很簡單。

We need to understand two basic steps. I call them the TI-ME steps. Perhaps that’ll help you remember what we are doing here.

我們需要了解兩個基本步驟。我稱它們為TI-ME步驟。也許這可以幫助您記住我們在這里所做的事情。

a. Target Index: The index of the element to be searched for. We need to know this, since this index would help us drive the modifications. Because every time we modify any element, we need to sail towards this index and not away from it.

一個。牛逼 ARGET 我 ndex：要搜索的元素的索引。我們需要知道這一點，因為該索引將幫助我們推動修改。因為每次修改任何元素時，我們都需要朝著該索引航行而不是遠離它。

b. Middle Element: If you look clearly in a binary search, it’s the middle element of the current search space which drives the next move. If this middle element takes us in the wrong direction, we need to replace with the appropriate element.

b。中號 iddleè字元素：如果您在二進制搜索看清楚，它的驅動下移動當前的搜索空間的中間元素。如果這個中間元素將我們引向錯誤的方向，則需要替換為適當的元素。

So, the whole idea here is that we swap all the middle elements which are wrongly placed.

因此，這里的整體思想是我們交換所有錯誤放置的中間元素。

The binary search algorithm (the value of the middle element with respect to the element to be searched, that is, X) can either take us towards the left half of the array or the right half. So, there are two possibilities for a wrongly placed middle element:

二進制搜索算法(相對于要搜索的元素而言，中間元素的值，即X)可以將我們帶到數組的左半部分或右半部分。因此，錯誤放置中間元素有兩種可能性：

The element to be searched was on the right of the middle element, but since Element[Mid] > Element[Target Index] , the binary search would have had to ignore the right half and move towards the left half. OR
要搜索的元素在中間元素的右側，但是由于Element[Mid] > Element[Target Ind ex]，因此二進制搜索將不得不忽略右半部分而向左半部分移動。要么
The element to be searched was on the left of the middle element, but since Element[Mid] < Element[Target Index] , the binary search would have had to ignore the left half and move towards the right half.
要搜索的元素在中間元素的左側，但是由于Element[Mid] < Element[Target Ind ex]，因此二進制搜索將不得不忽略左半部分而向右半部分移動。

Therefore, if a middle element is wrongly placed such that a number X was needed in its place where X < Element[Target Index] , then we maintain a counter for that and call it count_low_needed .

因此，如果錯誤地放置了一個中間元素，使得在X < Element[Target Ind ex]的位置需要數字X ，則我們為此維護一個計數器并將it count_low_nee ded。

Similarly, if a middle element is wrongly placed such that a number X was needed in its place where X > Element[Target Index] , then we maintain a counter for that and call it count_high_needed .

同樣，如果錯誤地放置了一個中間元素，使得在X > Element[Target Ind ex]的位置需要數字X ，則我們為此維護一個計數器，并將it count_high_nee ded。

Also, if we simply run the binary search algorithm over the given array while searching for numbers, there would be some numbers that would be correctly placed. These would be the middle elements that drove the binary search in correct directions corresponding to the given element X (the element to be searched). These numbers cannot be a part of the swapping, because they are rightly positioned with respect to X .

另外，如果我們在搜索數字時僅在給定的數組上運行二進制搜索算法，那么將會正確放置一些數字。這些將是在與給定元素X (要搜索的元素)相對應的正確方向上推動二進制搜索的中間元素。這些數字不能作為交換的一部分，因為它們相對于X位置正確。

Let’s look at the pseudo code for this algorithm first and then go through an example.

讓我們先看一下該算法的偽代碼，然后再看一個例子。

function can_preprocess(arr, X){     low = 0     high= 0

while X is not found {          mid = (low + high) / 2          if arr[mid] == X {             break                     }

correctly_placed_low = 0          correctly_placed_high = 0          count_low_needed = 0          count_high_needed = 0

if `mid` suggests we should go right for X {               if X is actually on the right {                   correctly_placed_low ++               }               else {                   count_low_needed ++               }          } else {               if X is actually on the left {                  correctly_placed_high ++               }                else {                  count_high_needed ++               }          }

modify low and high according to           where `X` actually is with respect to `mid`

// Total smaller numbers available for swapping     TSM = sorted_index[X] - correctly_placed_low

// Total Larger numbers available for swapping     TLM = (N - sorted_index[X]) - correctly_placed_high

if count_low_needed > TSM or count_high_needed > TLM {          return -1     }

return max(count_low_needed, count_high_needed)

NOTE: The problem statement fixes the input array for us and repeatedly passes values to be searched in the input array. So, we can iterate once over the original array to know the actual location of the element to be searched (create a dictionary, essentially).

注意：問題聲明為我們修復了輸入數組，并反復傳遞要在輸入數組中搜索的值。因此，我們可以遍歷原始數組一次，以了解要搜索的元素的實際位置(本質上是創建字典)。

Also, we need sorted_index[X] to tell us how many values are lesser than or greater than the element X in our array. We can sort the array and create another dictionary storing location of each element in the sorted array.

另外，我們需要sorted_index[X]告訴我們有多少個值小于或大于數組中的元素X 我們可以對數組進行排序，并創建另一個字典來存儲排序后的數組中每個元素的位置。

Let’s go through the steps of the proposed algorithm while dry running an example.

讓我們在嘗試運行示例的同時瀏覽所提出算法的步驟。

Given an unsorted array, you need to search for X = 4 .
給定一個未排序的數組，您需要搜索X = 4 。

Hence our target index is 7.
因此，我們的目標指數是7。

2. Mid element index < Target Index, so we need to maneuver our search to the right half. But Element[Mid] > Element[Target Index], hence count_low_needed = 1

2.中元素索引<目標索引，因此我們需要將搜索調整到右半部分。但是ut Element[Mid] > Element[Target索引]， hence count_low_need ed = 1

3. Mid element index < Target Index, so we still need to maneuver our search to the right half. Once again, Element[Mid] > Element[Target Index], hence count_low_needed = 2

3.中元素索引<目標索引，因此我們仍然需要將搜索調整到右半部分。再次n, Element[Mid] > Element[Target索引]， hence count_low_need ed = 2

4. The total number of swaps needed for binary search to return the correct index here would be two swaps with elements lower than 4. We have smaller numbers 1, 3 or 2 for swapping available, so we can successfully do the swapping for this array so that binary search correctly finds out 4 .

4.返回此處正確索引的二進制搜索所需的交換總數將是元素小于4的兩次交換。我們可以使用較小的數字1, 3 or 2進行交換，因此我們可以成功地對該數組進行交換這樣二進制搜索就可以正確找出4 。

Below is the Python code for the given problem. Every step is explained in the comments.

以下是給定問題的Python代碼。注釋中說明了每個步驟。

The time complexity of this Twisted Binary Search algorithm is still O(nlogn) .

這種扭曲二進制搜索算法的時間復雜度仍為O(nlogn) 。

I hope you were able to grasp the inner workings of the binary search algorithm and had fun while going through this interesting problem as well. If you found this post useful, spread the love and share as much as possible. ?

我希望您能夠掌握二進制搜索算法的內部工作原理，并且在經歷這個有趣的問題時也很開心。如果您發現此帖子有用，請傳播愛心并盡可能多地分享。？