搜索算法

二分搜索
类别	搜索算法
数据结构	数组
最坏情况性能	O(log n)
最佳情况性能	O(1)
平均情况性能	O(log n)
最坏情况空间复杂度	O(1)

试卷 1 - ⇑ 算法基础 ⇑
← 逆波兰	搜索算法	排序算法 →

搜索算法在给定数据结构中查找给定项。使用的算法取决于数据的结构方式。

线性搜索

如果你有一个未排序的列表（或数组），那么最简单的搜索算法是线性搜索：逐项遍历列表，并与搜索项进行比较。如果比较成功，则算法已找到该项。如果所有比较都失败，则该项不存在于数组或列表中。

在最简单的变体中，算法返回一个布尔值以指示成功或失败。以下是伪代码

for each item in the list:
    if that item has the desired value then
        stop the search and return true
return false

可以直接翻译成 Python

def exists(soughtItem, aList):
  """Return True if and only if soughtItem occurs in aList."""
  for item in aList:
    if item == soughtItem:
      return True
  return False

# automatic tests
assert exists(2, [2, 1, 3]) # sought item in first position
assert exists(3, [2, 1, 3]) # sought item in last position
assert not exists(3, []) # list is empty
assert not exists(0, [2, 1, 3]) # sought item doesn't exist

第二个变体返回列表中该项的位置（如果存在）。如果不存在，则算法返回一个不可能的位置，例如 -1。以下是伪代码

For each position in the list:
    If the item at that position has the desired value then
        stop the search and return the position
Return -1

以下是 Python 代码

def index(soughtItem, aList):
  """Return the position of soughtItem in aList if it exists, otherwise return -1."""
  for position in range(len(aList)):
    if aList[position] == soughtItem:
      return position
  return -1

# automatic tests
assert position(2, [2, 1, 3]) == 0 # sought item in first position
assert position(3, [2, 1, 3]) == 2 # sought item in last position
assert position(3, []) == -1 # list is empty
assert position(0, [2, 1, 3]) == -1 # sought item doesn't exist

以下完整的 VB 程序会提示用户输入一个字母，并在数组中搜索它。

dim items() = {"h","g","a","d","w","n","o","q","l","b","c"}
dim searchItem as string

console.write("What are you searching for: ")
searchItem = console.readline()

For x = 0 to 10
  If items(x) = searchItem Then
    console.writeline("Found item " & searchItem & " at position " & x)
    Exit For
  End If
Next
console.writeline(-1)

尝试上述代码，搜索字母“w”，然后搜索字母“z”。

代码输出

你要搜索什么：w
在位置 4 找到项 w

代码输出

你要搜索什么：z
-1

练习：线性搜索

考虑字符串列表“Cat”、“Mouse”、“Frog”、“Lion”、“Panda”、“Llama”、“Bee”，按此顺序排列。找到“Panda”需要多少次比较？

答案

5

而搜索“Camel”需要多少次比较？

答案

需要 7 次比较才能得出该字符串不在列表中的结论。

为上述代码制作一个跟踪表，其中 searchItem = "d"。

答案

searchItem	x	输出	item
searchItem	x	输出	0	1	2	3	4	5	6	7	8	9	10
d	0		h	g	a	d	w	n	o	q	l	b	c
	1
	2
	3	在位置 3 找到项 d

对于一个包含 $n$ 个项目的列表，要查看项目是否存在，需要进行的最大比较次数是多少？

答案

如果项目位于最后一个位置或根本不在列表中，则需要进行 $n$ 次比较。

二分搜索

正如最后一个问题所指出的，线性搜索可能需要与列表中项目数量一样多的比较次数，因此在一个包含数百万个名称的列表（例如一个国家的选民登记册）中搜索一个名字可能会花费很长时间。

如果你的列表按升序排列，可以使用排序算法，那么你可以执行二分搜索。这涉及在每次比较时将数据分成两半，从而更快地“缩小”到项目必须在列表中的部分（如果存在）。这会导致性能显著提升，如侧边栏所示（大 O 表示法在单元 4 中解释）。

让我们在以下排序列表中搜索名称 Miles

Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne

我们将 Miles 与中间名称 Mohammed 进行比较

Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne

Miles 在字母顺序上位于 Mohammed 之前，因此我们知道 Miles 不会位于 Mohammed 的右侧。因此，我们可以“丢弃”列表的右半部分

Ali, Bernie, Claire, ~~Mohammed, Peter, Simon, Yvonne~~

我们现在将 Miles 与剩余列表中的中间名称 Bernie 进行比较

Ali, Bernie, Claire, ~~Mohammed, Peter, Simon, Yvonne~~

Miles 在字母顺序上位于 Bernie 之后，因此我们可以丢弃左侧

~~Ali, Bernie~~, Claire, ~~Mohammed, Peter, Simon, Yvonne~~

最后，我们将 Miles 与此单项列表的中间名称 Claire 进行比较

~~Ali, Bernie~~, Claire, ~~Mohammed, Peter, Simon, Yvonne~~

Miles 与 Claire 不相同，没有更多项可比较，因此我们知道 Miles 不在列表中。

这仅使用二分搜索进行了 3 次比较，而使用线性搜索则需要 7 次比较。当列表很大时，效果会更好。例如，一个包含 1,000,000,000 个项目的列表仅使用二分搜索最多只需要进行 30 次比较！拥有排序数据非常有用。

练习：线性搜索与二分搜索

排序数据对于线性搜索也很有用。修改后的线性搜索算法在搜索 Miles 时如何能够进行少于 7 次比较？

答案

修改后的线性搜索会在 4 次比较（与 Ali、Bernie、Claire 和 Mohammed 比较）后知道 Miles 不在排序列表中，因为 Miles 应该在 Mohammed 之前出现。

对于上面显示的包含 7 个名称的列表，你能想到线性搜索比二分搜索更快的案例吗？

答案

如果我们搜索列表中的第一个项目 Ali，二分搜索仍然需要 3 次比较（与 Mohammed、Bernie 和 Ali 比较），但线性搜索只需要 1 次比较。

对于大型列表的线性搜索，最佳情况是搜索的项目位于第一个位置。对于大型列表的二分搜索，最佳情况是什么？

答案

如果搜索的项目位于列表的中间，则二分搜索只需要进行 1 次比较。

每次不成功的比较后，二分搜索都会将搜索空间减半。正在搜索的子列表可以用两个整数表示，它们表示子列表的起始和结束位置。Python 代码如下

def index(soughtItem, sortedList):
  """Return the position of soughtItem in sortedList if it exists, otherwise return -1.
  sortedList must be in ascending order."""
  # Initially, the sublist is the whole list of N items, from positions 0 to N-1
  start = 0
  end = len(sortedList) - 1
  while start <= end:                    # while the sublist is not empty
    middle = (start + end) // 2
    if soughtItem == sortedList[middle]: # the item is in the middle of the sublist
        return middle
    if soughtItem >  sortedList[middle]: # the item is in the right half
        start = middle + 1  
    if soughtItem <  sortedList[middle]: # the item is in the left half
        end = middle - 1   
  return -1                              # empty sublist, the item doesn't exist
  
# tests
assert index(3, [1, 2, 3]) == 2
assert index(1, [1, 2, 3]) == 0
assert index(1, []) == -1
assert index(0, [1, 2, 3]) == -1

单元 3 - ⇑ 算法基础 ⇑
← 逆波兰	搜索算法	排序算法 →