Monday, June 20, 2016

Variants of Binary Search

Introduction

Given a sorted array $A$ and a target value $k$ to search for, return a position $p$ in the array for which $A[p]=k$ or return $-1$ to signal that the element was not found. The folklore algorithm to solve this problem is binary search: you look at the middle element, compare it to $k$ and decide whether to terminate, search in the left half-array or search in the right half-array. This is pretty fast and it takes time $O(\lg N)$ where $N$ is the length of the array. In this post we will explore several ways to implement binary search.

Standard Binary Search

Let's start by giving the standard implementation. The recursive implementation is obvious so let's focus on iterative implementations instead.

Meta Binary Search

Meta binary search is a variant in which the position containing the target value is built one bit at a time, starting with the most significant bit.

To get a better idea of how this algorithm works, let's walk through an example. Imagine that the input array is $A=[1, 3, 5, 7, 11]$ and that we are looking for the value $k=7$. The last position of the array is $4$ and we need $2$ bits to store it. The first loop computes this value. Now we start at the most significant bit and we have two options: set it to $1$ or set it to $0$. We try both options and select the correct one. If setting the current bit to $1$ exceeds the length of the array then our only choice is to set it to $0$. In the other case we set the bit to $1$ and check if the value sitting in the position is strictly greater than $k$, in which case we set the bit to $0$ instead because our position never decreases in value (so by setting the current bit to $1$ we'll just be searching in parts of the array in which all values are strictly greater than the target value).

Like standard binary search, meta binary search takes $O(\lg N)$ time. The first loop computes the number of bits required to encode the largest position in the array and the second one iterates over these bits and builds up the answer.

This might be your first encounter with meta binary search but you might have used it in the past without paying attention. If you wrote code to answer lowest common ancestor (LCA) queries by using sparse tables then your LCA procedure is actually a form of meta binary search.

Binary Search over Real Numbers

How to write binary search when the search space is a monotonic real interval ? Many programmers use epsilons as a terminating condition. So you often see code that looks like:

This approach is error prone and might even time out. We'll describe a clean binary search over a real interval. Instead of storing the left and right end points of the interval as we did with the integers, we'll store the left end point and the size of the search space. We divide the size of the search space by 2 after each step and so the algorithm is guaranteed to terminate when the size reaches $0$.


Fractional Cascading

We will close this post with a useful technique that can be used to reduce the cost of searching for a target value $k$ in a list of $M \ge 1$ sorted arrays instead of just one. To simplify the discussion let's assume that all of the arrays have length $N$. An easy way to solve this problem is to binary search in each of the arrays; this takes time $O(M \lg N)$. Can we do better ? Yes !

We will preprocess the list of arrays so that search queries are fast. The idea is to cascade a fraction of the last list to the one right above it and add links from those cascaded elements to their locations in the list they came from. Now take this newly created list and cascade a fraction of its elements to the list on top of it, and so on. Now we perform one binary search in the first array and then follow the pointers to locate the values in the subsequent arrays. This technique is often used in geometric algorithms to reduce the cost of searching.