Implementing Binary Search

This afternoon, I was reading a chapter in "Beautiful Code" (check it out at Amazon). There was a reference to the binary search algorithm presented in Jon Bentley's book "Programming Pearls" (see Amazon), accompanied by the challenge to write a correct implementation of the algorithm. To quote from Bentley:

Most programmers think that with the above description [of the algorithm] in hand, writing the code is easy. They are wrong. The only way to believe this is by putting down this column right now and writing the code yourself. Try it.

I tried it.

Later...

The experiment was enlightening: I must admit that I spent a lot more time than I thought I would when I set out to accept the challenge. I guess I am not a super-human programmer after all :-)

I deliberately did not code in an IDE because I wanted to avoid the temptation of using the debugger to check whether my implementation works or not. For this reason, the code you see below is "mind-tested" only and might have any number of bugs, especially when it comes to syntax (it's supposed to be C/C++). I hope that it is correct, though.

Integer overflow

The text in "Beautiful Code" demonstrates an integer overflow bug present in Jon Bentley's algorithm implementation. This is the code line:

int mid = (low + high) / 2;

The problem is that low and high are both integer variables whose values can become very high. If the sum of low and high overflows the maximum value that integers can have (e.g. 2^31-1), mid becomes negative.

I am moderately pleased to say that (I think) I have avoided this bug, although I must admit that I did so while looking for danger in another corner. This is what happened (to understand the following passage you must understand my implementation below):

  • because I articulated the invariant about POSITION_TYPE, and was not happy about it, I started to think about what would happen if POSITION_TYPE were an unsigned type
  • I recognized the danger of subtracting 1 from peekIndex when peekIndex is 0 and an unsigned type
  • I saw that the while-loop condition would break down in such a case, which made me add the extra check (1 == rangeSize)
  • once I had the rangeSize calculation in place, I saw that I could use rangeSize to calculate peekIndex
  • so, by using rangeSize I happened to avoid the bug, even though my mind actually was occupied with the correct handling of the while-loop condition

Conclusion

Well... I don't have one :-) I am posting this simply because it occurred to me that it has been several months since I last wrote something on this website :-)

The implementation

Input

  • Pre-sorted array of numbers
  • Number of elements in the array
  • Target element

Output

  • Position of target element in the array
  • -1 if array does not contain the target element

Invariant

  • POSITION_TYPE must be a signed type so that the algorithm can return -1 to indicate failure to find the target element

The code

typedef int ELEMENT_TYPE
typedef int POSITION_TYPE
POSITION_TYPE bsearch(ELEMENT_TYPE[] array,
                      POSITION_TYPE arraySize,
                      ELEMENT_TYPE targetElement)
{
  // Must check <0 in addition to == 0 because POSITION_TYPE is signed
  if (arraySize <= 0)   
    return -1;
  POSITION_TYPE rangeStartIndex = 0;
  POSITION_TYPE rangeEndIndex = arraySize - 1;
  while (rangeStartIndex <= rangeEndIndex)
  {
    POSITION_TYPE rangeSize = rangeEndIndex - rangeStartIndex + 1;
    // Fraction caused by odd-numbered range size is truncated
    POSITION_TYPE peekIndex = rangeStartIndex + (rangeSize / 2);
    ELEMENT_TYPE peekElement = array[peekIndex];
    if (targetElement == peekElement)
      return peekIndex;

    // Range cannot be reduced anymore, therefore the array does not contain
    // the target element
    // Note: This check is important because it catches the following
    // border cases:
    // - POSITION_TYPE is unsigned and peekIndex is 0: if below we subtract 1
    //   from peekIndex, rangeEndIndex becomes not -1, but a huge number,
    //   which causes the loop condition's abort criteria to fail. This case
    //   appears when targetElement is smaller than the element at array
    //   position 0.
    // - peekIndex is the highest possible number that can be expressed by the
    //   type POSITION_TYPE: if below we add 1 to peekIndex, rangeStartIndex
    //   overflows and becomes either 0 (if POSITION_TYPE is unsigned) or a
    //   very small negative number (if POSITION_TYPE is signed). Again, this
    //   causes the loop condition's abort criteria to fail. This case appears
    //   when targetElement is larger than the element at the highest array
    //   position.
    if (1 == rangeSize)
      break;

    if (targetElement < peekElement)
      rangeEndIndex = peekIndex - 1;
    else
      rangeStartIndex = peekIndex + 1;
  }
  return -1;
}

Additional thoughts

Some thoughts I wrote down during implementation:

  • odd/even array size
    • when I started coding I wondered whether my implementation would be affected by the array size being an odd or even number
    • I decided that I would write a naive implementation first, which I would then refine in a second iteration to cover special cases like these
    • as it turned out, I did not have to handle odd-ness or even-ness of the array size in any special way
  • invariant removal
    • the invariant that POSITION_TYPE must be a signed type can be removed if the function signature is slightly changed so that failure to find the target element can be indicated differently:
      bool bsearch(ELEMENT_TYPE[] array,
        POSITION_TYPE arraySize,
        ELEMENT_TYPE targetElement,
        POSITION_TYPE& result)
      
    • if bsearch() returns false, the target element could not be found (and the content of result is undefined)
    • if bsearch() returns true, the target element was found and its position in the array is filled into result
  • if the array has elements with duplicate values, there is no formal definition which one of the duplicates the algorithm should find