Thursday, September 18, 2008

Finding duplicates in an array

One of the problems I stumbled across recently was writing an efficient way to detect if a non-sorted integer array contains duplicates. Or more formally: an unsorted array of size N that contains values between 0 and N-1, find if it contains duplicated elements.

The trick being to avoid having to sort the array, or O(N2) loop within loops.

My solution was to use a bit array the size of N, which would flag if we had seen a value or not - we run through all the values in the array and check its position in the bit array, if it contains a 0 we have not seen it before, or a 1 if we have (duplicate). If we have not seen it before we set its position to 1 and continue.

In C# the is a great class that makes bit manipulation even easier "BitArray", it provides an efficient array class for handling bit operation.

The code:

static void Main(string[] args)
{
int[] numbers = { 1, 2, 3, 1, 5 };

bool duplicates = IsArrayDuplicated(numbers);
}

private static bool IsArrayDuplicated(int[] numbers)
{
bool result = false;

BitArray bits = new BitArray(numbers.Length+1);

foreach (int n in numbers)
{
result = !bits[n];

if (!result)
break;

bits[n] = true;
}

return result;
}

No comments: