Figment Engine: big-oh

Showing posts with label big-oh. Show all posts

Tuesday, January 6, 2009

Fibonacci and Big-Oh reduction

I was asked to write some C# to implement a Fibonacci sequence, which falls into two parts: 1) generate a single number in the sequence, 2) generating the whole sequence.

Note that most examples of this sequence generator have a nasty bug in them. Most examples use the int data type, whilst this code uses the long data type, otherwise this function overflows around n=47.

The naive way to implement this is recursively:

  /// <summary>
  /// slow version of fib calc for testing - warning
  /// runs in O(1.618..ⁿ) 
  /// </summary>
  internal long Fibonacci(int n)
  {
   if (n < 2)
    return n;
   else
    return Fibonacci(n - 1) + Fibonacci(n - 2);
  }

And a more optimal approach is:

  /// <summary>
  /// returns the Fibonacci number for N
  /// </summary>
  public long Fibonacci(int n)
  {
   // returns N where n < 2, for higher values of N
   // avoids recursive solution by keeping track
   // of previous results

   long fn = 0;

   if (n < 0) // offset must be positive
    throw new NotSupportedException("Fibonacci sequence offset must be positive");
   else if (n < 2) // returns N where N < 2
    fn = n;
   else // use caching technique for higher values of N
   {
    long fnMinusTwo = 0;
    long fnMinusOne = 1;

    for (long ni = 2; ni <= n; ni++)
    {
     // effectively: fn = Fibonacci(n - 1) + Fibonacci(n - 2)
     fn = fnMinusOne + fnMinusTwo;

     fnMinusTwo = fnMinusOne;
     fnMinusOne = fn;
    }
   }

   return fn;
  }

The latter approach is much more efficient O(n) vs O(1.618..ⁿ). However its not as simple as the former. Another approach is to take the former solution and use a local cache:

  /// <summary>
  /// caching version of fib calc
  /// </summary>
  Dictionary<int, long> fibCache = new Dictionary<int, long>();
  internal long Fibonacci(int n)
  {
   long fn = 0;

   // use value from cache if already calculated
   if (fibCache.ContainsKey(n))
    return fibCache[n];

   // not cached, need to work it out
   if (n < 2)
    fn = n;
   else
    fn = Fibonacci(n - 1) + Fibonacci(n - 2);

   // add to cache
   fibCache.Add(n, fn);

   return fn;
  }

This gives us an O(n) runtime with a reasonably simple implementation. This is of course the classic trade-off of time vs. space - we have increased space used (the dictionary) to decrease time.

Finally we want to generate the whole sequence, which we can do in a lazy evaluation way using the yield keyword:

  /// <summary>
  /// returns the first N items in the Fibonacci sequence
  /// </summary>
  public static IEnumerable<long> FibonacciSequence(int n)
  {
   for (int fn = 0; fn < n; fn++)
    yield return Fibonacci(fn);
  }

Wednesday, December 10, 2008

OpenStreetMap rendering in Silverlight part VIII

One of the issues that my current renderer has is that it's not as smooth as the raster versions used in the slippy map. One of the worst offenders was road junctions, where two roads meet. They looked something like this:

In the image you can see where the two roads meet there is a gap. This is due to my code drawing the two roads as separate polylines. I create a test renderer to see if this could be improved:

This version has joined the two roads together, drawing only one polyline for both of them. This results in a smooth join, and also a reduction in the number of objects. For example joining roads where they share a start and end point reduces the number of objects for south-west London from ~40k to ~30k. So not only does it create nicer looking maps, it makes them less memory intensive.

It occurs to me that this technique could be used for any Xaml consisting of multiple lines, as my findings suggest that Silverlight performs faster (one poly line with a 1,000 segments is faster than 1,000 single line segments)

My join algorithm is brute forced within a tile (creating an O(n²) complexity) which won't do for release code. So I will write a smarter version - hopefully of the order of O(3n).

I also spotted that a lot of ways within OSM are artificially split (the river Thames for example) I don't know if this is a feature of data collection or policy. My QuadStore design can cope with large entities like the river Thames quite easily - so applying this same technique at a higher level may generate smooth maps overall.

Thursday, November 27, 2008

OpenStreetMap rendering in Silverlight part II

So I've been progressing quite quickly with the OSM data, I'm currently using the UK data set to trial stuff for a number of reasons - the biggest one being that for some reason I can't successfully unzip the full data set *grr* (I suspect this is my PC rather than anything else)

I've written a data store for project cobalt that allows me to restructure the OSM data into a format more suitable for realtime map generation - I'm calling it a QuadStore as its a geographically clustered XML data model and GCXDM does not sounds as good.

Throwing all 1.2gb of the UK data generates a QuadStore of 365mb which is a compression I get for free due to losing duplicated data in the OSM model. I'm also ignoring OSM "relationship" data at the moment as a) I don't know what it means yet b) I'm having to much fun with just "ways".

I've then updated the silverlight proof of concept to use the QuadStore via a webservice - this seriously kicks ass in performance terms (and the proof of concept just did not scale for multi tile retrieval). Using the QuadStore means that tile filling is 1 (yes) one cost - O(1) rather than O(n) where n=number of points in tile.

So for fun I ask the proof of concept to pull every tile and draw them out as rectangles - which looks something like this (the screen shot is halfway through the draw so you can see the progression)

There's quite a bit of performance to be gained from the webservice, for example by compressing the stream (though compressed streams are not available in silverlight). And also from returning multiple tiles per call rather than one at a time.

So the next step is to change the silverlight UI to manage tiles correctly, once it does this I will release a working version. One of the nice features I should be able to deliver is being able to rotate the map...

Wednesday, November 19, 2008

Finding the Nth open door

As a follow up to my previous post about the 100 Door Puzzle, I realised there is a variation of the question where there is an even faster answer.

If the question is: which door is the n^th open door (for example tell me where the 5th open door is) then the answer is: n² or as per the example 5²=25 (the 25th door along is the 5th open door)


  private static int NextOpenDoorIsAt(int nthOpenDoor)
  {
   return (int)Math.Pow(nthOpenDoor, 2d);
  }

So in the case where you only need to find the n^th door that is open this is a faster method. This of course could be applied to my previous post to give an O(√N) solution for the whole series - the faster being down to the speed of summation vs. product.

Toggle door puzzle, fast solution

I was recently asked the following question:

You have 100 doors in a row that are all initially closed. you make 100 passes by the doors starting with the first door every time. the first time through you visit every door and toggle the door (if the door is closed, you open it, if its open, you close it). the second time you only visit every 2nd door (door #2, #4, #6). the third time, every 3rd door (door #3, #6, #9), etc, until you only visit the 100th door.

question: what state are the doors in after the last pass? which are open which are closed?

If the questions is expressed as is door N open, the answer is

bool doorOpen = (Math.Sqrt(doorNumber) % 1) == 0;

However, as part of going through the puzzle I noticed that there seemed to be a sequence in the offset of the open doors (1, 4, 9), which looked suspiciouly like odd number offsets: 1=1, 4=1+3, 9=4+5 i.e. the value of the offset of the next open door is the previous open door offset plus the next odd number.

Which when I applied it to larger doorsets worked as well. Looking at 50 doors, which doors are open (O) vs closed (c):
[OccOccccOccccccOccccccccOccccccccccOccccccccccccOc]

If you are given the problem of find all open doors in the set, then using the square root approach is O(n). So the square root approach would look like:

private static BitArray GenerateDoorTogleSeriesResult(int doors)
  {
   BitArray result = new BitArray(doors);

   for (int door = 1; door <= doors; door++)
    result[door - 1] = (Math.Sqrt(door) % 1) == 0;

   return result;
  }

whilst my approach looks like:

private static BitArray GenerateDoorTogleResult(int doors)
  {
   BitArray result = new BitArray(doors);

   int odd = 1;
   for (int door = odd; door <= doors; door += odd)
   {
     result[door - 1] = true;
     odd += 2; // next odd number
   }
   return result;
  }

The difference being that for 1,000 doors the square root apprach requires 1,000 square root calculation, whilst mine requires 31 summations. Or in big-oh O(√n).

The calling code looks like:

static void Main(string[] args)
  {
   const int DOORS = 10;
   BitArray suggestedAnswer = GenerateDoorTogleResult(DOORS);
   BitArray fastAnswer = GenerateDoorTogleSeriesResult(DOORS);

   Console.WriteLine("suggested:\n{0}", ShowDoors(suggestedAnswer));
   Console.WriteLine("fast\n{0}", ShowDoors(fastAnswer));
   Debug.Assert(ShowDoors(suggestedAnswer).CompareTo(ShowDoors(fastAnswer)) == 0);
  }

  private static string ShowDoors(BitArray doors)
  {
   StringBuilder sb = new StringBuilder(doors.Length+4);
   sb.Append("[");
   foreach (bool doorOpen in doors)
    sb.Append(doorOpen == true ? 'O' : 'c');
   sb.Append("]\n");

   return sb.ToString();
  }

I subsequently found the proof for this by Francesco Maurolico in 1575.

1575: "Arithmeticorum Libri Duo" by Francesco Maurolico is the first book
to prove a theorem by Mathematical Induction. The theorem was known 2000
years earlier by Pythagorus, but they proved it geometrically. The
theorem is that the sum of the first N odd numbers
1 + 3 + 5 + 7 + 9 + 11 + 13 +... + (2N-1) = N x N

Thursday, October 9, 2008

Permutations and Anagrams

I needed to generate some anagrams, which I know falls into the area Permutations (combinations being altogether the wrong thing for this). There is a nice code sample on Wikipedia that shows how to generate these for an unordered collection.

I implemented a generic class to carry out permutations for me, I changed it to be based on an iterator pattern as often you don't want to generate the whole set, mainly because of the big-oh O(n!), class is therefore:


class Permutation<T>
{
  IEnumerable<T> set;

  private Permutation()
  {
  }

  public Permutation(IEnumerable<T> set)
  {
    this.set = set;
  }

  public IEnumerable<IEnumerable<T>> Mutations
  {
    get
    {
      int length = set.Count();

      int factorialK = (int)Factorial(length);

      for (int k = 0; k < factorialK; k++)
        yield return GeneratePermutation(k, set, length);

    }
  }

  private static IEnumerable<T> GeneratePermutation(long k, IEnumerable<T> set, int length)
  {
    T[] result = set.ToArray();
    for (int j = 2; j <= length; j++)
    {
      k = k / (j - 1);
      int dest = (int)(k % j);
      int source = j - 1;
      if (source != dest)
        Swap(ref result[dest], ref result[source]);
    }

    return result;
  }

  private static void Swap(ref T a, ref T b)
  {
    T temp = a;
    a = b;
    b = temp;
  }

  private static long Factorial(int n)
  {
    long result = 1L;

    for (int integer = 2; integer <= n; integer++)
      result *= integer;

    return result;
  }
}

which allowed me to write the following code to generate ananagrams:


static void Main(string[] args)
{
  string word = "ape";

  List<string> anagrams = GenerateAnagrams(word);

  anagrams.Sort();

  Console.WriteLine("{0} anagrams", anagrams.Count);
  anagrams.ForEach(anagram => Console.WriteLine(anagram));
}

private static List<string> GenerateAnagrams(string word)
{
  List<string> sets = new List<string>();

  Permutation<char> perm = new Permutation<char>(word);
  foreach (char[] combo in perm.Mutations)
    sets.Add(combo.ToString());

  return sets;
}

note that we are taking advantage of string being a char[], for other types such as int would need to be passed as an array, list or similar IEnumerable<T>

Thursday, August 21, 2008

Big-oh no notation

Whilst doing some revision for an exam I had to refresh myself on Big-Oh notation which showed me how much is taken for granted by documentation writers.

I refer of course to MSDN, which there appears to be little reference to this notation - the notable exception being "An Extensive Examination of Data Structures". Why does this matter? well if you are trying to work out what the order of magnitude of your code is then you need to know what the underlying framework is doing.

IMHO the collections classes and other such constructs should have their Big-Oh documented against them in the class library.