Thursday, June 25, 2009

How the Silverlight tag cloud works, including source code

The TagCloud Silverlight application gives a visual representation of an RSS feed. It visualizes the labels used within the feed, using size to indicate how often each tag is used and proximity to other tags to indicate how often they co-occur.
This means that tags that occur often together (for example “Silverlight” and “Xaml”) will tend to appear next to each other, whilst tags that don’t occur together in the feed (for example “Silverlight” and “Credit Crunch”) will tend to be far apart.
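Size encodes how often a tag is used. The post doesn't say exactly how counts map to sizes. Here is a minimal sketch that assumes a simple linear scale between a minimum and a maximum font size. The helper name fontSizeFor and the 10 to 30 size range are hypothetical:

```javascript
// Hypothetical helper: map a tag's frequency to a font size by linear
// interpolation between minSize and maxSize. This is an assumed mapping,
// not the formula TagCloud actually uses.
function fontSizeFor(count, minCount, maxCount, minSize, maxSize) {
  if (maxCount === minCount) return maxSize; // every tag equally frequent
  const t = (count - minCount) / (maxCount - minCount); // 0 for rarest, 1 for most frequent
  return minSize + t * (maxSize - minSize);
}

// Example counts (made up for illustration)
const counts = { Silverlight: 12, Xaml: 7, "Credit Crunch": 2 };
const values = Object.values(counts);
const lo = Math.min(...values);
const hi = Math.max(...values);
for (const [tag, n] of Object.entries(counts)) {
  console.log(tag, fontSizeFor(n, lo, hi, 10, 30).toFixed(1));
}
```

A linear scale is the simplest choice. If a few tags dominate the feed, a logarithmic scale would stop them dwarfing everything else.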
Colour is also used to group the tag cloud: the most frequently occurring tags each take a colour, and the tags that co-occur with them inherit that colour. The cloud is therefore grouped visually by colour as well as by position.
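The colour grouping is only described in outline. Here is a minimal sketch of one way it could work. It assumes the seed tags are simply the N most frequent ones (N being the number of palette colours), and that every other tag takes the colour of the seed it has co-occurred with most. The names assignColours and coCount, the palette and the fallback colour are all hypothetical:

```javascript
// Hypothetical sketch of colour grouping, not the actual TagCloud code:
// the most frequent tags each take a palette colour, and every other tag
// inherits the colour of the seed tag it has co-occurred with most often.
function assignColours(freq, coCount, palette, fallback) {
  // freq: { tagName: count }; coCount(a, b): number of posts carrying both tags
  const byFreq = Object.keys(freq).sort((a, b) => freq[b] - freq[a]);
  const seeds = byFreq.slice(0, palette.length);
  const colour = {};
  seeds.forEach((tag, i) => { colour[tag] = palette[i]; });

  for (const tag of byFreq.slice(palette.length)) {
    let best = null;
    let bestCount = 0;
    for (const seed of seeds) {
      const c = coCount(tag, seed);
      if (c > bestCount) { best = seed; bestCount = c; }
    }
    // Tags that never co-occur with any seed get the fallback colour
    colour[tag] = best ? colour[best] : fallback;
  }
  return colour;
}

// Example data (made up): pair counts keyed by the two names in sorted order
const freq = { Silverlight: 10, Finance: 8, Xaml: 5, "Credit Crunch": 3, Misc: 1 };
const pairs = { "Silverlight|Xaml": 4, "Credit Crunch|Finance": 2 };
const co = (a, b) => pairs[[a, b].sort().join("|")] || 0;
const colours = assignColours(freq, co, ["blue", "green"], "grey");
console.log(colours);
```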
Hovering the mouse over a tag draws lines to all the other tags it has occurred with, and clicking it presents a popup menu of posts that mention that tag. Clicking an item in the menu navigates to that post.
You can also click and drag a tag for fun, and watch all the other tags chase it.
Model
  • Cloud: the model of an RSS feed, keeping the model separate from the UI. It processes an RSS feed and creates the subordinate Post and Tag objects and their relationships. Cloud is more of a management class that delegates most of the work to the Tag class.
  • Post: a very simple class that represents a post/entry in an RSS feed. It is used to allow navigation back to a particular post.
  • Tag: represents a tag (label/syndicationItem) in an RSS feed. It holds a count of how many times it has co-occurred with each other tag, and has a SeekHarmony function that provides most of the application’s “ah!” factor.
Approach
The application builds an internal model of the RSS feed, chiefly in terms of tags and their co-occurrences. It then runs a simulation that tries to position each tag near the tags it has co-occurred with, and far from those it has not.
The algorithm pays no attention to the overall shape of the cloud. Instead it acts as an emergent system: simple harmony-seeking behaviour between tags produces a global balance (generally a circle).
Code structure
The code has four parts:
  • A JavaScript library to make embedding the application less error-prone
  • A server-based proxy for fetching RSS feeds, to avoid cross-domain issues
  • A Silverlight library of math functions (mostly polar calculations and some extension methods)
  • The TagCloud Silverlight application, with App and Page classes and three supporting classes (Cloud, Post and Tag)
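The math library's polar calculations come down to converting a displacement between two points into a distance and an angle, and back again. The library itself is Silverlight code. Here is a sketch of the same idea in JavaScript; the names toPolar and fromPolar are hypothetical:

```javascript
// Hypothetical polar helpers in the spirit of the math library:
// convert a displacement (dx, dy) to (distance, angle) and back.
function toPolar(dx, dy) {
  return { r: Math.hypot(dx, dy), theta: Math.atan2(dy, dx) }; // theta in radians
}

function fromPolar(r, theta) {
  return { x: r * Math.cos(theta), y: r * Math.sin(theta) };
}

// The vector from (0, 0) towards (3, 4), shortened to length 2
const p = toPolar(3, 4);         // distance 5
const v = fromPolar(2, p.theta); // roughly { x: 1.2, y: 1.6 }
console.log(p.r, v.x.toFixed(2), v.y.toFixed(2));
```

This split is what makes "move n units towards that tag" easy to express. The direction comes from the angle, and the distance can be chosen independently.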
Lifecycle
The Silverlight application is embedded in an HTML page, which passes a number of parameters via JavaScript to the Silverlight object creation code. The Silverlight application reads these settings and downloads the RSS feed via a server-side proxy (to avoid cross-domain issues). The code then builds an internal model of the RSS feed, with co-occurrence information about tags. On each timer tick the code moves every tag so that it is nearer the tags it has co-occurred with and farther from those it has not. Over time an order emerges that shows how the tags relate: tags that are nearer each other co-occur more.
Embedding the control in the page
For my blog I use JavaScript to embed the Silverlight control:
<div id='feCloud' style='text-align: center;'></div>
<script src='http://www.figmentengine.com/tagCloud/feCloudv1.2.js' type='text/javascript'></script>
<script type='text/javascript'>
  var feCloudElementId = 'feCloud';
  var feCloudFeedAddress = 'http://feedproxy.google.com/FigmentEngine';
  var feCloudNavigateFormat = 'http://blog.figmentengine.com/search/label/{0}';
  var feCloudSize = 400;
  feTagCloudLoad(feCloudElementId, feCloudFeedAddress, feCloudNavigateFormat, feCloudSize, feCloudSize);
</script>
 
The feTagCloudLoad function in feCloudv1.2.js is a wrapper for the Silverlight.js createObject call:
function feTagCloudLoad(elementId, feed, navigateFormat, width, height)
{
  // initParams is a comma-separated list of key=value pairs that the
  // Silverlight application reads at start-up
  var params = [];
  if (feed != null)
    params.push("feedAddress=" + feed);
  if (navigateFormat != null)
    params.push("navigateFormat=" + navigateFormat);

  // Fall back to a 350 x 350 control when no size is given
  var slWidth = width ? width : 350;
  var slHeight = height ? height : 350;
  params.push("width=" + slWidth);

  var host = "http://www.figmentengine.com/";
  var source = host + "tagCloud/TagCloudV1.2.xap";
  var parentElement = document.getElementById(elementId);
  // id of the generated <object>; kept different from the container div's id
  var callbackId = "feCloudControl";
  // enableHtmlAccess lets the application navigate the browser
  var properties = { width: slWidth, height: slHeight, version: "2.0.31005.0", enableHtmlAccess: "true" };
  var events = { };
  var initParams = params.join(",");

  // createObject writes the Silverlight <object> markup into parentElement,
  // so there is no need to create an <object> element by hand first
  Silverlight.createObject(source, parentElement,
    callbackId, properties, events, initParams);
}
The lines that build params pass the feed information to the application via initParams. I also set enableHtmlAccess so that the Silverlight code can navigate the browser using HtmlPage.Window.Navigate.
Application startup
At start-up the application reads the settings passed in by the embedding JavaScript. The most important are the address of the RSS feed and the URL format used for navigation. It then creates an instance of Page and asks it to populate the tag cloud from the feed address.
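In the real application the Silverlight runtime parses the initParams string and exposes it as the StartupEventArgs.InitParams dictionary. Below is a JavaScript sketch, for illustration only, of how a string in this format breaks down into settings. It also shows how the {0} in navigateFormat could be filled in with a label. The names parseInitParams and url are hypothetical, and the {0} substitution is an assumption based on the format string:

```javascript
// Hypothetical illustration of the initParams format: comma-separated
// key=value pairs. Entries without '=' (e.g. a trailing ", ") are skipped.
function parseInitParams(initParams) {
  const settings = {};
  for (const pair of initParams.split(",")) {
    const eq = pair.indexOf("=");
    if (eq < 0) continue; // empty or malformed entry
    settings[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  }
  return settings;
}

const s = parseInitParams(
  "feedAddress=http://feedproxy.google.com/FigmentEngine, " +
  "navigateFormat=http://blog.figmentengine.com/search/label/{0}, width=400, ");

// Assumed use of navigateFormat: substitute a (URL-encoded) label for {0}
const url = s.navigateFormat.replace("{0}", encodeURIComponent("Silverlight"));
console.log(s.feedAddress, s.width, url);
```

The format has a weakness: a value that itself contains a comma would be split in two. This is one reason to keep the parameters to simple URLs and numbers.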
Populating the cloud
Obtaining the RSS feed: Silverlight can only request data from a site other than the one it was loaded from if that site publishes a cross-domain policy file. To get around this I use the technique outlined at Franksworld: when the Silverlight application needs data from a server that has no crossdomain.xml, it proxies the call through its own server.
// Code-behind of the proxy page (needs: using System; using System.Net; using System.Web;)
protected void Page_Load(object sender, EventArgs e)
{
    // Load the URI of the feed to fetch from the query string
    string sourceUriString = Request.QueryString["Uri"];
    if (String.IsNullOrEmpty(sourceUriString))
    {
        // 400 Bad Request: the caller did not say what to fetch
        Response.StatusCode = 400;
        return;
    }

    try
    {
        // Clear the output buffer
        Response.Clear();

        byte[] requestByteArray;
        string contentType;
        using (WebClient webClientRequest = new WebClient())
        {
            // Download the data from the remote URI
            requestByteArray = webClientRequest.DownloadData(sourceUriString);
            // Pass the remote content type through (the header may be missing)
            contentType = webClientRequest.ResponseHeaders["Content-Type"];
        }
        if (contentType != null)
            Response.ContentType = contentType;

        // Copy the downloaded bytes to the response
        Response.OutputStream.Write(requestByteArray, 0, requestByteArray.Length);
        Response.OutputStream.Close();

        // End the request without the ThreadAbortException that Response.End() raises
        // see http://support.microsoft.com/kb/312629
        HttpContext.Current.ApplicationInstance.CompleteRequest();
    }
    catch (Exception ex)
    {
        // 502 Bad Gateway: fetching from the upstream server failed
        Response.StatusCode = 502;
        Response.StatusDescription = "Error encountered. Details: " + ex.Message;
    }
}
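The proxy reads the feed address from the Uri query-string parameter, so the caller has to URL-encode it. Otherwise any '?', '&' or '=' in the feed address would be read as part of the proxy's own query string. Here is a small sketch of building that request URL, written in JavaScript for illustration. The proxyUrl helper, the proxy page name and the example.com/example.org addresses are hypothetical:

```javascript
// Hypothetical: build the request URL for the server-side proxy. Only the
// "Uri" key comes from the proxy code; the page name is an assumption.
function proxyUrl(proxyPage, feedAddress) {
  // encodeURIComponent keeps the whole feed address as one Uri value
  return proxyPage + "?Uri=" + encodeURIComponent(feedAddress);
}

const feed = "http://example.org/feed?format=rss";
const url = proxyUrl("http://example.com/proxy.aspx", feed);
console.log(url);
```

Note that this proxy fetches whatever URI it is given. If you host one yourself, consider restricting which addresses it will fetch.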
Conversion of the RSS into the internal object model ignores most of the information in the feed, concentrating on posts, tags and co-occurrence.
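Here is a minimal sketch of that conversion, assuming each feed item has already been reduced to its link and its list of labels. For every post it counts each tag's frequency, records the post against the tag (for the popup menu), and counts every pair of tags that appear together. The Tag class and the buildCloud function here are illustrative stand-ins, not the actual Cloud/Post/Tag classes:

```javascript
// Hypothetical stand-in for the Tag model class
class Tag {
  constructor(name) {
    this.name = name;
    this.count = 0;                // number of posts using this tag
    this.posts = [];               // links of those posts, for the popup menu
    this.coOccurrence = new Map(); // other tag name -> number of shared posts
  }
}

// posts: [{ link, tags: [labels] }] -> Map of tag name -> Tag
function buildCloud(posts) {
  const tags = new Map();
  const get = (name) => {
    if (!tags.has(name)) tags.set(name, new Tag(name));
    return tags.get(name);
  };
  for (const post of posts) {
    const names = [...new Set(post.tags)]; // ignore a label repeated on one post
    for (const name of names) {
      const tag = get(name);
      tag.count++;
      tag.posts.push(post.link);
    }
    // Count every unordered pair of tags on this post, in both directions
    for (let i = 0; i < names.length; i++) {
      for (let j = i + 1; j < names.length; j++) {
        const a = get(names[i]);
        const b = get(names[j]);
        a.coOccurrence.set(b.name, (a.coOccurrence.get(b.name) || 0) + 1);
        b.coOccurrence.set(a.name, (b.coOccurrence.get(a.name) || 0) + 1);
      }
    }
  }
  return tags;
}

const cloud = buildCloud([
  { link: "post1", tags: ["Silverlight", "Xaml"] },
  { link: "post2", tags: ["Silverlight", "Xaml", "C#"] },
  { link: "post3", tags: ["Credit Crunch"] },
]);
console.log(cloud.get("Silverlight").coOccurrence.get("Xaml")); // 2
```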
Initializing the UI
At start-up the code creates a pool of connector lines (to reduce the need to create them dynamically). It also creates TextBlocks to represent each tag. It then starts the timer that repeatedly moves the tags around (seeking harmony).
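A line pool like this can be very simple. Here is a sketch, assuming lines are only ever handed out in bulk on mouse-over and all returned together on mouse-out. LinePool, acquire, releaseAll and the plain objects standing in for Silverlight Line elements are all hypothetical:

```javascript
// Hypothetical object pool for connector lines: the lines are created once
// up front and reused, so hovering over a tag never allocates new ones.
// createLine stands in for creating a Silverlight Line element.
class LinePool {
  constructor(size, createLine) {
    this.lines = Array.from({ length: size }, () => createLine());
    this.inUse = 0;
  }

  // Hand out the next free line, or null if the pool is exhausted
  acquire() {
    if (this.inUse >= this.lines.length) return null;
    const line = this.lines[this.inUse++];
    line.visible = true;
    return line;
  }

  // Hide every line handed out so far, e.g. when the mouse leaves a tag
  releaseAll() {
    for (let i = 0; i < this.inUse; i++) this.lines[i].visible = false;
    this.inUse = 0;
  }
}

const pool = new LinePool(3, () => ({ visible: false }));
const a = pool.acquire();
const b = pool.acquire();
pool.releaseAll();
const c = pool.acquire();
console.log(c === a); // true: the same line object is reused
```

Because lines are always released together, a single counter is enough. A pool whose items were returned one at a time would need a free list instead.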
Seeking harmony
The code only works at a tag level, relying on emergent behaviour to get the UI effect. For each tag we do the following:
  • Each tag calculates how near it should be to every other tag, based on how often they co-occur.
  • It then uses a polar conversion to work out the vector it would need to move along to reach that distance.
  • We add up all these vectors for the tag, giving a single vector that, if applied, would move the tag to its ideal location.
  • We then scale the vector down, so that:
  1. tags don’t jump large distances
  2. tags can react on the next cycle to where all the other tags have moved
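The steps above can be sketched as a single function run once per timer tick. This is an illustration under stated assumptions, not the actual SeekHarmony code. It assumes the desired distance between two tags is maxDistance / (1 + shared), where shared is the number of posts they co-occur in. It also assumes the summed vector is scaled down by a fixed damping factor, and that all moves are applied together. The names seekHarmonyStep and shared, and the parameter values, are hypothetical:

```javascript
// Hypothetical harmony-seeking step. tags: [{ name, x, y }];
// shared(a, b): number of posts in which tags a and b co-occur.
function seekHarmonyStep(tags, shared, maxDistance, damping) {
  const moves = tags.map((tag) => {
    let mx = 0;
    let my = 0;
    for (const other of tags) {
      if (other === tag) continue;
      const dx = other.x - tag.x;
      const dy = other.y - tag.y;
      const r = Math.hypot(dx, dy);     // polar: current distance
      const theta = Math.atan2(dy, dx); // polar: direction to the other tag
      // Assumed rule: the more posts two tags share, the closer they should sit
      const desired = maxDistance / (1 + shared(tag.name, other.name));
      const step = r - desired;         // > 0: move closer, < 0: move away
      mx += step * Math.cos(theta);
      my += step * Math.sin(theta);
    }
    // Scale the summed vector down so the tag doesn't jump a large distance
    return { x: mx * damping, y: my * damping };
  });
  // Apply all moves together; each tag reacts to the new layout next tick
  tags.forEach((tag, i) => {
    tag.x += moves[i].x;
    tag.y += moves[i].y;
  });
}

const tags = [{ name: "Silverlight", x: 0, y: 0 }, { name: "Xaml", x: 100, y: 0 }];
const shared = () => 3; // the two tags share 3 posts
for (let tick = 0; tick < 200; tick++) seekHarmonyStep(tags, shared, 100, 0.1);
console.log((tags[1].x - tags[0].x).toFixed(1)); // settles at 100 / (1 + 3) = 25
```

The damping factor is what makes the layout settle rather than oscillate. With many tags the individual pulls add up, so a factor that is too large can make tags overshoot on every tick.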
Source code
Source code is available as a zip.
