HashSet to the rescue

This week while working on NHibernate.Remote I came across an unusual performance problem.  When sending objects back and forth across the wire, I need to walk the object graph to ensure that things like lazy sessions are updated to correspond to the local ISession.  This is done through a combination of reflection and serialization (which is reflection anyways).  I should be doing this in a custom serializer but MS has made it so incredibly difficult to extend the XmlObjectSerializer that I’ve just decided to walk the graph after deserialization. While walking the object graph I need to store the references of objects I’ve already probed so that I don’t end up with an infine loop due to circular references.  Anyways due to a combination of poor code and some optimization oversight I ended up with just over 15000 objects in my reference list which was just using a standard generic List<object>.   A query for ~1000 records was taking over 4 seconds to complete which simply was not acceptable.  At first I thought that it was just an unavoidable consequence of using so much reflection.  However in a bid to speed things up a little I ran the Ants analyzer on it.  To my complete surprise it was the reference list that was slowing things down.  After looking at my code I identified the week points in my code and optimized the amount of references I was storing in the list eventually ending up with ~4000 entries for my 1000 record query.   The query was now completing in well under a second.

Even though the problem was solved it still bugged me.  I had just mitigated the problem by reducing the amount of entries rather than making the lookup faster.  So the same slowdown would be experienced as someone increased the amount of records returned by their queries.

Then today  I unexpectedly ran across the solution to my problems.  The HashSet<T>.  This is a collection that is new in the .Net 3.5 framework and is contained in the System.Core library.  It uses indexing to help speed up the Contains() method and boy is it fast. Because it is a set and not a list it cannot contain the same element twice, but that happens to be just perfect for my needs.  After implementing the same buggy code using the HashSet<object> instead of List<object>  my 1000 record query took ~0.5 of a second to complete.  That’s even faster than the optimized code was using List<object>.   Of course optimizing is still important, but I gained an important speed boost due to the use of this exciting new collection.  I know this will definitely find its way into some of my other projects too.

8 Responses to “HashSet to the rescue”

  1. Matt Says:

    I’m confused – aren’t hashes O(1) lookup? If so, than how is indexing a hash doing anything? How is HashSet different from Dictionary or SortedDictionary ?

  2. superjason Says:

    Good find! I’m assuming that you would still want to use a List most of the time if you don’t need to make “Contains” calls.

  3. Daniel Guenter Says:

    You are correct, the hashed set is not really any different from a dictionary except for the fact you don’t have to generate your own unique hashes. It’s also easier to work with if you aren’t wanting to work with the dictionary structure.

  4. configurator Says:

    Oddly enough, I’ve had the exact same class with the same name AND implementation in my personal Framework for a few years now, since .Net 2.0.

  5. Matt Says:

    Interesting – so you don’t have to know the key to the hash, it’s generated dynamically from the object? So your lookup is using the object, instead of a must-be-known key?

  6. Daniel Guenter Says:

    Yes, that is correct the key is generated automatically from the object. So you don’t ever have to worry about the key when you are doing lookups.

  7. Recent Links Tagged With "objectgraph" - JabberTags Says:

    […] links >> objectgraph Loading a complex object graph Saved by nickmosca123 on Thu 22-1-2009 HashedSet to the rescue Saved by shariq02ca on Thu 22-1-2009 Deep Send: How to wrangle with large object graphs and come […]

  8. AdamR Says:

    Contrary to what the author is implying, with a Dictionary you “don’t need to generate your own unique [sic] hashes” either — both the HashSet and Dictionary uses the same default IEqualityComparer.

    His original design could (should?) have used a Dictionary to get O(1) lookup speeds if it as pre 3.5.

Leave a Reply