This week while working on NHibernate.Remote I came across an unusual performance problem. When sending objects back and forth across the wire, I need to walk the object graph to ensure that things like lazy sessions are updated to correspond to the local ISession. This is done through a combination of reflection and serialization (which is reflection anyways). I should be doing this in a custom serializer but MS has made it so incredibly difficult to extend the XmlObjectSerializer that I’ve just decided to walk the graph after deserialization. While walking the object graph I need to store the references of objects I’ve already probed so that I don’t end up with an infine loop due to circular references. Anyways due to a combination of poor code and some optimization oversight I ended up with just over 15000 objects in my reference list which was just using a standard generic List<object>. A query for ~1000 records was taking over 4 seconds to complete which simply was not acceptable. At first I thought that it was just an unavoidable consequence of using so much reflection. However in a bid to speed things up a little I ran the Ants analyzer on it. To my complete surprise it was the reference list that was slowing things down. After looking at my code I identified the week points in my code and optimized the amount of references I was storing in the list eventually ending up with ~4000 entries for my 1000 record query. The query was now completing in well under a second.
Even though the problem was solved it still bugged me. I had just mitigated the problem by reducing the amount of entries rather than making the lookup faster. So the same slowdown would be experienced as someone increased the amount of records returned by their queries.
Then today I unexpectedly ran across the solution to my problems. The HashSet<T>. This is a collection that is new in the .Net 3.5 framework and is contained in the System.Core library. It uses indexing to help speed up the Contains() method and boy is it fast. Because it is a set and not a list it cannot contain the same element twice, but that happens to be just perfect for my needs. After implementing the same buggy code using the HashSet<object> instead of List<object> my 1000 record query took ~0.5 of a second to complete. That’s even faster than the optimized code was using List<object>. Of course optimizing is still important, but I gained an important speed boost due to the use of this exciting new collection. I know this will definitely find its way into some of my other projects too.