Monday 10 August 2009

Caution: Static Events!

I have spent quite a lot of time recently tracking down memory leaks in our C++/CLI & C# application. This has meant using a wide variety of tools and generally thinking far too hard!

The tools we've been using include:

* SOS - Son of Strike debugger, used inside WinDBG
* SOSEX - SOS extensions
* IBM's Purify & Quantify
* GlowCode
* Excel
* VS Debugger....

The list goes on and on.

Special note of thanks to GlowCode, who actually provide a tool that can track unmanaged memory leaks in a mixed-mode application. In the good old days before .NET, I used Rational (as it was then) Purify to track leaked memory in my C++ applications. It was fairly straightforward to use, and between that and its sister product Quantify, I have had many a productive session plugging memory leaks. Since .NET came along, Purify has been all but useless to me, only tracking managed memory and not showing me where I am leaking unmanaged memory in a mixed application. I've been looking for a tool to do such a job for years and recently came across a link on StackOverflow that pointed to a post which mentioned GlowCode. Having tried it, I love it. It tells me where I am leaking memory in both managed and native heaps in the same application!

One of the biggest problems we've had is static events on a static event broker object holding on to large object graphs, causing our application to run out of memory. The Event Broker is meant to act as a central point where classes can subscribe to (attach delegates to) events that other parts of the system can raise in blissful ignorance of each other. Everyone knows about the EventBroker but not about the sources or sinks of the events. This has worked well for us and is used extensively throughout our application.

What we hadn't realised (or at least, what we hadn't thought about too hard) is that when an object subscribes to a static event, unless it explicitly unsubscribes from it, it will never be garbage collected. This might sound obvious, but most events that are subscribed to are never unsubscribed from explicitly, for example button handlers. That is normally fine: when the publisher and the subscriber both go out of scope and are no longer 'rooted', the GC will collect the whole lot in one go. Unfortunately, this does not happen for static events. The delegate behind the event holds a reference to each subscribing object, and because the event is static, that delegate is itself a root. This means that when the GC walks the object graph from its roots during a collection, it will reach every subscribed object graph, so it will never collect them.
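To make the rooting behaviour concrete, here is a minimal sketch rather than our real code. The EventBroker, DataChanged, Subscriber and StaticEventLeakDemo names are all made up for illustration. It subscribes an object to a static event, drops every other reference to it, forces a collection and then asks a WeakReference whether the object is still alive.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a static event broker; not the real class.
public static class EventBroker
{
    public static event EventHandler DataChanged;

    public static void RaiseDataChanged()
    {
        EventHandler handler = DataChanged;
        if (handler != null) handler(null, EventArgs.Empty);
    }
}

// Hypothetical subscriber; the byte array stands in for a large object graph.
public sealed class Subscriber : IDisposable
{
    private readonly byte[] payload = new byte[1024 * 1024];

    public Subscriber()
    {
        // The static event's delegate now references this instance.
        EventBroker.DataChanged += OnDataChanged;
    }

    public int PayloadSize { get { return payload.Length; } }

    private void OnDataChanged(object sender, EventArgs e) { }

    public void Dispose()
    {
        // Explicit unsubscribe removes the reference held by the static delegate.
        EventBroker.DataChanged -= OnDataChanged;
    }
}

public static class StaticEventLeakDemo
{
    // Kept in its own non-inlined method so that no local variable in the
    // caller keeps the subscriber alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference CreateSubscriber(bool unsubscribe)
    {
        Subscriber subscriber = new Subscriber();
        if (unsubscribe) subscriber.Dispose();
        return new WeakReference(subscriber);
    }

    public static bool SurvivesCollection(bool unsubscribe)
    {
        WeakReference weak = CreateSubscriber(unsubscribe);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive;
    }

    public static void Main()
    {
        Console.WriteLine("Still subscribed, alive after GC: " + SurvivesCollection(false));
        Console.WriteLine("Unsubscribed, alive after GC: " + SurvivesCollection(true));
    }
}
```

The subscriber that never unsubscribes should still be reported as alive, because the static delegate roots it. The one that calls Dispose would normally be collected. Implementing IDisposable is just one way to give subscribers an obvious place to detach; the important part is the explicit -= on the static event.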

The best tool for finding such leaks (OK, not technically leaks in the C/C++ sense, but pure badness all the same) was the SOS debugger extension in WinDBG. SOS has the extremely helpful command "!dumpheap -type fullyQualifiedTypeName", which tells you how many objects of the specified type you currently have in memory and where they are (their addresses). Passing one of those addresses to "!gcroot address" then shows you the chain of references from a GC root that is keeping that object alive. SOSEX also has the !refs command, which does something very similar but in a prettier fashion.

This lack of collection meant our application was eventually producing OutOfMemoryExceptions! We then discovered the MemoryFailPoint class in the BCL (another StackOverflow article) and this has allowed us to degrade gracefully, rather than simply corrupt our heap (as an OOM exception is likely to do) and force the application to close.

The idea behind the MemoryFailPoint is that before you allocate a chunk of memory, you ask whether there is a large enough piece of contiguous memory available on the managed heaps. If there is, then fine, your code carries on as normal. If there isn't, the MemoryFailPoint throws an InsufficientMemoryException. The key point is that an OOM will leave your application in an unstable state, whereas an InsufficientMemoryException won't, because it is thrown before the work has started. The trick is being able to estimate how much contiguous memory (in MB) you'll need. You want to do it at a fairly coarse level, as each check slows the code down, but the upside is that the code protected by a MemoryFailPoint should not throw an OOM exception.
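As a rough sketch of the pattern (the MemoryGate and TryRun names and the 16 MB estimate are invented for illustration, not taken from our application), a MemoryFailPoint from System.Runtime can wrap a memory-hungry operation like this:

```csharp
using System;
using System.Runtime;

// Hypothetical helper; the name and shape are assumptions for this sketch.
public static class MemoryGate
{
    // Runs 'work' only if the runtime thinks roughly 'estimatedMegabytes'
    // of memory can be obtained. Returns false if the check fails.
    public static bool TryRun(int estimatedMegabytes, Action work)
    {
        try
        {
            // The constructor performs the check and throws
            // InsufficientMemoryException before any work has started.
            using (new MemoryFailPoint(estimatedMegabytes))
            {
                work();
            }
            return true;
        }
        catch (InsufficientMemoryException)
        {
            // Nothing has been allocated yet, so the caller can degrade
            // gracefully (skip the operation, warn the user, and so on).
            return false;
        }
    }

    public static void Main()
    {
        bool ran = TryRun(16, delegate
        {
            byte[] buffer = new byte[8 * 1024 * 1024];
            buffer[0] = 1;
        });
        Console.WriteLine(ran ? "Work completed." : "Not enough memory; request skipped.");
    }
}
```

The estimate is the caller's guess, so a gate like this only reduces the chance of an OOM inside the protected block; it does not remove it if the guess is too low.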