
Sasha Goldshtein is a Senior Consultant for Sela Group, an Israeli company specializing in training, consulting and outsourcing to local and international customers. Sasha's work is divided across these three primary disciplines. He consults for clients on architecture, development, debugging and performance issues; he actively develops code using the latest bits of technology from Microsoft; and he conducts training classes on a variety of topics, from Windows Internals to .NET Performance. You can read more about Sasha's work and his latest ventures at his blog: http://blogs.microsoft.co.il/blogs/sasha. Sasha writes from Herzliya, Israel. Sasha is a DZone MVB and is not an employee of DZone and has posted 196 posts at DZone.

Traversing the GC Heap with ClrMd

05.21.2013

ClrMd is a newly released diagnostic library that wraps the CLR’s undocumented data access interfaces (a.k.a. “DAC”) in a friendly managed API. The underlying interfaces are what debugger extensions like SOS and SOSEX use to implement various diagnostic features, including enumerating the managed heap, detecting deadlocks, inspecting object contents, and dumping type/method information.

Given my personal and professional interest in debugging tools and techniques, ClrMd is an incredible tool: I can now implement my own diagnostic features without programming directly against undocumented interfaces or parsing the text output of debugger extensions (which also requires driving a debugger in another process).

To pique your interest, I whipped up a quick sample illustrating how commands like !DumpHeap, !DumpObj, and !GCRoot can be implemented using ClrMd. It is only a sample, so there is clearly room for optimization and the code could be cleaner, but the ability to cram so much functionality into 200 lines of C# code is nothing short of overwhelming.

Without further ado, here’s some output (slightly formatted for clarity):

$ GcRoot.exe d:\temp\leak.dmp
> dumpobjects Schedule
31cae98 MemoryLeak.Schedule
31cd5f0 MemoryLeak.Schedule
31cfd48 MemoryLeak.Schedule
31d24a0 MemoryLeak.Schedule
31d4bf8 MemoryLeak.Schedule
31d7350 MemoryLeak.Schedule
31d9aa8 MemoryLeak.Schedule
31dc200 MemoryLeak.Schedule
31de958 MemoryLeak.Schedule
31e10b0 MemoryLeak.Schedule
31e3808 MemoryLeak.Schedule
31e5f60 MemoryLeak.Schedule
31e86b8 MemoryLeak.Schedule
> dumpobject 31d9aa8
System.Byte[] _data = 31d9ac0
> gcroot 31d9aa8
READY FOR FINALIZATION finalization handle(0)
  --> MemoryLeak.Employee(31d9a90)
  --> MemoryLeak.Schedule(31d9aa8)
> q
$

And now for some code snippets. First, we must initialize the main ClrMd objects by loading the dump (or attaching to a live process), enumerating the CLR versions in that dump (or process), and making sure the DAC DLL is accessible:

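using System;
using Microsoft.Diagnostics.Runtime; // ClrMd

// Inside Main(string[] args):
// Open the dump file (a live process can be attached to instead).
DataTarget target = DataTarget.LoadCrashDump(args[0]);
if (target.ClrVersions.Count == 0)
{
    Console.WriteLine("*** No CLR found in the dump");
    return;
}
// Locate the DAC (mscordacwks.dll) that matches the CLR version in the dump.
string dacLocation = target.ClrVersions[0].TryGetDacLocation();
if (string.IsNullOrEmpty(dacLocation))
{
    Console.WriteLine("*** Cannot find DAC location");
    return;
}
ClrRuntime runtime = target.CreateRuntime(dacLocation);
ClrHeap heap = runtime.GetHeap();
if (!heap.CanWalkHeap)
{
    // For example, the dump may have been taken in the middle of a GC.
    Console.WriteLine("*** Cannot walk the heap");
    return;
}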

Next, enumerating all objects on the heap whose type name contains a given string (this is the dumpobjects command from the session above):

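foreach (ulong objPtr in heap.EnumerateObjects())
{
    ClrType type = heap.GetObjectType(objPtr);
    // GetObjectType returns null if the address cannot be resolved to a type.
    if (type == null) continue;
    // Substring match: "Schedule" matches "MemoryLeak.Schedule".
    if (type.Name.Contains(typeName))
    {
        Console.WriteLine("{0:x}\t{1}", objPtr, type.Name);
    }
}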
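To try the filtering and formatting logic without a dump file, here is a minimal sketch under stated assumptions: a hypothetical MockObject class (not part of ClrMd) stands in for the pair of an address from heap.EnumerateObjects and a type name from heap.GetObjectType, and the hypothetical FilterByTypeName helper reproduces the substring match and the {0:x} address format of the loop above. Because the match is a substring test, a filter of "Schedule" would also match a type named, say, MemoryLeak.ScheduleEntry (a made-up name).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one heap object. ClrMd itself hands you an
// address (ulong) and lets you look up its ClrType; this is NOT a ClrMd type.
public sealed class MockObject
{
    public ulong Address { get; }
    public string TypeName { get; }

    public MockObject(ulong address, string typeName)
    {
        Address = address;
        TypeName = typeName;
    }
}

public static class DumpObjectsSketch
{
    // Hypothetical helper: keeps objects whose type name contains typeName and
    // formats each as "<hex address>\t<type name>", like the dumpobjects output.
    public static List<string> FilterByTypeName(IEnumerable<MockObject> heapObjects, string typeName)
    {
        var lines = new List<string>();
        foreach (MockObject obj in heapObjects)
        {
            // Models an address whose type could not be resolved: skip it.
            if (obj.TypeName == null) continue;
            if (obj.TypeName.Contains(typeName))
            {
                // {0:x} prints the address in lowercase hex without a 0x prefix.
                lines.Add(string.Format("{0:x}\t{1}", obj.Address, obj.TypeName));
            }
        }
        return lines;
    }
}
```

Fed the three objects that appear in the sample session (31d9aa8 MemoryLeak.Schedule, 31d9a90 MemoryLeak.Employee, 31d9ac0 System.Byte[]) and the filter "Schedule", the helper returns a single line, 31d9aa8 followed by a tab and MemoryLeak.Schedule, in the same shape as the dumpobjects output.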

Displaying object fields (only instance fields, not statics):

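ClrType type = heap.GetObjectType(objPtr);
if (type == null)
{
    Console.WriteLine("*** {0:x} is not a valid object", objPtr);
    return;
}
foreach (ClrInstanceField field in type.Fields)
{
    // The field's type may not be resolvable from the dump; print a placeholder then.
    string fieldType = field.Type == null ? "<unknown type>" : field.Type.Name;
    if (field.IsPrimitive() && field.HasSimpleValue)
    {
        // Primitive values (int, bool, double, ...) print in their natural form.
        Console.WriteLine("{0} {1} = {2}", fieldType, field.Name, field.GetFieldValue(objPtr));
    }
    else if (field.IsObjectReference() && field.HasSimpleValue)
    {
        // References print as a hex address that can be passed back to dumpobject.
        Console.WriteLine("{0} {1} = {2:x}", fieldType, field.Name, field.GetFieldValue(objPtr));
    }
    // Embedded structs (value-type fields) are not handled in this sample.
}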

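Finding the roots of an object is a little more complicated. It all starts with ClrHeap.EnumerateRoots, which yields the GC roots (static variables, local variables, GC handles, the finalization queue, and so on), but the key method is ClrType.EnumerateRefsOfObject, which invokes a callback for every object reference held by a given object. Starting from each root, you recursively follow these references until you reach the object in question. (Along the way, you must keep track of objects that have already been visited so you don't get yourself into an infinite loop, because object graphs frequently contain cycles.)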

The gist is the following recursion, with some of the code removed for clarity:

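// Assumes heap is a static ClrHeap field (its declaration is not shown).
// refChain holds the path from the root to the current object (root at the bottom).
private static void DisplayRefChainIfReachedObject(ulong objPtr, ClrRoot root, Stack<ulong> refChain, HashSet<ulong> visited)
{
    ulong currentObj = refChain.Peek();
    // HashSet.Add returns false if the object was already visited (a cycle or shared reference).
    if (!visited.Add(currentObj)) return;
    if (currentObj == objPtr)
    {
        // Display the root chain (omitted for clarity), then stop:
        // there is no need to explore beyond the target object.
        return;
    }
    ClrType type = heap.GetObjectType(currentObj);
    if (type == null) return;
    // The callback runs once for every object reference held by currentObj.
    type.EnumerateRefsOfObject(currentObj, (innerObj, fieldOffset) =>
    {
        refChain.Push(innerObj);
        DisplayRefChainIfReachedObject(objPtr, root, refChain, visited);
        refChain.Pop();
    });
}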
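Because the recursion above uses at least one stack frame per object on the current chain, a very long reference chain (for example, a linked list with millions of nodes) can overflow the thread's stack. The following sketch shows the same idea, a depth-first search from one root with a visited set, rewritten to use an explicit stack. It runs against a hypothetical adjacency map from each object address to the addresses it references, which stands in for ClrType.EnumerateRefsOfObject; RootChainSketch and FindChain are illustrative names, not ClrMd APIs.

```csharp
using System;
using System.Collections.Generic;

public static class RootChainSketch
{
    // Hypothetical helper: returns the chain of addresses from rootObj to target
    // (both inclusive), or null if target cannot be reached from rootObj.
    // refs maps each object address to the addresses it references; in a real
    // tool it would be filled from ClrType.EnumerateRefsOfObject.
    public static List<ulong> FindChain(ulong rootObj, ulong target, IReadOnlyDictionary<ulong, List<ulong>> refs)
    {
        // The visited set guarantees termination even when the graph has cycles.
        var visited = new HashSet<ulong> { rootObj };
        // path is the current chain from the root; nextRef[i] is the index of the
        // next reference of path[i] that has not been explored yet.
        var path = new List<ulong> { rootObj };
        var nextRef = new List<int> { 0 };

        while (path.Count > 0)
        {
            int top = path.Count - 1;
            ulong current = path[top];
            if (current == target)
            {
                return new List<ulong>(path);
            }

            List<ulong> children;
            if (!refs.TryGetValue(current, out children) || nextRef[top] >= children.Count)
            {
                // Every reference held by current has been explored: backtrack.
                path.RemoveAt(top);
                nextRef.RemoveAt(top);
                continue;
            }

            ulong child = children[nextRef[top]];
            nextRef[top]++;
            // HashSet.Add returns false for objects already seen, which skips cycles.
            if (visited.Add(child))
            {
                path.Add(child);
                nextRef.Add(0);
            }
        }
        return null;
    }
}
```

In a real tool you would call FindChain once per root returned by ClrHeap.EnumerateRoots, passing the address that root refers to; a non-null result is the chain to print, as in the gcroot output at the top of the post. The sketch allocates a fresh visited set on every call for simplicity.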

To summarize: ClrMd is a new managed diagnostic library that exposes functionality previously available only through debugger extensions such as SOS. It opens a wide range of possibilities for automated diagnostics of managed processes and dump files.



Published at DZone with permission of Sasha Goldshtein, author and DZone MVB.
