ClrMD: Recreating .NET objects from an Azure App Service memory dump

09 Aug 2017

Part 1: WinDbg: Recreating .NET objects from an Azure App Service memory dump
Part 2: ClrMD: Recreating .NET objects from an Azure App Service memory dump

In the previous post, we concluded that while WinDbg is well-suited for interactive exploration of an object graph, scripting it would involve parsing text output. As a way around text parsing, in this post we explore the Microsoft Diagnostics Runtime, or ClrMD, a library for inspecting memory dumps of .NET applications from within a .NET application. With ClrMD, our scripting language is effectively C#.

WinDbg vs ClrMD

ClrMD supports loading and querying an opaque dump file in much the same way that ADO.NET supports loading and querying an opaque database file. Alternatively, think of the difference between WinDbg and ClrMD as that between cmd and PowerShell: PowerShell returns objects rather than strings, so its output requires no parsing and is easy to process further. Likewise, instead of text, ClrMD returns .NET meta-objects that represent a .NET application frozen in time and can be queried using C#.

With ClrMD, locating instances of the Bugfree.Spo.Analytics.Cli.Domain+Visit type on the heap is simple enough, but extracting field values isn't. For starters, querying requires knowing whether a field holds a value or reference type and whether that type is primitive or complex. As with WinDbg, ClrMD understands primitive types only. For Guid, DateTime, IPAddress, and FSharpOption, we must query their internal field values and pass those along to the appropriate constructor. With Guid, for instance, this amounts to querying the values of internal fields _a through _k and passing those to the Guid constructor.
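As a rough sketch of that last step, the snippet below rebuilds a Guid from its eleven component fields. The GuidRebuilder class and the readField delegate are hypothetical stand-ins and not part of ClrMD: in a real tool, the delegate would wrap whatever ClrMD call reads a primitive field from the dump. Here it reads from an in-memory dictionary so the example runs without a dump file.

```csharp
using System;
using System.Collections.Generic;

static class GuidRebuilder {
    // readField is a hypothetical stand-in for a ClrMD field read. It maps a
    // field name such as "_a" to the boxed primitive value found in the dump.
    public static Guid FromFields(Func<string, object> readField) {
        // Guid stores one int (_a), two shorts (_b, _c), and eight bytes
        // (_d through _k). The public constructor accepts them in that order.
        return new Guid(
            (int)readField("_a"),
            (short)readField("_b"),
            (short)readField("_c"),
            (byte)readField("_d"),
            (byte)readField("_e"),
            (byte)readField("_f"),
            (byte)readField("_g"),
            (byte)readField("_h"),
            (byte)readField("_i"),
            (byte)readField("_j"),
            (byte)readField("_k"));
    }
}

static class GuidDemo {
    static void Main() {
        // Made-up field values standing in for values read from a dump.
        var fields = new Dictionary<string, object> {
            ["_a"] = 0x00112233,
            ["_b"] = (short)0x4455,
            ["_c"] = (short)0x6677,
            ["_d"] = (byte)0x88,
            ["_e"] = (byte)0x99,
            ["_f"] = (byte)0xaa,
            ["_g"] = (byte)0xbb,
            ["_h"] = (byte)0xcc,
            ["_i"] = (byte)0xdd,
            ["_j"] = (byte)0xee,
            ["_k"] = (byte)0xff
        };

        Guid guid = GuidRebuilder.FromFields(name => fields[name]);
        Console.WriteLine(guid); // 00112233-4455-6677-8899-aabbccddeeff
    }
}
```

The unboxing casts fail loudly with an InvalidCastException if a field holds a different primitive type than expected, which is preferable to silently building a Guid from misread memory.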

Another difficulty with ClrMD's API is that it often returns a null value with little guidance as to why. And when it returns non-null, it's all too easy to read the wrong memory location or misinterpret the result, for example by treating random memory content as an object of some type and ending up with an object initialized from garbage.

ClrMD.Extensions

ClrMD.Extensions is a library built on top of ClrMD to make querying more intuitive. The extensions can even construct Guid, IPAddress, and DateTime objects directly from heap objects, and in a few lines of code.

The only immediate downside to ClrMD.Extensions is that it doesn't come as a NuGet package. One must clone its GitHub repository, build the code, and reference the ClrMD.Extensions and Microsoft.Diagnostics.Runtime DLLs from its bin/debug folder.

Extracting and replaying visits

The following code makes up a console application and is all it takes (code available on GitHub) to query the dump, extract the property values of Visit objects, create new Visit objects from those values, and place them on the message queue for replaying:

using System;
using System.Net;
using System.Linq;
using System.Threading;
using System.Collections.Generic;
using Microsoft.FSharp.Core;
using ClrMD.Extensions;
using static Bugfree.Spo.Analytics.Cli.Domain;
using static Bugfree.Spo.Analytics.Cli.Agents;

namespace Bugfree.Spo.Analytics.MemoryDumpProcessor {
    class Program {
        static void Main() {
            var visits = new List<Visit>();
            using (ClrMDSession session = ClrMDSession.LoadCrashDump(@"C:\AzureDump\Bugfree.Spo.Analytics.Cli-d3c510-07-25-13-08-00.dmp")) {
                foreach (ClrObject o in session.EnumerateClrObjects("Bugfree.Spo.Analytics.Cli.Domain+Visit")) {
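                    // F# option fields keep their payload in an inner "value" field.
                    // A null result is mapped to None when constructing the Visit below.
                    var pageLoadTime = (int?)o["PageLoadTime@"]["value"].SimpleValue;
                    var userAgent = (string)o["UserAgent@"]["value"].SimpleValue;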
                    var v = new Visit(
                        (Guid)o["CorrelationId@"],
                        (DateTime)o["Timestamp@"],
                        (string)o["LoginName@"],
                        (string)o["SiteCollectionUrl@"],
                        (string)o["VisitUrl@"], 
                        pageLoadTime == null ? FSharpOption<int>.None : new FSharpOption<int>(pageLoadTime.Value),
                        (IPAddress)o["IP@"],
                        userAgent == null ? FSharpOption<string>.None : new FSharpOption<string>(userAgent));
                    visits.Add(v);
                }

                // Enumerating the heap doesn't preserve allocation order. Hence we impose an
                // order using the visit's timestamp for easier inspection.
                foreach (var v in visits.OrderBy(v => v.Timestamp)) {
                    visitor.Post(VisitorMessage.NewVisit(v));
                }

                // The visitor mailbox processor handles messages (visits) on a separate
                // thread. We must wait for it to finish processing before terminating
                // the program.
                while (true) {
                    var l = visitor.CurrentQueueLength;
                    Console.WriteLine($"Queue length: {l}");
                    Thread.Sleep(5000);

                    if (l == 0) {
                        break;
                    }
                }

                Console.ReadKey();
            }
        }
    }
}

While this code fulfills our need, it only scratches the surface of what's possible with ClrMD and ClrMD.Extensions. With access to mostly the same data structures as WinDbg's SOS extension, we could implement most of its functionality in C#. In fact, msos does exactly this. Microsoft tools such as DebugDiag and PerfView also use ClrMD under the hood.

Conclusion

With the WinDbg textual output and object graph in mind, the solution should be fairly easy to follow. Because the console application shares configuration settings with the deployed web application, it behaves like the web application when enqueuing and dequeuing the 423k visits.

In a way, we've inadvertently turned the memory dump into a message queue persistence mechanism.