WinDbg: Recreating .NET objects from an Azure App Service memory dump

06 Aug 2017

Part 1: WinDbg: Recreating .NET objects from an Azure App Service memory dump
Part 2: ClrMD: Recreating .NET objects from an Azure App Service memory dump

This post outlines how to use WinDbg to extract in-memory .NET objects from a memory dump of an Azure App Service, a web application hosted on Azure.

Akin to Google Analytics, the application records page visits by exposing endpoints that JavaScript on each page of another site calls. For some reason, the application has been running for three weeks without persisting visits to the database. Because visits are stored in-process, restarting the application would cause them to be lost. Hence the goal is to analyze the memory dump post-mortem, extract the objects describing visits, and replay those visits.

Memory dumping an Azure App Service

Through its associated Kudu instance, an Azure App Service supports creating a memory dump without terminating the process. Also using Kudu, we download sos.dll and mscordacwks.dll from the Azure site. At this point, WinDbg can load the dump and we can work with it like any other dump of a .NET process.

Architectural overview of the dumped process

To know what to look for and expect inside the dump, a brief overview of the application is in order. The dump is of the Bugfree.Spo.Analytics application, whose endpoints receive a JSON payload with visit metadata whenever a page is visited in another application.

On the server side, multiple producers and a single consumer operate on a single, shared, in-process message queue. Once a request comes in on one of the worker threads, it's validated, enriched, and turned into a .NET Visit object, which is enqueued. When the queue reaches a certain length, the consumer dequeues the Visit objects and, after a bit of processing, writes them to an MS SQL Azure instance:

  Request 1   \
  Request 2   -\
  ...          -> Thread pool producers
  ...         -/              |
  ...        -/       Process and post
  ...       -/                |
  Request N /                 v
                  Mailbox processor with queue and consumer
                              |
                      Process and save
                              |
                              v
                  MS SQL Azure instance
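The shape of this pipeline can be sketched in Python. This is not the application's F# code: `VisitAgent`, `post`, `batch_size`, and `save` are hypothetical names, and a `queue.Queue` drained by one consumer thread stands in for the F# MailboxProcessor. The sketch batches on the consumer side as an approximation of "when the queue reaches a certain length". What it illustrates is that a visit lives only in process memory between `post` and `save`.

```python
import queue
import threading


class VisitAgent:
    """Hypothetical stand-in for the mailbox processor: many producers, one consumer."""

    _STOP = object()  # sentinel telling the consumer to flush and exit

    def __init__(self, batch_size, save):
        self._inbox = queue.Queue()  # in-process queue shared by all producers
        self._batch_size = batch_size
        self._save = save  # e.g. a function writing rows to the database
        self._consumer = threading.Thread(target=self._consume)
        self._consumer.start()

    def post(self, visit):
        """Called from any producer thread; never waits on the database."""
        self._inbox.put(visit)

    def _consume(self):
        batch = []
        while True:
            item = self._inbox.get()
            if item is self._STOP:
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._save(batch)  # until this runs, visits exist only in memory
                batch = []
        if batch:
            self._save(batch)

    def stop(self):
        self._inbox.put(self._STOP)
        self._consumer.join()


# Usage: two producer threads post 7 visits in total; batches of 3 are saved.
saved_batches = []
agent = VisitAgent(batch_size=3, save=saved_batches.append)
producers = [
    threading.Thread(target=lambda: [agent.post("visit-a") for _ in range(4)]),
    threading.Thread(target=lambda: [agent.post("visit-b") for _ in range(3)]),
]
for p in producers:
    p.start()
for p in producers:
    p.join()
agent.stop()
print([len(b) for b in saved_batches])  # [3, 3, 1]
```

If the consumer stops saving, as happened here, posted visits simply accumulate in the inbox.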

In our case, some 423k Visit objects are queued up, ready to be consumed. While visit data isn't business critical per se, a three-week gap is worth avoiding if recovering it isn't too time-consuming.

Locating Visit objects inside the mailbox processor

At this point, we assume that the dump is ready to be loaded. From the output below we see that WinDbg locates a matching mscordacwks.dll on its own, loading it from its symbol cache, and loads SOS automatically, but that might not always be the case. Process uptime is reported as close to 25 days. From later analysis we know that for about 21 of those days the consumer hasn't processed visits:

Microsoft (R) Windows Debugger Version 10.0.15063.137 X86
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\AzureDump\Bugfree.Spo.Analytics.Cli-d3c510-07-25-13-08-00.dmp]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*
Executable search path is: 
Windows 8 Version 9200 UP Free x86 compatible
Product: Server, suite: TerminalServer DataCenter SingleUserTS
6.2.9200.16384 (win8_rtm.120725-1247)
Machine Name:
Debug session time: Tue Jul 25 15:07:22.000 2017 (UTC + 2:00)
System Uptime: 28 days 13:46:50.588
Process Uptime: 24 days 20:40:30.000
................................................................
...
Loading unloaded module list
..
eax=00000000 ebx=0116e610 ecx=00000000 edx=00000000 esi=0116e3a0 edi=00000001
eip=7781081c esp=0116e278 ebp=0116e3f8 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
ntdll!NtWaitForMultipleObjects+0xc:
7781081c c21400          ret     14h

0:000> .sympath+ C:\AzureDump
Symbol search path is: srv*;C:\AzureDump
Expanded Symbol search path is: cache*;SRV*https://msdl.microsoft.com/download/symbols;c:\azuredump

************* Symbol Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       srv*
OK                                             C:\AzureDump

0:000> .cordll -ve -u -l
CLRDLL: Unable to get version info for 'D:\Windows\Microsoft.NET\Framework\v4.0.30319\mscordacwks.dll', Win32 error 0n87
Automatically loaded SOS Extension
CLRDLL: Loaded DLL C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\sym\mscordacwks_x86_x86_4.7.2053.00.dll\58FA6BB36e6000\mscordacwks_x86_x86_4.7.2053.00.dll
CLR DLL status: Loaded DLL C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\sym\mscordacwks_x86_x86_4.7.2053.00.dll\58FA6BB36e6000\mscordacwks_x86_x86_4.7.2053.00.dll

There are a couple of ways to go about locating Visit objects inside the dump. One way is filtering the heap for objects of the Visit type. The problem with this approach is that the heap may contain objects eligible for garbage collection, i.e., visits which have already been written to the database but not yet collected. While the application is built to handle duplicate visits, we'd better be specific if possible. Another way is browsing the application's source, looking for an object that, directly or indirectly, stores Visit objects.
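The difference between the two approaches is the difference between enumerating every object on the heap and enumerating only the objects reachable from a root, which is roughly what the garbage collector's mark phase computes. A minimal sketch on a hypothetical toy heap (addresses and type names are made up, loosely shaped like this dump) shows how a plain type filter can over-count:

```python
from collections import deque

# Toy heap: address -> (type name, addresses referenced by the object's fields).
HEAP = {
    0x100: ("FSharpMailboxProcessor", [0x200]),
    0x200: ("Mailbox", [0x300]),
    0x300: ("Queue", [0x400]),
    0x400: ("VisitorMessage[]", [0x500, 0x510]),
    0x500: ("VisitorMessage", [0x600]),
    0x510: ("VisitorMessage", [0x610]),
    0x600: ("Visit", []),
    0x610: ("Visit", []),
    0x700: ("Visit", []),  # already saved, unreferenced, not yet collected
}
ROOTS = [0x100]  # e.g. the static field holding the mailbox processor


def reachable(roots, heap):
    """Breadth-first walk over references, like a GC mark phase."""
    seen = set()
    pending = deque(roots)
    while pending:
        address = pending.popleft()
        if address in seen:
            continue  # already marked; also guards against cycles
        seen.add(address)
        pending.extend(heap[address][1])
    return seen


def of_type(addresses, heap, type_name):
    return sorted(a for a in addresses if heap[a][0] == type_name)


all_visits = of_type(HEAP, HEAP, "Visit")  # like filtering with !DumpHeap -type
live_visits = of_type(reachable(ROOTS, HEAP), HEAP, "Visit")
print(len(all_visits), len(live_visits))  # 3 2
```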

Browsing the application's source, we observe that the producers/consumer mechanism is nicely encapsulated in the MailboxProcessor type. Searching the heap for instances of this type, two show up, one per message type. The one holding VisitorMessage objects is the one we're after:

0:000> !DumpHeap -stat -type MailboxProcessor
Statistics:
      MT    Count    TotalSize Class Name
08a2db5c        1           32 Microsoft.FSharp.Control.FSharpMailboxProcessor`1[[Bugfree.Spo.Analytics.Cli.Agents+LoggerMessage, Bugfree.Spo.Analytics.Cli]]
08a2d584        1           32 Microsoft.FSharp.Control.FSharpMailboxProcessor`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]]
Total 2 objects

From documentation and the F# library source, it's clear that the type is compiled as FSharpMailboxProcessor and that it's generic in the type of message it stores. The "`1" part is CLR notation for the arity of the generic type, i.e., the number of its type arguments, whose types and assemblies are listed in brackets. The "+" in the type name is CLR notation for a nested type. From a C# perspective, having a type called Agents with a nested VisitorMessage type may seem odd. It's an artifact of how the F# compiler maps language constructs to IL, not of how the code was actually written: presumably Agents is an F# module, which compiles to a static class, so types declared inside it become nested types.
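The notation is regular enough to parse mechanically. A sketch, assuming at most one backtick at the end of the outer name and ignoring array suffixes and assembly version details; `split_top_level` and `parse_clr_type_name` are hypothetical helpers, not part of SOS:

```python
def split_top_level(text):
    """Split on commas that are not nested inside square brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return parts


def parse_clr_type_name(name):
    """Parse names like 'Ns.Generic`1[[Ns.Outer+Inner, Assembly]]'."""
    bracket = name.find("[")
    head, generic_args = (name, "") if bracket < 0 else (name[:bracket], name[bracket:])
    arity = 0
    if "`" in head:  # "`N" = number of generic type parameters
        head, arity_text = head.split("`", 1)
        arity = int(arity_text)
    outer, *nested = head.split("+")  # "+" separates a type from its nested types
    args = []
    if generic_args:
        for arg in split_top_level(generic_args[1:-1]):  # drop the outer [ ]
            parts = split_top_level(arg[1:-1])  # each argument is [Type, Assembly]
            args.append({
                "type": parse_clr_type_name(parts[0]),
                "assembly": parts[1] if len(parts) > 1 else None,
            })
    return {"outer": outer, "nested": nested, "arity": arity, "args": args}


parsed = parse_clr_type_name(
    "Microsoft.FSharp.Control.FSharpMailboxProcessor`1"
    "[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]]"
)
print(parsed["arity"], parsed["args"][0]["type"]["nested"])  # 1 ['VisitorMessage']
```

The bracket-depth counter is what allows type arguments that are themselves generic, since their commas are not top-level.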

Using the value in the MT (method table) column, we can locate objects of that exact type on the managed heap. Because the queueing mechanism is a singleton, held in a static field, only a single instance shows up:

0:000> !DumpHeap /d -mt 08a2d584
 Address       MT     Size
0252c384 08a2d584       32     

The Address column holds the location of the object within the process' 32-bit virtual address space. We could inspect the object by dumping raw memory, but SOS comes with a command for that purpose:

0:000> !DumpObj /d 0252c384
Name:        Microsoft.FSharp.Control.FSharpMailboxProcessor`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]]
MethodTable: 08a2d584
EEClass:     08a0ff30
Size:        32(0x20) bytes
File:        D:\home\site\wwwroot\FSharp.Core.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
00000000  40001ff        4                       0 instance 0252c378 initial
048a5570  4000200       18 ...CancellationToken  1 instance 0252c39c cancellationToken@2521
0855a350  4000201        8 ...Canon, mscorlib]]  0 instance 0252c3a4 mailbox
012dc8f0  4000202       10         System.Int32  1 instance       -1 defaultTimeout
012da988  4000203       14       System.Boolean  1 instance        1 started
0855a0b8  4000204        c ...ption, mscorlib]]  0 instance 0252c3f4 errorEvent

The mailbox field looks promising. Because the field is of a reference type, Value holds the address of the referenced object. And because the type of the mailbox field is generic, its type argument is listed as System.__Canon. This type is a CLR placeholder related to how .NET generics are implemented under the hood. Suffice it to say that instantiations over reference types share code, with System.__Canon standing in for the actual type argument, while the object itself still has its concrete type, as seen from the Name below:

0:000> !DumpObj /d 0252c3a4
Name:        Microsoft.FSharp.Control.Mailbox`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]]
MethodTable: 0855bec8
EEClass:     08a3073c
Size:        32(0x20) bytes
File:        D:\home\site\wwwroot\FSharp.Core.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
05491110  40001f8        4 ...Canon, mscorlib]]  0 instance 00000000 inboxStore
0855b748  40001f9        8 ...Canon, mscorlib]]  0 instance 0252c3c4 arrivals
0855b748  40001fa        c ...Canon, mscorlib]]  0 instance 0252c3c4 syncRoot
0855af04  40001fb       10 ...ore]], mscorlib]]  0 instance 00000000 savedCont
082472fc  40001fc       14 ...ng.AutoResetEvent  0 instance 00000000 pulse
0855b290  40001fd       18 ...olean, mscorlib]]  0 instance 0252c3e8 waitOneNoTimeout

From these fields, it's clear that Mailbox is where queue access synchronization happens. Continuing our search for Visit objects, the arrivals field looks promising. Once again following the pointer, we end up at a Queue type defined in the F# standard library -- a thin wrapper around an array:

0:000> !DumpObj /d 0252c3c4
Name:        Microsoft.FSharp.Control.Queue`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]]
MethodTable: 0855bf60
EEClass:     08a310d4
Size:        24(0x18) bytes
File:        D:\home\site\wwwroot\FSharp.Core.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
048a17a8  40001db        4     System.__Canon[]  0 instance 037371e0 array
012dc8f0  40001dc        8         System.Int32  1 instance        0 head
012dc8f0  40001dd        c         System.Int32  1 instance   422813 size
012dc8f0  40001de       10         System.Int32  1 instance   422813 tail

Besides an array of messages, the Queue appears to keep track of the index of its first element (head), the index where the next element will be written (tail), and the number of queued elements (size). The array has a current capacity of 524,288 items, of which only 422,813 are in use:

0:000> !DumpObj /d 037371e0
Name:        Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage[]
MethodTable: 0855bfcc
EEClass:     012da164
Size:        2097164(0x20000c) bytes
Array:       Rank 1, Number of elements 524288, Type CLASS (Print Array)
Fields:
None
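The head, size, and tail fields together with the oversized array suggest a growable circular buffer, the same design as System.Collections.Generic.Queue<T>. The sketch below works under that assumption. The growth rule (double the capacity, with a minimum increase of four slots) is borrowed from that class and not read from the dump, yet it reproduces the observed numbers. The last line checks the reported array size, assuming a 12-byte x86 array header (sync block index, method table pointer, and length, 4 bytes each) followed by one 4-byte reference per slot.

```python
class CircularQueue:
    """Growable ring buffer with head/tail/size bookkeeping (a sketch, not FSharp.Core)."""

    def __init__(self):
        self.array = []
        self.head = 0  # index of the oldest element
        self.tail = 0  # index where the next element will be written
        self.size = 0  # number of queued elements

    def _grow(self):
        capacity = max(len(self.array) * 2, len(self.array) + 4)
        # Copy elements in queue order so the oldest lands at index 0
        ordered = [self.array[(self.head + i) % len(self.array)] for i in range(self.size)]
        self.array = ordered + [None] * (capacity - self.size)
        self.head = 0
        self.tail = self.size  # size < capacity right after growing

    def enqueue(self, item):
        if self.size == len(self.array):
            self._grow()
        self.array[self.tail] = item
        self.tail = (self.tail + 1) % len(self.array)
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("dequeue from empty queue")
        item = self.array[self.head]
        self.array[self.head] = None  # drop the slot's reference to the item
        self.head = (self.head + 1) % len(self.array)
        self.size -= 1
        return item


q = CircularQueue()
for n in range(422_813):
    q.enqueue(n)
print(q.head, q.size, q.tail, len(q.array))  # 0 422813 422813 524288

# x86 array of references: 12-byte header + 4 bytes per slot
print(12 + 4 * len(q.array))  # 2097164
```

With no dequeues since the last resize, head stays at 0 and tail equals size, matching the dumped Queue fields.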

At first sight, we might have expected the array to store Visit objects, but that's not how a MailboxProcessor works. It supports switching on the type of each message. In C# terms, think of it as an inheritance hierarchy with VisitorMessage as the abstract base type and each message type as a concrete subtype. In addition, each type of message may carry additional state, such as the actual visit.
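The same idea in Python terms: a base class, subclasses per message kind, and a consumer that dispatches on the runtime type. The case names `VisitMessage` and `FlushMessage` and the `handle` function are hypothetical; the dump only tells us that a VisitorMessage carries a Visit in a field called item.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    visit_url: str


class VisitorMessage:
    """Abstract base: every message posted to the mailbox is one of its subtypes."""


@dataclass
class VisitMessage(VisitorMessage):  # hypothetical case carrying a visit
    item: Visit


@dataclass
class FlushMessage(VisitorMessage):  # hypothetical case with no payload
    pass


def handle(message, pending, saved):
    """Consumer loop body: switch on the message's runtime type."""
    if isinstance(message, VisitMessage):
        pending.append(message.item)
    elif isinstance(message, FlushMessage):
        saved.extend(pending)
        pending.clear()
    else:
        raise TypeError(f"unknown message {message!r}")


pending, saved = [], []
for msg in [VisitMessage(Visit("/a")), VisitMessage(Visit("/b")), FlushMessage()]:
    handle(msg, pending, saved)
print([v.visit_url for v in saved], pending)  # ['/a', '/b'] []
```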

To see this hierarchy in action, we can dump the first element of the array. Its item field holds the additional state, a Visit object:

0:000> !DumpArray -start 0 -length 1 /d 037371e0
Name:        Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage[]
MethodTable: 0855bfcc
EEClass:     012da164
Size:        2097164(0x20000c) bytes
Array:       Rank 1, Number of elements 524288, Type CLASS
Element Methodtable: 08a2d2ac
[0] 02663808

0:000> !DumpObj /d 02663808
Name:        Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage
MethodTable: 08a2d2ac
EEClass:     08a0fec8
Size:        12(0xc) bytes
File:        D:\home\site\wwwroot\Bugfree.Spo.Analytics.Cli.exe
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
08a2d938  40000b2        4 ....Cli.Domain+Visit  0 instance 026637d0 item

0:000> !DumpObj /d 026637d0
Name:        Bugfree.Spo.Analytics.Cli.Domain+Visit
MethodTable: 08a2d938
EEClass:     0852014c
Size:        56(0x38) bytes
File:        D:\home\site\wwwroot\Bugfree.Spo.Analytics.Cli.exe
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
048aa9a4  40000fc       1c          System.Guid  1 instance 026637ec CorrelationId@
05547ab4  40000fd       2c      System.DateTime  1 instance 026637fc Timestamp@
012dfccc  40000fe        4        System.String  0 instance 02663530 LoginName@
012dfccc  40000ff        8        System.String  0 instance 02663700 SiteCollectionUrl@
012dfccc  4000100        c        System.String  0 instance 026635b4 VisitUrl@
054b6a94  4000101       10 ...Int32, mscorlib]]  0 instance 02663774 PageLoadTime@
0641c510  4000102       14 System.Net.IPAddress  0 instance 02663780 IP@
054b5248  4000103       18 ...tring, mscorlib]]  0 instance 026637c4 UserAgent@

The "@" suffix in a field name denotes a property backing field. To recreate each Visit object in the array from memory, we must dump the value of each backing field. And for any backing field of a non-primitive type, we must dump it recursively until we arrive at primitive types.
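Turning this output into data starts with parsing each row of the Fields table. A sketch, assuming SOS prints instance fields as shown above: because the Type column may itself contain spaces (`...Int32, mscorlib]]`), the row is split from both ends, taking the first three and the last four columns and treating whatever lies between them as the type name. Static fields, which SOS prints differently, are rejected rather than guessed at. `FieldRow` and `parse_field_row` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FieldRow:
    mt: int              # method table of the field's type
    offset: int          # byte offset within the object
    type_name: str       # possibly truncated by SOS, e.g. "...Int32, mscorlib]]"
    is_value_type: bool  # VT column: 1 = stored inline, 0 = reference
    value: str           # an address, or a number when dumped with /d
    name: str


def parse_field_row(line):
    tokens = line.split()
    mt, _field_token, offset = tokens[:3]
    vt, attr, value, name = tokens[-4:]
    if attr != "instance":
        raise ValueError(f"only instance fields are handled: {line!r}")
    return FieldRow(
        mt=int(mt, 16),
        offset=int(offset, 16),
        type_name=" ".join(tokens[3:-4]),  # may be empty, as for the 'initial' field
        is_value_type=vt == "1",
        value=value,
        name=name.rstrip("@"),  # drop the backing-field marker
    )


rows = [parse_field_row(line) for line in [
    "048aa9a4  40000fc       1c          System.Guid  1 instance 026637ec CorrelationId@",
    "054b6a94  4000101       10 ...Int32, mscorlib]]  0 instance 02663774 PageLoadTime@",
]]
for row in rows:
    print(row.name, hex(row.offset), row.type_name, row.is_value_type)
```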

Following pointers and dumping objects with WinDbg should make it very clear that we're traversing a (potentially cyclic) graph of objects. In this case the objects form a tree, rooted in a singleton MailboxProcessor instance:

Bugfree.Spo.Analytics.Cli.Agents+visitor (Microsoft.FSharp.Control.FSharpMailboxProcessor)
  mailbox (Microsoft.FSharp.Control.Mailbox)
    arrivals (Microsoft.FSharp.Control.Queue)
      array (Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage[])
        message1 (Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage)
          item (Bugfree.Spo.Analytics.Cli.Domain+Visit)
            CorrelationId (System.Guid)
              _a (System.Int32)
              _b (System.Int16)
              _c (System.Int16)
              _d (System.Byte)
              ...
              _k (System.Byte)
            Timestamp (System.DateTime)
              dateData (System.UInt64)
              ...
            LoginName (System.String)
            SiteCollectionUrl (System.String)
            VisitUrl (System.String)
            PageLoadTime (Microsoft.FSharp.Core.FSharpOption)
              value (System.Int32)
            IP (System.Net.IPAddress)
              m_Address (System.Int64)
              ...
            UserAgent (Microsoft.FSharp.Core.FSharpOption)
              value (System.String)
        message2
        ...
        messageN
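Recreating an object means walking this graph depth first and replacing each reference with the object it points to. A minimal sketch over a hypothetical toy heap, where each object is a dict of fields and references are wrapped in a hypothetical `Ref` type: a null address becomes `None` (an F# `None` option is also stored as a null reference), and a reference back to an object still being built is left as a `Ref`, so a cyclic graph cannot cause infinite recursion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A field value that is a reference to another heap object."""
    address: int


def materialize(address, heap, in_progress=None):
    """Rebuild nested dicts from a toy heap, following references depth first."""
    if address == 0:
        return None  # null reference
    if in_progress is None:
        in_progress = set()
    if address in in_progress:
        return Ref(address)  # back edge of a cycle: stop here
    in_progress.add(address)
    obj = heap[address]
    result = {"$type": obj["type"]}
    for name, value in obj["fields"].items():
        result[name] = (materialize(value.address, heap, in_progress)
                        if isinstance(value, Ref) else value)
    in_progress.remove(address)
    return result


# Hypothetical heap fragment shaped like one queued message
heap = {
    0x10: {"type": "VisitorMessage", "fields": {"item": Ref(0x20)}},
    0x20: {"type": "Visit", "fields": {"LoginName": Ref(0x30), "UserAgent": Ref(0)}},
    0x30: {"type": "System.String", "fields": {"value": "alice"}},
}
message = materialize(0x10, heap)
print(message["item"]["LoginName"]["value"], message["item"]["UserAgent"])  # alice None
```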

Each Visit object is kept alive because it's indirectly rooted by the static field, which prevents the garbage collector from collecting it. Incidentally, the number of Visit objects in the array matches the number of Visit objects on the heap:

0:000> !DumpHeap -stat -type Bugfree.Spo.Analytics.Cli.Domain+Visit
Statistics:
      MT    Count    TotalSize Class Name
08a62f2c        1           16 Microsoft.FSharp.Collections.FSharpList`1[[Bugfree.Spo.Analytics.Cli.Domain+Visit, Bugfree.Spo.Analytics.Cli]]
08a2d938   422813     23677528 Bugfree.Spo.Analytics.Cli.Domain+Visit
Total 422814 objects

Thus, rather than traversing the tree, searching for and dumping Visit objects directly is simpler and yields the same result.
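The TotalSize column is also consistent with the field offsets from the Visit dump. A back-of-the-envelope check, assuming the x86 layout in which SOS offsets are counted from the method table pointer at offset 0 and a 4-byte sync block index precedes the object: the six reference fields occupy 0x4 to 0x1b, the inline Guid 0x1c to 0x2b, and the inline DateTime 0x2c to 0x33.

```python
POINTER_SIZE = 4  # 32-bit process

# Field offsets as reported by !DumpObj for Bugfree.Spo.Analytics.Cli.Domain+Visit
reference_offsets = [0x4, 0x8, 0xC, 0x10, 0x14, 0x18]  # LoginName ... UserAgent
guid_offset, guid_size = 0x1C, 16  # CorrelationId, stored inline
datetime_offset, datetime_size = 0x2C, 8  # Timestamp, stored inline

# References are packed back to back and the Guid follows the last one
assert guid_offset == reference_offsets[-1] + POINTER_SIZE
assert datetime_offset == guid_offset + guid_size

end_of_fields = datetime_offset + datetime_size  # 0x34 = 52
object_size = POINTER_SIZE + end_of_fields  # plus the sync block index
print(object_size)  # 56
print(object_size * 422_813)  # 23677528
```

Both numbers match the dumped Size of 56 bytes and the TotalSize of 23,677,528 bytes.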

Conclusion

While WinDbg provides for easy graph traversal, it only knows how to extract and pretty-print simple .NET types such as String, Int32, and Double. For compound types, such as Guid, FSharpOption, IPAddress, and DateTime, turning text output into .NET objects is a lot of work. We'd have to recursively traverse each compound type inside every one of the 422,813 Visit objects, parsing text and substituting addresses.
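To give a sense of that work, here is a sketch of decoding three of those compound types from the primitive fields WinDbg does print. It assumes .NET Framework field layouts. Guid is stored as `_a` (Int32), `_b` and `_c` (Int16), and `_d` through `_k` (bytes). DateTime is a single `dateData` UInt64 whose low 62 bits hold ticks (100 ns units since 0001-01-01) and whose top two bits hold the DateTimeKind. An IPv4 IPAddress keeps its first octet in the lowest byte of `m_Address`. The function names are hypothetical.

```python
import uuid
from datetime import datetime, timedelta

TICKS_MASK = (1 << 62) - 1  # low 62 bits of DateTime.dateData


def decode_guid(a, b, c, d, e, f, g, h, i, j, k):
    """Rebuild a System.Guid from its _a.._k fields (signed values as SOS prints them)."""
    node = int.from_bytes(bytes([f, g, h, i, j, k]), "big")
    return uuid.UUID(fields=(a & 0xFFFFFFFF, b & 0xFFFF, c & 0xFFFF, d, e, node))


def decode_datetime(date_data):
    """Split DateTime.dateData into a naive datetime (truncated to microseconds) and its kind."""
    ticks = date_data & TICKS_MASK
    kind = date_data >> 62  # 0 = Unspecified, 1 = Utc, 2 or 3 = Local
    return datetime(1, 1, 1) + timedelta(microseconds=ticks // 10), kind


def decode_ipv4(m_address):
    """IPv4 IPAddress.m_Address keeps the first octet in the lowest byte."""
    return ".".join(str((m_address >> shift) & 0xFF) for shift in (0, 8, 16, 24))


print(decode_guid(0x12345678, 0x1234, 0x5678, 0x9A, 0xBC, 0xDE, 0xF0, 0x12, 0x34, 0x56, 0x78))
print(decode_ipv4(0x2414188F))  # 143.24.20.36
```

Multiplied by 422,813 objects and several nested types each, parsing this out of WinDbg's text output by hand quickly becomes impractical.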

In the next part, using the Microsoft Diagnostics Runtime (ClrMD for short), we'll automate heap traversal, extraction, and recreation of visits in a much simpler manner.

Have a comment or question? Please drop me an email or tweet to @ronnieholm.