Part 1: WinDbg: Recreating .NET objects from an Azure App Service memory dump
Part 2: ClrMD: Recreating .NET objects from an Azure App Service memory dump
This post outlines how to use WinDbg to extract in-memory .NET objects from a memory dump of an Azure App Service, an Azure deployed web application.
An Azure App Service supports memory dumping without terminating the process through its associated Kudu instance. Again, using Kudu, we download sos.dll and mscordacwks.dll from the Azure site. At this point WinDbg can load the dump and we can continue working with it like any other dump of a .NET process.
To know what to look for and expect inside the dump, a brief overview of the application is in order. The dump is of the Bugfree.Spo.Analytics application whose endpoints receive a JSON payload with visit metadata on a page visit in another application.
On the server side, multiple producers and a single consumer operate on a single, shared, in-process message queue. Once a request comes in on one of the worker threads, it's validated, enriched, and turned into a .NET Visit object which is enqueued. When the queue reaches a certain length, a consumer dequeues the Visit objects and, after a bit of processing, writes those visits to an MS SQL Azure instance:
    Request 1 \
    Request 2 -\
    ...        -> Thread pool producers
    ...       -/          |
    ...      -/    Process and post
    ...     -/            |
    Request N /           v
          Mailbox processor with queue and consumer
                          |
                   Process and save
                          |
                          v
                MS SQL Azure instance
In our case, some 423k Visit objects are queued up, ready to be consumed. While visit data isn't business critical, per se, a three-week gap should be avoided if recovering the data isn't too time-consuming.
At this point, we're going to assume that the dump is ready to be loaded. From the output below we see that WinDbg ships with versions of sos.dll and mscordacwks.dll, but that might not always be the case. Process uptime is reported as close to 25 days. From later analysis we know that for about 21 of those the consumer hasn't processed Visits:
    Microsoft (R) Windows Debugger Version 10.0.15063.137 X86
    Copyright (c) Microsoft Corporation. All rights reserved.

    Loading Dump File [C:\AzureDump\Bugfree.Spo.Analytics.Cli-d3c510-07-25-13-08-00.dmp]
    User Mini Dump File with Full Memory: Only application data is available

    Symbol search path is: srv*
    Executable search path is:
    Windows 8 Version 9200 UP Free x86 compatible
    Product: Server, suite: TerminalServer DataCenter SingleUserTS
    6.2.9200.16384 (win8_rtm.120725-1247)
    Machine Name:
    Debug session time: Tue Jul 25 15:07:22.000 2017 (UTC + 2:00)
    System Uptime: 28 days 13:46:50.588
    Process Uptime: 24 days 20:40:30.000
    ................................................................
    ...
    Loading unloaded module list
    ..
    eax=00000000 ebx=0116e610 ecx=00000000 edx=00000000 esi=0116e3a0 edi=00000001
    eip=7781081c esp=0116e278 ebp=0116e3f8 iopl=0 nv up ei pl nz na pe nc
    cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206
    ntdll!NtWaitForMultipleObjects+0xc:
    7781081c c21400 ret 14h

    0:000> .sympath+ C:\AzureDump
    Symbol search path is: srv*;C:\AzureDump
    Expanded Symbol search path is: cache*;SRV*https://msdl.microsoft.com/download/symbols;c:\azuredump

    ************* Symbol Path validation summary **************
    Response    Time (ms)    Location
    Deferred                 srv*
    OK                       C:\AzureDump

    0:000> .cordll -ve -u -l
    CLRDLL: Unable to get version info for 'D:\Windows\Microsoft.NET\Framework\v4.0.30319\mscordacwks.dll', Win32 error 0n87
    Automatically loaded SOS Extension
    CLRDLL: Loaded DLL C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\sym\mscordacwks_x86_x86_4.7.2053.00.dll\58FA6BB36e6000\mscordacwks_x86_x86_4.7.2053.00.dll
    CLR DLL status: Loaded DLL C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\sym\mscordacwks_x86_x86_4.7.2053.00.dll\58FA6BB36e6000\mscordacwks_x86_x86_4.7.2053.00.dll
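As a small worked example, the debug session time and process uptime from the banner are enough to work out when the process started. The sketch below is plain date arithmetic in Python; the variable names are ours, and the two values are copied from the debugger output.

```python
from datetime import datetime, timedelta

# Values copied from the WinDbg banner (local time, UTC+2).
session_time = datetime(2017, 7, 25, 15, 7, 22)
process_uptime = timedelta(days=24, hours=20, minutes=40, seconds=30)

# The process start time is the session time minus the reported uptime.
process_start = session_time - process_uptime
print(process_start)  # 2017-06-30 18:26:52
```

So the process was started in the evening of June 30, which fits the "close to 25 days" reading of the uptime.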
There are a couple of ways to go about locating Visit objects inside the dump. One way is filtering the heap for objects of the Visit type. The problem with this approach is that the heap may contain objects eligible for garbage collection, i.e., visits which have already been written to the database. While the application is built to handle duplicate visits, we'd better be specific if possible. Another way is browsing the application's source, looking for an object that, directly or indirectly, stores Visit objects.
Browsing the application's source, we observe that the producers/consumer mechanism is nicely encapsulated in the MailboxProcessor type. Searching the heap for instances of this object, a single instance shows up:
    0:000> !DumpHeap -stat -type MailboxProcessor
    Statistics:
          MT    Count    TotalSize Class Name
    08a2db5c        1           32 Microsoft.FSharp.Control.FSharpMailboxProcessor`1[[Bugfree.Spo.Analytics.Cli.Agents+LoggerMessage, Bugfree.Spo.Analytics.Cli]]
    08a2d584        1           32 Microsoft.FSharp.Control.FSharpMailboxProcessor`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]]
    Total 2 objects
From documentation and F# library source, it's clear that the type is compiled as FSharpMailboxProcessor and that it's generic in the type of object it stores. The "`1" part is CLR notation for the arity of the generic type -- the number of its type arguments -- whose types and assemblies are provided in brackets. The "+" in the type name is CLR notation for a nested class. From a C# perspective, having a type called Agents with a nested class of VisitorMessage may seem odd. It's an artifact of how the F# compiler maps language constructs to IL and not how the code was actually written.
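To make the notation concrete, here's a minimal Python sketch that splits such a name into its parts. The function name parse_clr_type_name is made up for illustration, and the regular expression only handles the simple shape seen in this dump: an optional arity suffix and a single, non-generic type argument written as [[Type, Assembly]].

```python
import re

# Only covers the shape seen in this dump: Name`N[[TypeArg, Assembly]].
# Multiple or nested generic arguments are deliberately out of scope.
PATTERN = re.compile(
    r"^(?P<name>[^`\[]+)(?:`(?P<arity>\d+))?(?:\[\[(?P<arg>.+)\]\])?$"
)

def parse_clr_type_name(full_name):
    match = PATTERN.match(full_name)
    if match is None:
        raise ValueError(f"unsupported type name: {full_name}")
    # "+" separates an outer type from the types nested inside it.
    outer, *nested = match.group("name").split("+")
    result = {
        "outer": outer,
        "nested": nested,
        "arity": int(match.group("arity") or 0),
        "type_argument": None,
        "assembly": None,
    }
    if match.group("arg"):
        # The assembly name follows the last ", " inside the brackets.
        type_argument, assembly = match.group("arg").rsplit(", ", 1)
        result["type_argument"] = type_argument
        result["assembly"] = assembly
    return result

mailbox = parse_clr_type_name(
    "Microsoft.FSharp.Control.FSharpMailboxProcessor`1"
    "[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]]"
)
message = parse_clr_type_name(mailbox["type_argument"])
print(mailbox["arity"], message["outer"], message["nested"])
```

Feeding the type argument back into the same function shows the second half of the notation: Agents is the outer type and VisitorMessage is nested inside it.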
From the value of the MT (Method Table) column, we can locate objects of that type on the managed heap. As the queueing mechanism is a singleton object, a static field actually, only a single instance shows up:
    0:000> !DumpHeap /d -mt 08a2d584
     Address       MT     Size
    0252c384 08a2d584       32

The address column points to the location of the object within the process' 32-bit virtual address space. We could inspect the object by dumping raw memory, but SOS comes with a command for that purpose:
    0:000> !DumpObj /d 0252c384
    Name:        Microsoft.FSharp.Control.FSharpMailboxProcessor`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]]
    MethodTable: 08a2d584
    EEClass:     08a0ff30
    Size:        32(0x20) bytes
    File:        D:\home\site\wwwroot\FSharp.Core.dll
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    00000000  40001ff        4                       0 instance 0252c378 initial
    048a5570  4000200       18 ...CancellationToken  1 instance 0252c39c cancellationToken@2521
    0855a350  4000201        8 ...Canon, mscorlib]]  0 instance 0252c3a4 mailbox
    012dc8f0  4000202       10         System.Int32  1 instance       -1 defaultTimeout
    012da988  4000203       14       System.Boolean  1 instance        1 started
    0855a0b8  4000204        c ...ption, mscorlib]]  0 instance 0252c3f4 errorEvent

The mailbox field looks promising. Because the field is a reference type inside another type, Value is the memory location of the object. And because the type of the mailbox field is generic, its type is listed as System.__Canon. This type is a CLR placeholder related to how .NET generics are implemented under the hood. Suffice it to say that System.__Canon is substituted for an actual reference type at runtime, as seen from the Name below:
    0:000> !DumpObj /d 0252c3a4
    Name:        Microsoft.FSharp.Control.Mailbox`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]]
    MethodTable: 0855bec8
    EEClass:     08a3073c
    Size:        32(0x20) bytes
    File:        D:\home\site\wwwroot\FSharp.Core.dll
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    05491110  40001f8        4 ...Canon, mscorlib]]  0 instance 00000000 inboxStore
    0855b748  40001f9        8 ...Canon, mscorlib]]  0 instance 0252c3c4 arrivals
    0855b748  40001fa        c ...Canon, mscorlib]]  0 instance 0252c3c4 syncRoot
    0855af04  40001fb       10 ...ore]], mscorlib]]  0 instance 00000000 savedCont
    082472fc  40001fc       14 ...ng.AutoResetEvent  0 instance 00000000 pulse
    0855b290  40001fd       18 ...olean, mscorlib]]  0 instance 0252c3e8 waitOneNoTimeout
From these fields, it's clear that Mailbox is where queue access synchronization happens. Continuing our search for Visit objects, the arrivals field looks promising. Once again following the pointer, we end up at a Queue type defined in the F# standard library -- a thin wrapper around an array:
    0:000> !DumpObj /d 0252c3c4
    Name:        Microsoft.FSharp.Control.Queue`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]]
    MethodTable: 0855bf60
    EEClass:     08a310d4
    Size:        24(0x18) bytes
    File:        D:\home\site\wwwroot\FSharp.Core.dll
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    048a17a8  40001db        4       System.__Canon  0 instance 037371e0 array
    012dc8f0  40001dc        8         System.Int32  1 instance        0 head
    012dc8f0  40001dd        c         System.Int32  1 instance   422813 size
    012dc8f0  40001de       10         System.Int32  1 instance   422813 tail

Besides an array of objects, the Queue appears to keep track of the index of the first and last element in the queue as well as its size. The array has a current capacity of 524,288 items, but we only use 422,813:
    0:000> !DumpObj /d 037371e0
    Name:        Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage
    MethodTable: 0855bfcc
    EEClass:     012da164
    Size:        2097164(0x20000c) bytes
    Array:       Rank 1, Number of elements 524288, Type CLASS (Print Array)
    Fields:
    None
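The head, tail, and size fields suggest an array-backed circular queue. The Python sketch below is our own illustration of that idea, not the FSharp.Core implementation: the class name RingQueue and the growth policy (start at 4 slots, double when full) are assumptions. Under those assumptions, enqueuing 422,813 items without dequeuing any reproduces the numbers in the dump.

```python
class RingQueue:
    """Array-backed circular queue; growth policy is an assumption."""

    def __init__(self):
        self.array = []
        self.head = 0  # index of the oldest element
        self.tail = 0  # index where the next element is written
        self.size = 0  # number of elements currently stored

    def _grow(self):
        # Assumed policy: at least 4 slots, otherwise double the capacity.
        new_array = [None] * max(4, 2 * len(self.array))
        for i in range(self.size):
            new_array[i] = self.array[(self.head + i) % len(self.array)]
        self.array, self.head, self.tail = new_array, 0, self.size

    def enqueue(self, item):
        if self.size == len(self.array):
            self._grow()
        self.array[self.tail] = item
        self.tail = (self.tail + 1) % len(self.array)
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("dequeue from empty queue")
        item = self.array[self.head]
        self.array[self.head] = None
        self.head = (self.head + 1) % len(self.array)
        self.size -= 1
        return item


queue = RingQueue()
for n in range(422813):
    queue.enqueue(n)

capacity = len(queue.array)
print(capacity, queue.head, queue.tail, queue.size)  # 524288 0 422813 422813

# The reported array size is consistent with a 12-byte header
# plus one 4-byte reference per slot in a 32-bit process.
array_bytes = 12 + 4 * capacity
print(array_bytes, hex(array_bytes))  # 2097164 0x20000c
```

A head of 0 together with tail equal to size is what we'd expect from a queue that has only ever been written to since its last resize, which fits a consumer that stopped dequeuing.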
At first sight, we might have expected the array to store Visit objects, but that's not how a MailboxProcessor works. It supports switching on the type of each message. In C# terms, think of it as an inheritance hierarchy with VisitorMessage as the abstract base type and each message type as a concrete subtype. In addition, each type of message may carry additional state, such as the actual visit.
To see this hierarchy in action, we can dump the first element of the array. Its item field holds a Visit object as additional state:
    0:000> !DumpArray -start 0 -length 1 /d 037371e0
    Name:        Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage
    MethodTable: 0855bfcc
    EEClass:     012da164
    Size:        2097164(0x20000c) bytes
    Array:       Rank 1, Number of elements 524288, Type CLASS
    Element Methodtable: 08a2d2ac
    02663808

    0:000> !DumpObj /d 02663808
    Name:        Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage
    MethodTable: 08a2d2ac
    EEClass:     08a0fec8
    Size:        12(0xc) bytes
    File:        D:\home\site\wwwroot\Bugfree.Spo.Analytics.Cli.exe
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    08a2d938  40000b2        4 ....Cli.Domain+Visit  0 instance 026637d0 item

    0:000> !DumpObj /d 026637d0
    Name:        Bugfree.Spo.Analytics.Cli.Domain+Visit
    MethodTable: 08a2d938
    EEClass:     0852014c
    Size:        56(0x38) bytes
    File:        D:\home\site\wwwroot\Bugfree.Spo.Analytics.Cli.exe
    Fields:
          MT    Field   Offset                 Type VT     Attr    Value Name
    048aa9a4  40000fc       1c          System.Guid  1 instance 026637ec CorrelationId@
    05547ab4  40000fd       2c      System.DateTime  1 instance 026637fc Timestamp@
    012dfccc  40000fe        4        System.String  0 instance 02663530 LoginName@
    012dfccc  40000ff        8        System.String  0 instance 02663700 SiteCollectionUrl@
    012dfccc  4000100        c        System.String  0 instance 026635b4 VisitUrl@
    054b6a94  4000101       10 ...Int32, mscorlib]]  0 instance 02663774 PageLoadTime@
    0641c510  4000102       14 System.Net.IPAddress  0 instance 02663780 IP@
    054b5248  4000103       18 ...tring, mscorlib]]  0 instance 026637c4 UserAgent@
The "@" in the name denotes a property backing field. For every Visit object in the array, to recreate the object from memory, we must dump the values of each backing field. And for any non-simple type of backing field, we must recursively dump it until we arrive at simple types.
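The recursion can be sketched in a few lines of Python. Everything below is a toy model: the heap dictionary, its addresses, and the field values are hypothetical placeholders rather than data from the dump, and materialize is a name we made up. Reference fields are followed to another heap entry, value-type fields are taken as-is, and strings are treated as simple types.

```python
# Hypothetical miniature heap: address -> (type name, payload).
# For compound objects the payload maps field names to
# ("ref", address) or ("value", literal).
heap = {
    0x1000: ("Domain+Visit", {
        "LoginName@": ("ref", 0x2000),
        "PageLoadTime@": ("ref", 0x3000),
        "UserAgent@": ("ref", 0),  # null reference
    }),
    0x2000: ("System.String", "placeholder-login"),
    0x3000: ("FSharpOption`1", {"value": ("value", 1234)}),
}

def materialize(address, path=frozenset()):
    """Recursively turn the object at address into plain Python data."""
    if address == 0:
        return None
    if address in path:
        raise ValueError(f"cycle detected at {address:#x}")
    type_name, payload = heap[address]
    if type_name == "System.String":
        return payload  # simple type: nothing left to follow
    result = {}
    for field_name, (kind, value) in payload.items():
        key = field_name.rstrip("@")  # drop the backing-field marker
        if kind == "ref":
            result[key] = materialize(value, path | {address})
        else:
            result[key] = value
    return result

visit = materialize(0x1000)
print(visit)
```

Tracking the path of addresses already visited is only a guard: for a tree it never triggers, but it keeps the traversal from looping forever should the graph turn out to be cyclic.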
Following pointers and dumping objects with WinDbg should make it very clear that we're traversing a (potentially cyclic) graph of objects. In this case the objects form a tree, rooted in a singleton MailboxProcessor instance:
    Bugfree.Spo.Analytics.Cli.Agents+visitor (Microsoft.FSharp.Control.FSharpMailboxProcessor)
      mailbox (Microsoft.FSharp.Control.Mailbox)
        arrivals (Microsoft.FSharp.Control.Queue)
          array (Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage)
            message1 (Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage)
              item (Bugfree.Spo.Analytics.Cli.Domain+Visit)
                CorrelationId (System.Guid)
                  _a (System.Int32)
                  _b (System.Int16)
                  _c (System.Int16)
                  _d (System.Byte)
                  ...
                  _k (System.Byte)
                Timestamp
                  dateData (System.UInt64)
                  ...
                LoginName (System.String)
                SiteCollectionUrl (System.String)
                VisitUrl (System.String)
                PageLoadTime (Microsoft.FSharp.Core.FSharpOption)
                  value (System.Int32)
                IP (System.Net.IPAddress)
                  m_Address (System.Int64)
                  ...
                UserAgent (Microsoft.FSharp.Core.FSharpOption)
                  value (System.String)
            message2
            ...
            messageN
The Visit object is kept alive because it's indirectly rooted by the static field. It prevents the garbage collector from collecting Visit objects. Incidentally, the number of Visit objects in the array matches the number of Visit objects on the heap:
    0:000> !DumpHeap -stat -type Bugfree.Spo.Analytics.Cli.Domain+Visit
    Statistics:
          MT    Count    TotalSize Class Name
    08a62f2c        1           16 Microsoft.FSharp.Collections.FSharpList`1[[Bugfree.Spo.Analytics.Cli.Domain+Visit, Bugfree.Spo.Analytics.Cli]]
    08a2d938   422813     23677528 Bugfree.Spo.Analytics.Cli.Domain+Visit
    Total 422814 objects
Thus, rather than traversing the tree, searching for and dumping Visit objects directly is simpler and yields the same result.
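As a quick sanity check, the numbers reported by SOS agree with each other. The Python sketch below takes the Visit row from the statistics, divides the total size by the count, and compares the results with the per-object size and queue size seen earlier; the variable names are ours.

```python
# The Visit row from the !DumpHeap -stat output: MT, Count, TotalSize, Class Name.
stat_row = "08a2d938 422813 23677528 Bugfree.Spo.Analytics.Cli.Domain+Visit"

mt, count, total_size, class_name = stat_row.split(maxsplit=3)
count, total_size = int(count), int(total_size)

# Every Visit instance has the same size, so the division should be exact.
bytes_per_object, remainder = divmod(total_size, count)
print(bytes_per_object, remainder)  # 56 0

queue_size = 422813        # the size field of the Queue object
visit_size_reported = 56   # Size reported by !DumpObj for a single Visit
print(count == queue_size, bytes_per_object == visit_size_reported)  # True True
```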
While WinDbg provides for easy graph traversal, it only knows how to extract and pretty print simple .NET types such as String, Int, and Float. For compound types, such as Guid, FSharpOption, IPAddress, and DateTime, turning text output into .NET objects is a lot of work. We'd have to recursively traverse each compound type inside each of the 422,813 Visit objects, parsing text and substituting addresses.
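To give a feel for the text parsing involved, here's a minimal Python sketch that turns the Fields table of one !DumpObj output into dictionaries. The function name parse_fields is hypothetical, and the parser assumes the column layout shown in this post: three leading columns, four trailing ones, and an optional, possibly truncated type in between. Static fields and other output variations aren't handled.

```python
SAMPLE = """\
Name:        Bugfree.Spo.Analytics.Cli.Domain+Visit
MethodTable: 08a2d938
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
048aa9a4  40000fc       1c          System.Guid  1 instance 026637ec CorrelationId@
05547ab4  40000fd       2c      System.DateTime  1 instance 026637fc Timestamp@
012dfccc  40000fe        4        System.String  0 instance 02663530 LoginName@
012dfccc  40000ff        8        System.String  0 instance 02663700 SiteCollectionUrl@
012dfccc  4000100        c        System.String  0 instance 026635b4 VisitUrl@
054b6a94  4000101       10 ...Int32, mscorlib]]  0 instance 02663774 PageLoadTime@
0641c510  4000102       14 System.Net.IPAddress  0 instance 02663780 IP@
054b5248  4000103       18 ...tring, mscorlib]]  0 instance 026637c4 UserAgent@
"""

def parse_fields(dumpobj_text):
    """Parse the rows below 'Fields:' into a list of dictionaries."""
    rows = []
    in_fields = False
    for line in dumpobj_text.splitlines():
        parts = line.split()
        if not in_fields:
            in_fields = parts[:1] == ["Fields:"]
            continue
        if len(parts) < 7 or parts[0] == "MT":
            continue  # skip the header row and anything too short
        mt, field, offset = parts[:3]
        vt, attr, value, name = parts[-4:]
        is_value_type = vt == "1"
        rows.append({
            "mt": mt,
            "offset": int(offset, 16),
            # The type may be empty or contain spaces, so take the middle.
            "type": " ".join(parts[3:-4]),
            "is_value_type": is_value_type,
            "value": value,
            # For reference fields, Value is the address to dump next.
            "address": None if is_value_type else int(value, 16),
            "name": name,
        })
    return rows

fields = parse_fields(SAMPLE)
to_follow = [f["name"] for f in fields if f["address"]]
print(to_follow)
```

Even this small parser only gets us the next set of addresses to dump. Each of those needs its own !DumpObj round trip and its own type-specific interpretation, which is where the work piles up.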