Part 1: WinDbg: Recreating .NET objects from an Azure App Service memory dump
Part 2: ClrMD: Recreating .NET objects from an Azure App Service memory dump
This post outlines how to use WinDbg to identify and extract .NET objects from a memory dump of an Azure App Service, an Azure deployed web application.
Akin to Google Analytics, the memory dump is of an application which records page visits. It does so by exposing endpoints called by JavaScript on each visit to a web application. Due to a bug, the analytics application has been running for three weeks without the ability to persist visits to the database. Visits have accumulated in-process, and are lost on restart. The goal is to conduct a post-mortem analysis of the memory dump and extract visit objects for replay.
An Azure App Service supports memory dumping without terminating the process through Kudu. Furthermore, Kudu enables navigating the App Service's file system to download sos.dll and mscordacwks.dll. At this point the dump is ready to be loaded into WinDbg, like any other dump of a .NET process.
To know what to look for and expect inside the dump, a brief overview of the application is in order. The dump is of the Bugfree.Spo.Analytics application whose endpoints receive a JSON payload containing visit metadata on each page visit to another application.
On the server side, multiple producers and a single consumer operate on single, shared, in-process message queue. Once a visit arrives on a worker threads, it's validated, enriched, and turned into an enqueued .NET Visit object. When the queue reaches a certain length, a consumer dequeues the Visit objects and after a bit of processing writes the visits to a MS SQL Azure instance:
Visit 1 \ Visit 2 -\ .. -> Thread pool producers ... -/ | ... -/ Process and post ... -/ | Visit N / v Mailbox processor with queue and consumer | Process and save | v MS SQL Azure instance
In our case, some 423k Visit objects are stuck, waiting to be consumed.
At this point, we'll assume that the dump is ready to be loaded into WinDbg. From the output below, we see that WinDbg ships with appropriate versions of sos.dll and mscordacwks.dll, rather than loading the ones we supplied. Process uptime is reported at close to 25 days. From later analysis of visits we know that for about 21 of those the consumer hasn't been operational:
Microsoft (R) Windows Debugger Version 10.0.15063.137 X86 Copyright (c) Microsoft Corporation. All rights reserved. Loading Dump File [C:\AzureDump\Bugfree.Spo.Analytics.Cli-d3c510-07-25-13-08-00.dmp] User Mini Dump File with Full Memory: Only application data is available Symbol search path is: srv* Executable search path is: Windows 8 Version 9200 UP Free x86 compatible Product: Server, suite: TerminalServer DataCenter SingleUserTS 6.2.9200.16384 (win8_rtm.120725-1247) Machine Name: Debug session time: Tue Jul 25 15:07:22.000 2017 (UTC + 2:00) System Uptime: 28 days 13:46:50.588 Process Uptime: 24 days 20:40:30.000 ................................................................ ... Loading unloaded module list .. eax=00000000 ebx=0116e610 ecx=00000000 edx=00000000 esi=0116e3a0 edi=00000001 eip=7781081c esp=0116e278 ebp=0116e3f8 iopl=0 nv up ei pl nz na pe nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206 ntdll!NtWaitForMultipleObjects+0xc: 7781081c c21400 ret 14h 0:000> .sympath+ C:\AzureDump Symbol search path is: srv*;C:\AzureDump Expanded Symbol search path is: cache*;SRV*https://msdl.microsoft.com/download/symbols;c:\azuredump ************* Symbol Path validation summary ************** Response Time (ms) Location Deferred srv* OK C:\AzureDump 0:000> .cordll -ve -u -l CLRDLL: Unable to get version info for 'D:\Windows\Microsoft.NET\Framework\v4.0.30319\mscordacwks.dll', Win32 error 0n87 Automatically loaded SOS Extension CLRDLL: Loaded DLL C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\sym\mscordacwks_x86_x86_4.7.2053.00.dll\58FA6BB36e6000\mscordacwks_x86_x86_4.7.2053.00.dll CLR DLL status: Loaded DLL C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\sym\mscordacwks_x86_x86_4.7.2053.00.dll\58FA6BB36e6000\mscordacwks_x86_x86_4.7.2053.00.dll
First order of business is to locate Visit objects inside the dump. One way is filtering the heap for objects of the Visit type. The problem with this approach is that the heap may contain objects eligible for garbage collection, i.e., visits already written to the database. Instead we opt for browsing the application's source, looking for a unique object which, directly or indirectly, stores Visit objects.
Browsing the source, we observe that the producers/consumer mechanism is nicely encapsulated within the MailboxProcessor type. Searching the heap for instances of this type, a single instance shows up:
0:000> !DumpHeap -stat -type MailboxProcessor Statistics: MT Count TotalSize Class Name 08a2db5c 1 32 Microsoft.FSharp.Control.FSharpMailboxProcessor`1[[Bugfree.Spo.Analytics.Cli.Agents+LoggerMessage, Bugfree.Spo.Analytics.Cli]] 08a2d584 1 32 Microsoft.FSharp.Control.FSharpMailboxProcessor`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]] Total 2 objects
From documentation and F# library source, it's evident that MailboxProcessor is aliased as FSharpMailboxProcessor. The "`1" is CLR notation for the arity of its generic type -- the number of type arguments -- representing the type of item queued. Inside the brackets is the concrete generic type and its assembly, with "+" being CLR notation for an inner class.
From a C# perspective, having a type called Agents with an inner type of VisitorMessage may seem like an odd design. In fact, it's an consequence of how the F# compiler maps language constructs to IL.
From the value of the MT (Method Table) column, we can locate objects of only that type on the managed heap. As the queueing mechanism is a singleton object, a static field, only one instance shows up:
0:000> !DumpHeap /d -mt 08a2d584 Address MT Size 0252c384 08a2d584 32
The address column points to the location of the object inside the process' 32-bit virtual address space. We could inspect it by dumping raw memory content, but SOS comes with a command to print an object:
0:000> !DumpObj /d 0252c384 Name: Microsoft.FSharp.Control.FSharpMailboxProcessor`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]] MethodTable: 08a2d584 EEClass: 08a0ff30 Size: 32(0x20) bytes File: D:\home\site\wwwroot\FSharp.Core.dll Fields: MT Field Offset Type VT Attr Value Name 00000000 40001ff 4 0 instance 0252c378 initial 048a5570 4000200 18 ...CancellationToken 1 instance 0252c39c cancellationToken@2521 0855a350 4000201 8 ...Canon, mscorlib]] 0 instance 0252c3a4 mailbox 012dc8f0 4000202 10 System.Int32 1 instance -1 defaultTimeout 012da988 4000203 14 System.Boolean 1 instance 1 started 0855a0b8 4000204 c ...ption, mscorlib]] 0 instance 0252c3f4 errorEvent
The mailbox field looks promising. It's a generic reference type whose Type is oddly listed as Microsoft.FSharp.Control.Mailbox`1[[System.__Canon, mscorlib]]. System.__Canon is a CLR placeholder type related to how .NET generics is implemented under the hood and is replaced with an actual type at runtime as shown next:
0:000> !DumpObj /d 0252c3a4 Name: Microsoft.FSharp.Control.Mailbox`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]] MethodTable: 0855bec8 EEClass: 08a3073c Size: 32(0x20) bytes File: D:\home\site\wwwroot\FSharp.Core.dll Fields: MT Field Offset Type VT Attr Value Name 05491110 40001f8 4 ...Canon, mscorlib]] 0 instance 00000000 inboxStore 0855b748 40001f9 8 ...Canon, mscorlib]] 0 instance 0252c3c4 arrivals 0855b748 40001fa c ...Canon, mscorlib]] 0 instance 0252c3c4 syncRoot 0855af04 40001fb 10 ...ore]], mscorlib]] 0 instance 00000000 savedCont 082472fc 40001fc 14 ...ng.AutoResetEvent 0 instance 00000000 pulse 0855b290 40001fd 18 ...olean, mscorlib]] 0 instance 0252c3e8 waitOneNoTimeout
From these fields, it's clear that Mailbox is where shared access to the queue is synchronized. Continuing our search for Visit objects, the arrivals field looks promising. Following the pointer once again, we end up at a Queue type defined in the F# standard library -- a thin wrapper around an array:
0:000> !DumpObj /d 0252c3c4 Name: Microsoft.FSharp.Control.Queue`1[[Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage, Bugfree.Spo.Analytics.Cli]] MethodTable: 0855bf60 EEClass: 08a310d4 Size: 24(0x18) bytes File: D:\home\site\wwwroot\FSharp.Core.dll Fields: MT Field Offset Type VT Attr Value Name 048a17a8 40001db 4 System.__Canon[] 0 instance 037371e0 array 012dc8f0 40001dc 8 System.Int32 1 instance 0 head 012dc8f0 40001dd c System.Int32 1 instance 422813 size 012dc8f0 40001de 10 System.Int32 1 instance 422813 tail
Besides an array of objects, the Queue appears to keep track of the index of the first and last element as well as its size. The array has a capacity of 524,288, but only contains 422,813 visits:
0:000> !DumpObj /d 037371e0 Name: Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage[] MethodTable: 0855bfcc EEClass: 012da164 Size: 2097164(0x20000c) bytes Array: Rank 1, Number of elements 524288, Type CLASS (Print Array) Fields: None
At first glance, we might have expected the array to store Visit objects directly, but that's not how a MailboxProcessor works. It supports switching on the type of each message. In C# terms, think of messages as an inheritance hierarchy with VisitorMessage as the abstract base type and each actual message type as a concrete subtype. Each type of message may carry additional state, such as the actual visit stored in a field inside the subtype.
To see this hierarchy at work, we dump the first element of the array. Its item field holds the Visit object:
0:000> !DumpArray -start 0 -length 1 /d 037371e0 Name: Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage[] MethodTable: 0855bfcc EEClass: 012da164 Size: 2097164(0x20000c) bytes Array: Rank 1, Number of elements 524288, Type CLASS Element Methodtable: 08a2d2ac [0] 02663808 0:000> !DumpObj /d 02663808 Name: Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage MethodTable: 08a2d2ac EEClass: 08a0fec8 Size: 12(0xc) bytes File: D:\home\site\wwwroot\Bugfree.Spo.Analytics.Cli.exe Fields: MT Field Offset Type VT Attr Value Name 08a2d938 40000b2 4 ....Cli.Domain+Visit 0 instance 026637d0 item 0:000> !DumpObj /d 026637d0 Name: Bugfree.Spo.Analytics.Cli.Domain+Visit MethodTable: 08a2d938 EEClass: 0852014c Size: 56(0x38) bytes File: D:\home\site\wwwroot\Bugfree.Spo.Analytics.Cli.exe Fields: MT Field Offset Type VT Attr Value Name 048aa9a4 40000fc 1c System.Guid 1 instance 026637ec CorrelationId@ 05547ab4 40000fd 2c System.DateTime 1 instance 026637fc Timestamp@ 012dfccc 40000fe 4 System.String 0 instance 02663530 LoginName@ 012dfccc 40000ff 8 System.String 0 instance 02663700 SiteCollectionUrl@ 012dfccc 4000100 c System.String 0 instance 026635b4 VisitUrl@ 054b6a94 4000101 10 ...Int32, mscorlib]] 0 instance 02663774 PageLoadTime@ 0641c510 4000102 14 System.Net.IPAddress 0 instance 02663780 IP@ 054b5248 4000103 18 ...tring, mscorlib]] 0 instance 026637c4 UserAgent@
The "@" sign appended to each name denotes a property backing field. Thus, for every Visit object in the array, to recreate it from memory, we must dump the values of each backing field. And for any non-simple type of backing field, we must recursively dump it until we arrive at simple types.
Tracking pointers and dumping objects with WinDbg should make it clear that we're traversing a (potentially cyclic) graph of objects. In this case the objects form a tree, rooted in the singleton MailboxProcessor instance:
Bugfree.Spo.Analytics.Cli.Agents+visitor (Microsoft.FSharp.Control.FSharpMailboxProcessor) mailbox (Microsoft.FSharp.Control.Mailbox) arrivals (Microsoft.FSharp.Control.Queue) array (Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage[]) visit1 (Bugfree.Spo.Analytics.Cli.Agents+VisitorMessage) item (Bugfree.Spo.Analytics.Cli.Domain+Visit) CorrelationId (System.Guid) _a (System.Int32) _b (System.Int16) _c (System.Int16) _d (System.Byte) ... _k (System.Byte) Timestamp dateDate (System.UInt64) ... LoginName (System.String) SiteCollectionUrl (System.String) VisitUrl (System.String) PageLoadTime (Microsoft.FSharp.Core.FSharpOption) value (System.Int32) IP (System.Net.IPAddress) m_Address (System.Int64) ... UserAgent (Microsoft.FSharp.Core.FSharpOption) value (System.String) visit2 ... visitN
Each Visit object inside the queue is prevented from being garbage collected because it's indirectly rooted by the static MailboxProcessor field. Incidentally, the number of Visit objects in the array matches the number of Visit objects on the heap:
0:000> !DumpHeap -stat -type Bugfree.Spo.Analytics.Cli.Domain+Visit Statistics: MT Count TotalSize Class Name 08a62f2c 1 16 Microsoft.FSharp.Collections.FSharpList`1[[Bugfree.Spo.Analytics.Cli.Domain+Visit, Bugfree.Spo.Analytics.Cli]] 08a2d938 422813 23677528 Bugfree.Spo.Analytics.Cli.Domain+Visit Total 422814 objects
This implies that rather than traversing the tree, dumping Visit objects directly is simpler and yields the same result.
While WinDbg provides for easy graph exploration, it can only extract and pretty print simple .NET types such as String, Int, and Float. Compound types, such as Guid, FSharpOption, IPAddress, and DateTime require parsing of its text output. We'd have to recursively traverse each compound type inside each one of the 422,813 Visit objects, parsing command output.
Generating the batch of WinDbg commands to run against each visit and parsing all that command output is a lot of work. As an alternative, next we'll explore the Microsoft Diagnostics Runtime, or ClrMD for short, a .NET dump file API, alleviating the need for WinDbg scripting and parsing.