WinDbg postmortem debugging across installed .NET CLR versions

24 Oct 2016

This post outlines the steps involved in generating a memory dump of a process running some version of the CLR on one machine, and loading that dump into WinDbg on another machine. Inside WinDbg, we want the SOS debugger extension loaded to interrogate the dump. Specifically, SOS should load even if the WinDbg machine doesn't have the same version of the CLR installed as the machine that ran the dumped process.

Step 1: Generate process dump

To generate an exemplar dump, open Windows Task Manager, right-click w3wp.exe, which we know is running .NET code, and select Create dump file. When we want dumps triggered by conditions such as CPU or memory utilization exceeding some threshold, or by a process throwing a certain type of exception, ADPlus, DebugDiag, and ProcDump provide better ways of capturing them.
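As an illustration of trigger-based capture, here is a minimal Python sketch that assembles, but doesn't run, a ProcDump command line for a CPU trigger. It assumes ProcDump's -ma (full memory dump), -c (CPU threshold in percent), -s (consecutive seconds) and -n (number of dumps) switches; check procdump -? on your version before relying on them. The function name and the PID are hypothetical.

```python
from typing import List


def procdump_cpu_trigger_args(pid: int, cpu_percent: int, seconds: int,
                              count: int, dump_folder: str) -> List[str]:
    """Build a ProcDump command line that writes full-memory dumps of `pid`
    once its CPU usage stays above `cpu_percent` for `seconds` seconds.

    Assumed switches: -ma (full memory), -c (CPU threshold %),
    -s (consecutive seconds), -n (number of dumps to write).
    """
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in (0, 100]")
    return [
        "procdump", "-ma",
        "-c", str(cpu_percent),
        "-s", str(seconds),
        "-n", str(count),
        str(pid),
        dump_folder,
    ]


if __name__ == "__main__":
    # 4242 is a made-up PID standing in for the w3wp process.
    args = procdump_cpu_trigger_args(4242, 80, 10, 3, r"C:\debug")
    print(" ".join(args))  # procdump -ma -c 80 -s 10 -n 3 4242 C:\debug
    # To actually capture, pass `args` to subprocess.run on the w3wp machine.
```

Returning a list rather than a single string keeps the arguments safe to hand to subprocess.run without shell quoting.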

In preparation for copying the dump to the WinDbg machine, we must determine the version and bitness of the CLR running inside the w3wp process. If in doubt about the CLR version, we can always check the IIS application pool settings, which in this case show .NET Framework v2.0.50727. As for bitness, Task Manager shows w3wp without the *32 suffix, indicating a 64-bit process. Alternatively, we could've used Sysinternals Process Explorer and looked for DLLs loaded into the process from one of the C:\Windows\Microsoft.NET subfolders.
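Bitness can also be confirmed from the binaries themselves. Every PE image (EXE or DLL) records its target machine in the COFF header: the DWORD at offset 0x3C points to the "PE\0\0" signature, which is followed by a 2-byte Machine field (0x8664 for AMD64, 0x14C for x86). A minimal Python sketch reading that field; the function names are hypothetical, and reading only the first 4 KiB is an assumption that holds for typical images.

```python
import struct

# Machine values from the PE/COFF specification.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "AMD64", 0x0200: "IA64", 0xAA64: "ARM64"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE image's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    # e_lfanew: file offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The Machine field is the first 2 bytes of the COFF file header.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, hex(machine))


def pe_machine_of_file(path: str) -> str:
    with open(path, "rb") as f:
        # Headers sit near the start; 4 KiB covers typical images.
        return pe_machine(f.read(4096))
```

On the w3wp machine, pe_machine_of_file(r"C:\Windows\Microsoft.NET\Framework64\v2.0.50727\mscorwks.dll") would be expected to report AMD64, matching the 64-bit w3wp process.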

Inside C:\Windows\Microsoft.NET\Framework64, each x64 version of .NET appears as a subfolder (v2.0.50727, v4.0.30319, and so on). What's missing from the folder name versioning scheme is the patch version. Looking at the properties of one of the DLLs inside the v2.0.50727 folder, we see the exact version of the CLR running inside the w3wp process is actually 2.0.50727.4253.
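The version shown in the Properties dialog comes from the DLL's version resource, where the VS_FIXEDFILEINFO structure packs the four version parts into two 32-bit values, dwFileVersionMS (major, minor) and dwFileVersionLS (build, revision). A small Python sketch decoding them; how the two values are obtained (for example via GetFileVersionInfo and VerQueryValue on Windows) is left out.

```python
def format_file_version(ms: int, ls: int) -> str:
    """Decode VS_FIXEDFILEINFO.dwFileVersionMS/LS into 'major.minor.build.revision'."""
    major, minor = ms >> 16, ms & 0xFFFF
    build, revision = ls >> 16, ls & 0xFFFF
    return f"{major}.{minor}.{build}.{revision}"


if __name__ == "__main__":
    # 0xC627 == 50727 (build), 0x109D == 4253 (revision, the "patch" part).
    print(format_file_version(0x00020000, 0xC627109D))  # 2.0.50727.4253
```

The last part, the revision, is exactly the piece the v2.0.50727 folder name leaves out.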

Armed with the memory dump and knowledge about version and bitness of the CLR running managed code inside the dumped process, we're ready to copy the dump and auxiliary files to the WinDbg machine.

Step 2: Copy dump and .NET CLR DLLs to other machine

For WinDbg to be able to load and inspect the dump, auxiliary files from the CLR running inside w3wp must be available to it. These files, matching the patch version and bitness determined above, may already exist on the WinDbg machine. But rather than assume so, we copy the two required files from the .NET Framework folder of the w3wp machine to the WinDbg machine, along with the dump file:

  • mscordacwks.dll: the Microsoft Common Object Runtime Data Access Component for workstations (COR was an early name for the CLR) exposes APIs through which WinDbg can access the memory, and thereby the CLR data structures, of the dumped process. The component is compiled from the same source code as the CLR executing in-process. During postmortem debugging, it acts as a stand-in for the CLR frozen in time inside the dump. Inside a running process, querying CLR data structures involves not only reading memory but also executing native code to interpret those structures. Running outside the debugged process, the data access component serves this same purpose.

  • sos.dll: short for Son of Strike, this library contains the WinDbg .NET extension commands. Using mscordacwks.dll (and by inspecting the dump natively), these SOS commands query and interpret CLR data structures and surface them in a digestible format. Without mscordacwks.dll and sos.dll, we'd be looking at CLR data structures as raw bytes laid out in memory.

Because CLR internals, and by implication the SOS commands, are subject to change with new runtime versions, specific versions of mscordacwks.dll and sos.dll ship with each runtime. Thus, we collect mscordacwks.dll and sos.dll from the .NET Framework folder on the w3wp machine and copy them, together with w3wp.dmp, to the C:\debug\w3wp-sp2007 folder on the WinDbg machine.
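Collecting the files can be scripted. A minimal Python sketch, assuming both the dump and the w3wp machine's Framework64\v2.0.50727 folder are reachable from where the script runs (for example over a file share); stage_debug_files is a hypothetical helper, not part of any tool.

```python
import shutil
from pathlib import Path
from typing import List

# The two CLR-version-specific files WinDbg needs alongside the dump.
CLR_DEBUG_DLLS = ("mscordacwks.dll", "sos.dll")


def stage_debug_files(framework_dir: str, dump_file: str, dest_dir: str) -> List[Path]:
    """Copy the dump plus mscordacwks.dll and sos.dll into dest_dir.

    framework_dir is the .NET Framework folder of the *dumped* machine,
    e.g. C:\\Windows\\Microsoft.NET\\Framework64\\v2.0.50727.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    sources = [Path(framework_dir) / name for name in CLR_DEBUG_DLLS]
    sources.append(Path(dump_file))
    copied = []
    for src in sources:
        if not src.is_file():
            raise FileNotFoundError(src)
        # copy2 copies the file and preserves metadata such as timestamps.
        copied.append(Path(shutil.copy2(src, dest)))
    return copied
```

Called with the w3wp machine's framework folder, the dump, and C:\debug\w3wp-sp2007 as the destination, it fails loudly if either DLL is missing instead of silently staging an incomplete set.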

Step 3: First attempt at loading w3wp.dmp into WinDbg

Open WinDbg (X64), choose File > Open Crash Dump..., and locate C:\debug\w3wp-sp2007\w3wp.dmp. In response, WinDbg prints the following output:

Microsoft (R) Windows Debugger Version 10.0.14321.1024 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Debug\w3wp-sp2007\w3wp.DMP]
User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*
Executable search path is: 
Windows Server 2008/Windows Vista Version 6002 (Service Pack 2) MP (16 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Machine Name:
Debug session time: Fri Oct 14 15:42:20.000 2016 (UTC + 2:00)
System Uptime: 128 days 10:41:39.295
Process Uptime: 0 days 14:21:17.000
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
.
Loading unloaded module list
................
ntdll!NtWaitForSingleObject+0xa:
00000000`775d68da c3              ret

Next, we issue commands to set the symbol path to the public Microsoft symbol server and reload symbols for the modules loaded into w3wp. Then, based on the folder location (and thus the version) of the mscorwks.dll loaded into w3wp, we load SOS from the same folder. The .loadby command is a shortcut for the more general .load command, which accepts the full path of the DLL to load. .loadby was introduced because .NET DLLs tend to have lengthy paths, so the shortcut resolves the path from the location of the loaded mscorwks.dll instead. On CLR 4.0 and above, the command is .loadby sos clr, because the CLR lives in clr.dll:

0:000> .symfix
0:000> .reload
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
.
Loading unloaded module list
................
0:000> .loadby sos mscorwks
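Conceptually, .loadby sos mscorwks takes the folder of the loaded mscorwks.dll and hands .load the full path of sos.dll in that folder. A Python sketch of that resolution and of the CLR 4.0 module-name rule; runtime_module and loadby_path are hypothetical names illustrating the behavior described above, not WinDbg internals. The ntpath module keeps Windows path handling identical on any OS.

```python
import ntpath


def runtime_module(clr_major: int) -> str:
    """Module that .loadby anchors on: mscorwks before CLR 4.0, clr from 4.0 on."""
    return "mscorwks" if clr_major < 4 else "clr"


def loadby_path(extension: str, loaded_module_path: str) -> str:
    """Resolve '.loadby <extension> <module>' to the path '.load' would need."""
    folder = ntpath.dirname(loaded_module_path)
    return ntpath.join(folder, extension + ".dll")


if __name__ == "__main__":
    mod = runtime_module(2)
    path = loadby_path("sos", r"C:\Windows\Microsoft.NET\Framework64\v2.0.50727\mscorwks.dll")
    print(f".loadby sos {mod}  ~  .load {path}")
```

Because the folder comes from the module list recorded in the dump, the sos.dll that actually loads is presumably whatever sits in that folder on the WinDbg machine, which, as Step 4 shows, is version 2.0.50727.8009 rather than 4253.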

As an aside, recent versions of WinDbg run .symfix on startup, rendering the .symfix and .reload commands from above redundant.

Judging from the absence of any error message, it seems that SOS loaded successfully. As we'll see in a moment, that isn't the case.

Step 4: Second attempt at loading w3wp.dmp into WinDbg

Even though no error was reported, issuing the SOS command !CLRStack (or any other SOS command, for that matter) yields the following error message:

0:000> !CLRStack
Failed to load data access DLL, 0x80004005
Verify that 1) you have a recent build of the debugger (6.2.14 or newer)
            2) the file mscordacwks.dll that matches your version of mscorwks.dll is 
                in the version directory
            3) or, if you are debugging a dump file, verify that the file 
                mscordacwks_<arch>_<arch>_<version>.dll is on your symbol path.
            4) you are debugging on the same architecture as the dump file.
                For example, an IA64 dump file must be debugged on an IA64
                machine.

You can also run the debugger command .cordll to control the
debugger's load of mscordacwks.dll. .cordll -ve -u -l will do a
verbose reload. If that succeeds, the SOS command should work on
retry.

If you are debugging a minidump, you need to make sure that your executable
path is pointing to mscorwks.dll as well.

As an aside, notice the HRESULT 0x80004005 above. WinDbg comes with a command for converting an HRESULT into text: !error 80004005 resolves to Error code: (HRESULT) 0x80004005 (2147500037) - Unspecified error.
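The decimal number in parentheses is simply the HRESULT read as an unsigned 32-bit integer. A small Python sketch of the bit layout behind that output: the severity bit is bit 31, the facility sits above bit 16 (masked with 0x1FFF, as winerror.h's HRESULT_FACILITY macro does), and the code occupies the low 16 bits. For 0x80004005 (E_FAIL) the facility is 0 (FACILITY_NULL) and the code is 0x4005.

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its bit fields."""
    hr &= 0xFFFFFFFF                      # normalize negative (signed) inputs
    return {
        "unsigned": hr,
        "signed": hr - (1 << 32) if hr & 0x80000000 else hr,
        "failure": bool(hr >> 31),        # severity bit: 1 means failure
        "facility": (hr >> 16) & 0x1FFF,  # same mask as HRESULT_FACILITY
        "code": hr & 0xFFFF,              # code within the facility
    }
```

The signed form, -2147467259, is how the same value appears wherever an HRESULT is stored in a signed 32-bit integer, such as the HResult property of a .NET exception.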

Let's address the suggested verification steps one by one:

  1. Doesn't apply, as we're running WinDbg 10.0.14321.1024.
  2. WinDbg looked in C:\Windows\Microsoft.NET\Framework64\v2.0.50727, but didn't find a version of mscordacwks.dll matching the CLR version inside the dump. Remember that the w3wp process was executing under v2.0.50727.4253, but the CLR version inside the v2.0.50727 folder on the WinDbg machine turns out to be v2.0.50727.8009.
  3. Tells how to fix the CLR version mismatch by renaming C:\Debug\w3wp-sp2007\mscordacwks.dll to mscordacwks_AMD64_AMD64_2.0.50727.4253.dll and adding C:\Debug\w3wp-sp2007 to the symbol path.
  4. Isn't an issue, because the machines running WinDbg and w3wp are both AMD64. If unsure about the architecture, check the PROCESSOR_ARCHITECTURE environment variable, which on both machines has the value AMD64.
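The expected file name follows mechanically from the pattern in suggestion (3) and the version determined in Step 1. A Python sketch; versioned_dll_name is a hypothetical helper, and because the error message only spells the pattern as <arch>_<arch>, the sketch covers just the same-architecture case that suggestion (4) requires. The same pattern applies to sos.dll.

```python
import re

FOUR_PART_VERSION = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def versioned_dll_name(base: str, arch: str, version: str) -> str:
    """Build e.g. mscordacwks_<arch>_<arch>_<version>.dll.

    Only the same-architecture case is modeled: dump and debugger are both `arch`.
    """
    if not FOUR_PART_VERSION.match(version):
        raise ValueError(f"expected a four-part version, got {version!r}")
    return f"{base}_{arch}_{arch}_{version}.dll"


if __name__ == "__main__":
    print(versioned_dll_name("mscordacwks", "AMD64", "2.0.50727.4253"))
```

Requiring all four version parts guards against using the folder name v2.0.50727, which lacks the patch number WinDbg matches on.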

After renaming mscordacwks.dll as suggestion (3) describes, we append C:\Debug\w3wp-sp2007 to the symbol path. The .sympath command shows the current symbol path, and .sympath+ appends a path to it:

0:000> .sympath+ C:\Debug\w3wp-sp2007
Symbol search path is: srv*;C:\Debug\w3wp-sp2007
Expanded Symbol search path is: cache*;SRV*https://msdl.microsoft.com/download/symbols;c:\debug\w3wp-sp2007

************* Symbol Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       srv*
OK                                             C:\Debug\w3wp-sp2007

Now run .cordll -ve -u -l, which, as suggested below the verification steps, performs a verbose reload:

0:000> .cordll -ve -u -l
CLRDLL: C:\Windows\Microsoft.NET\Framework64\v2.0.50727\mscordacwks.dll:2.0.50727.8009 f:0
doesn't match desired version 2.0.50727.4253 f:0
CLRDLL: Unable to find '' on the path
Cannot Automatically load SOS
CLRDLL: Loaded DLL c:\debug\w3wp-sp2007\mscordacwks_AMD64_AMD64_2.0.50727.4253.dll
CLR DLL status: Loaded DLL c:\debug\w3wp-sp2007\mscordacwks_AMD64_AMD64_2.0.50727.4253.dll

From the output it seems that the correct version of mscordacwks.dll got loaded, but sos.dll didn't.

.cordll searches several locations for the DLLs, including the public Microsoft symbol server (the !sym noisy and !sym quiet commands show or hide the paths searched). With noisy output activated, we can tell that the DLLs for this version of the CLR aren't on the symbol server, and we see failed attempts at locating mscordacwks_AMD64_AMD64_2.0.50727.4253.dll before it's finally found in C:\Debug\w3wp-sp2007. WinDbg then looks for sos_AMD64_AMD64_2.0.50727.4253.dll but cannot find it. This tells us to rename sos.dll to sos_AMD64_AMD64_2.0.50727.4253.dll and rerun .cordll -ve -u -l:

0:000> .cordll -ve -u -l
CLRDLL: C:\Windows\Microsoft.NET\Framework64\v2.0.50727\mscordacwks.dll:2.0.50727.8009 f:0
doesn't match desired version 2.0.50727.4253 f:0
Automatically loaded SOS Extension
CLRDLL: Loaded DLL c:\debug\w3wp-sp2007\mscordacwks_AMD64_AMD64_2.0.50727.4253.dll
CLR DLL status: Loaded DLL c:\debug\w3wp-sp2007\mscordacwks_AMD64_AMD64_2.0.50727.4253.dll
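The lookup that !sym noisy reveals, with .cordll trying one location after another until the versioned file turns up in C:\Debug\w3wp-sp2007, can be modeled in a few lines. A simplified, hypothetical Python sketch (find_on_sympath is not a WinDbg API and not how WinDbg is implemented): it walks the semicolon-separated symbol path and returns the first plain folder containing the file, skipping srv* and cache* entries, which for this CLR version hold no match anyway.

```python
import os
from typing import Optional


def find_on_sympath(sympath: str, file_name: str) -> Optional[str]:
    """Return the first local symbol-path folder containing file_name, or None.

    Symbol-server ('srv*...') and cache ('cache*...') entries are skipped;
    only plain folder entries such as C:\\Debug\\w3wp-sp2007 are modeled.
    """
    for entry in sympath.split(";"):
        entry = entry.strip()
        if not entry or "*" in entry:
            continue  # empty entry or server/cache syntax
        candidate = os.path.join(entry, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Before the rename, a lookup for sos_AMD64_AMD64_2.0.50727.4253.dll comes back empty, which mirrors the failed search in the first .cordll run.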

Conclusion

At this point we've loaded the correct versions of mscordacwks.dll and sos.dll, and the SOS commands are at our disposal.

For an in-depth treatment of WinDbg, Advanced .NET Debugging is an excellent book that covers some native WinDbg commands but mostly focuses on SOS. Advanced Windows Debugging, by the same author, covers the topic from a native point of view. Also, Introduction to WinDbg is a nice series of screencasts introducing native WinDbg commands, many of which are also shown in action in this Windows Debugging and Troubleshooting presentation. As a supplement, WinDbg Superpowers for .NET Developers shows how to script and extend WinDbg.

On a related note, the Unlock the essential toolbox for production debugging of .NET Web Applications presentation shows how WinDbg fits in with other debugging tools, and Open Security Training provides an excellent introduction to low-level details of assembly and linking, which to some extent are prerequisites for fully understanding WinDbg's output.

PS: Capture dump from Azure Web App and load it into WinDbg.