Diagnosing network issues by building TCP/UDP ping into an application

15 Jun 2016

Part 1: Lessons learned setting up an IIS web application for double-hop Kerberos authentication with delegation
Part 2: Diagnosing network issues by building TCP/UDP ping into an application
Part 3: Making outbound requests from an ASP.NET MVC/WebAPI application using Kerberos delegated credentials

As a development team, we face many challenges when diagnosing permanent or transient network issues with services that our application depends on: (1) reliable information may be hard to come by without on-demand access to diagnostic tools such as psping, Fiddler, or Wireshark, (2) the application's environment and those of dependent services may fall under the jurisdiction of other teams, (3) access to dependent services may only be allowed from whitelisted server IPs, and (4) the application team may only be able to access production through other teams.

We can address some of these challenges by becoming less dependent on other teams. For instance, network issues become simpler to diagnose if we build into our application the ability to TCP or UDP ping any endpoint (address and port number) on demand. Dependent teams may even find this feature useful as they can trigger a connection attempt from an application to their service while troubleshooting at their end. Incidentally, this was how we got Kerberos authentication with delegation working.

Besides the implementation of TCP and UDP ping in our application, this post includes a Wireshark analysis that shows why and how the pings work at the network level. Knowing how to read Wireshark traces and the basics of the TCP and UDP protocols is invaluable in diagnosing a range of issues otherwise hidden by layers of abstraction.

Communicating over network sockets

A simple way to diagnose network issues is to attempt to establish a full-duplex transport layer socket connection between two endpoints. As a negative test, this quickly reveals whether the other end can't receive traffic, either because no service is listening or because a router along the path is blocking the traffic.

Pinging over a socket is more reliable than pinging over ICMP because network devices may block, drop, or deprioritize ICMP relative to regular traffic. Also, ICMP resides in the Internet layer, which is host to host without port numbers. Like sockets, port numbers are a Transport layer construct added to allow processes on separate hosts to communicate over a virtual circuit. As we'd prefer to assert the availability of services running on well-known ports, and not of the host itself, we must be able to ping specific ports.

With the Internet protocol suite in mind, think of a socket as bridging the gap between the Application and Transport layers. The socket abstracts away splitting the stream of data into segments carried in IP datagrams and reassembling those at the other end, and for TCP it provides connection orientation with reliability and flow control.

Most application layer protocols use UDP or TCP as their transport. For TCP, this implies that before any application layer protocol specific data is sent, a socket connection must first be established between the two endpoints.

Communicating over a UDP or TCP socket generally requires operational knowledge of the application layer protocol. To request data from an HTTP endpoint, for instance, we must know how to form HTTP requests and process HTTP responses. While that's feasible for text-based protocols, binary protocols such as DNS or Kerberos pose a challenge. However, in most cases, being able to establish a connection regardless of the application layer protocol is sufficient for asserting connectivity.
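To make the point about protocol knowledge concrete, here's a minimal sketch of speaking a text-based protocol directly over a TCP socket. It sends an HTTP HEAD request and returns the first line of the response. The HttpProbe class and HeadStatusLine method are hypothetical names for illustration, and the sketch assumes plain HTTP (no TLS) and a server that answers HEAD requests on the root path:

```csharp
using System.IO;
using System.Net.Sockets;
using System.Text;

public static class HttpProbe
{
    // Hypothetical helper: returns the HTTP status line, e.g. "HTTP/1.1 200 OK".
    public static string HeadStatusLine(string host, int port)
    {
        // The TcpClient constructor connects to the endpoint or throws SocketException
        using (var client = new TcpClient(host, port))
        using (var stream = client.GetStream())
        {
            // A hand-written HTTP/1.1 request; lines end in CRLF and a blank line ends the headers
            var request = "HEAD / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n";
            var bytes = Encoding.ASCII.GetBytes(request);
            stream.Write(bytes, 0, bytes.Length);

            using (var reader = new StreamReader(stream, Encoding.ASCII))
            {
                // The first line of the response carries the status code
                return reader.ReadLine();
            }
        }
    }
}
```

Even this small example has to know about request lines, the Host header, and CRLF line endings, which is why a plain connection attempt is usually the more practical probe.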

Pinging a Transmission Control Protocol endpoint

To verify TCP connectivity, we attempt to establish a socket connection between two endpoints. With TCP being a connection-oriented protocol, establishing a connection indicates success.

Establishing a TCP socket connection requires only the following few lines of code with configurable IP address/host name and port number. In comments, we provide the ASP.NET MVC implementation of a controller action for which arguments are provided by query string parameters:

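// Requires: using System.Net.Sockets;
public bool /* ActionResult */ TcpPing(string host, int port) {
    // IPv4 stream socket; disposing it at the end of the block closes the connection
    using (var s = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)) {
        // Blocks until the connection is established; throws SocketException on failure
        s.Connect(host, port);
        // return Content(s.Connected.ToString());
        return s.Connected;
    }
}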

When the socket is connected, the method returns true. Otherwise it either returns false or throws a SocketException.
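When the ping fails, the SocketException carries an error code that can hint at the cause. The following is a minimal sketch, not part of the original implementation, that reports that code as text instead of letting the exception propagate. TcpDiagnostics and DescribeTcpPing are hypothetical names:

```csharp
using System.Net.Sockets;

public static class TcpDiagnostics
{
    // Hypothetical variant of TcpPing: returns "Connected" or the name of the socket error.
    public static string DescribeTcpPing(string host, int port)
    {
        try
        {
            using (var s = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
            {
                s.Connect(host, port);
                return s.Connected ? "Connected" : "NotConnected";
            }
        }
        catch (SocketException e)
        {
            // SocketErrorCode is a SocketError enum value such as ConnectionRefused or TimedOut
            return e.SocketErrorCode.ToString();
        }
    }
}
```

As a rough guide, ConnectionRefused typically means the host was reached but actively rejected the connection, whereas TimedOut typically means no reply came back at all.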

Wireshark protocol analysis

To see how setting up a socket connection qualifies as a ping, and to gain knowledge of TCP in general, we'll analyze the Wireshark trace captured when connecting to a host. In doing so, observe how Wireshark provides visual cues to what's captured. Any text in square brackets indicates a Wireshark interpretation, and the left square bracket along the No. column indicates that the packets are part of a single TCP stream, i.e., a conversation between two endpoints throughout the connection:

Host A sends a TCP segment (#245) -- TCP data encapsulated in an IP datagram -- from one of its ephemeral ports to port 80 on host B. We may think of A and B as client and server, but those are application layer concepts. As far as TCP is concerned, A and B are peers in the process of establishing a full-duplex communication channel.

Three-way connect handshake

The initial segment sent from A to B has its SYNchronize control bit set, which means it includes a random initial sequence number that A will use in its future communications with B (more on the use of sequence numbers in a moment).

In principle, B could respond to A with a segment with the ACKnowledge control bit set, indicating that B positively acknowledges receiving A's initial sequence number. B could then send to A a separate segment with the SYN control bit set, including B's random initial sequence number for use in B's future communication with A. But to minimize the number of round-trips, B usually responds (#255) with a single segment with both the ACK and SYN control bits set and including B's initial sequence number.

A then responds with a segment (#256) with the ACK control bit set to positively acknowledge having received B's initial sequence number. This completes the three-way connect handshake, and with a full-duplex connection established, application protocol specific data can now flow in either direction. In other words, the ping is successful.

Four-way disconnect handshake

Looking at the code, it's clear that we don't send any application protocol specific data. The socket is immediately closed and disposed of, causing A to send a segment (#257) to B with FINish and ACK (of previous segments) control bits set. B in turn responds (#269) with its FIN and ACK (of previous segments). Finally, A sends a segment with an ACK (#270) for receiving B's FIN and the four-way disconnect handshake is complete.

While the three-way connect handshake needs to happen as outlined, the four-way disconnect handshake isn't always perfect and doesn't have to be. In the ideal case, it should've consisted of the exchange FIN/ACK, ACK, FIN/ACK, ACK, but in the trace we're missing one of the ACKs. The exchange nonetheless is good enough for both ends to agree on closing the connection.

Reliability and flow control

Let's briefly touch on the role of Sequence number (Seq), Acknowledgment number (Ack), and Window size (Win) as shown in the trace. These are particularly useful to understand in the context of a larger trace.

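TCP maintains one Sequence number per direction to identify the bytes sent. It's essentially a byte counter that increments by the number of payload bytes carried in each segment, with a few exceptions as stated in the RFC: the SYN and FIN control bits each consume one sequence number. Because Wireshark by default displays sequence numbers relative to the initial sequence number, that's why the Sequence number is always 1 after a successful connect handshake.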

TCP maintains one Acknowledgment number per direction to identify the next byte it expects to receive. Having the other party positively acknowledge received segments is what makes TCP reliable. If one party doesn't receive an acknowledgment, it'll retransmit the segments sent since the last acknowledgment (in practice a few optimizations apply).
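The arithmetic behind Sequence and Acknowledgment numbers can be sketched in a few lines. This is an illustration only, assuming the rule that payload bytes, SYN, and FIN each advance the 32-bit counter, which wraps around on overflow. TcpSequence and its methods are hypothetical names:

```csharp
public static class TcpSequence
{
    // Sequence number that follows a segment. This is also the Acknowledgment
    // number the receiver sends back after getting that segment.
    public static uint Next(uint seq, uint payloadLength, bool syn, bool fin)
    {
        unchecked
        {
            // Payload bytes advance the counter; SYN and FIN consume one number each
            return seq + payloadLength + (syn ? 1u : 0u) + (fin ? 1u : 0u);
        }
    }

    // Relative number as a trace viewer may display it: the offset from the
    // initial sequence number chosen during the handshake.
    public static uint Relative(uint seq, uint initialSeq)
    {
        unchecked { return seq - initialSeq; }
    }
}
```

For example, with a made-up initial sequence number of 3000000000, Next(3000000000, 0, true, false) gives 3000000001 for the segment after the SYN, and Relative(3000000001, 3000000000) gives 1.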

TCP maintains one Window size per direction to indicate the largest number of bytes that A may send to B before B must acknowledge having received those. B's acknowledgment comes in the form of an ACK that includes the next sequence number which B is ready to receive. Once the window is full, A isn't allowed to send more segments until an ACK arrives. TCP implements flow control, i.e., the ability to control the transfer speed in either direction, by having each end dynamically adjust the Window size it advertises based on how full its receive buffer is. Otherwise a fast sender could swamp a slow receiver, leading to packet loss, retransmission, and degraded performance.

Example of failed TCP ping

To illustrate what connecting to a port with no service listening looks like, here's another Wireshark trace:

The initial SYN from A to B is as before, but because A doesn't receive a SYN/ACK from B, a retransmission timer within A's network stack eventually expires, causing A to resend the SYN. Again the retransmission timer expires, and A makes a final SYN attempt, after which A gives up, resulting in .NET throwing a SocketException. Notice how with each retry attempt, the network stack increases the retransmission timeout period. This explains why it takes so long for the SocketException to appear.
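If waiting for the network stack to give up takes too long for an on-demand diagnostic, one option is to bound the connection attempt ourselves. The following is a minimal sketch, not the original implementation, that starts the connect asynchronously and stops waiting after a caller-supplied timeout. TcpProbe and the timeout parameter are hypothetical additions:

```csharp
using System;
using System.Net.Sockets;

public static class TcpProbe
{
    // Hypothetical variant of TcpPing that returns false instead of throwing
    // and never waits longer than the given timeout.
    public static bool TcpPing(string host, int port, TimeSpan timeout)
    {
        using (var s = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            // Start the three-way handshake without blocking
            IAsyncResult pending = s.BeginConnect(host, port, null, null);

            // Wait for the handshake to finish, but only up to the timeout
            if (!pending.AsyncWaitHandle.WaitOne(timeout))
            {
                // Still retrying SYNs; leaving the using block disposes the socket
                return false;
            }

            try
            {
                // Rethrows any connection error, such as a refused connection
                s.EndConnect(pending);
                return s.Connected;
            }
            catch (SocketException)
            {
                return false;
            }
        }
    }
}
```

A call such as TcpProbe.TcpPing("8.8.8.8", 53, TimeSpan.FromSeconds(3)) then answers within roughly three seconds either way. The trade-off is that a slow but reachable endpoint may be reported as a failure if the timeout is set too low.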

Pinging a User Datagram Protocol endpoint

As indicated by the short length of the UDP RFC, UDP is essentially IP with port numbers added. Because UDP is a connectionless protocol, we can't rely on connection setup as a proxy for success (although an application layer protocol is free to define its own connection orientation on top of UDP). All we can do is send data over the socket and hope for a response:

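// Requires: using System.Net.Sockets; using System.Text;
public int /* ActionResult */ UdpPing(string host, int port) {
    var probe = Encoding.ASCII.GetBytes("Probe");
    using (var s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp)) {
        // Give up waiting for a response after 10 seconds
        s.ReceiveTimeout = 10000;
        // For UDP, Connect only sets the default remote endpoint; no packets are sent
        s.Connect(host, port);
        s.Send(probe, probe.Length, SocketFlags.None);
        var receiveBuffer = new byte[1024];
        // Throws SocketException if no response arrives before the timeout
        var i = s.Receive(receiveBuffer, receiveBuffer.Length, SocketFlags.None);
        // return Content(i.ToString());
        return i;
    }
}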

If an endpoint receives our probe, the process listening at the other end may choose to ignore us because we may not be sending correct application layer protocol data. Without built-in acknowledgment, UDP can't tell if data is lost along the path, if the other end is ignoring us, or if no service is listening. In any of these cases the ping fails.
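Since a failed UDP ping surfaces as a SocketException, it can be convenient to turn the outcome into text for the caller. The following is a minimal sketch with hypothetical names (UdpDiagnostics, DescribeUdpPing, timeoutMs). It rests on one assumption worth stating: some hosts answer a probe to a closed port with an ICMP "port unreachable" message, which .NET may surface as an error code such as ConnectionReset or ConnectionRefused depending on the platform. Firewalls often drop those messages, so their absence proves nothing:

```csharp
using System.Net.Sockets;
using System.Text;

public static class UdpDiagnostics
{
    // Hypothetical variant of UdpPing: returns "Reply:<bytes>", "NoReply",
    // or the name of another socket error.
    public static string DescribeUdpPing(string host, int port, int timeoutMs)
    {
        var probe = Encoding.ASCII.GetBytes("Probe");
        using (var s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp))
        {
            s.ReceiveTimeout = timeoutMs;
            try
            {
                s.Connect(host, port);
                s.Send(probe);
                var buffer = new byte[1024];
                int received = s.Receive(buffer);
                return "Reply:" + received;
            }
            catch (SocketException e)
            {
                // TimedOut covers lost, ignored, and silently dropped probes alike
                return e.SocketErrorCode == SocketError.TimedOut
                    ? "NoReply"
                    : e.SocketErrorCode.ToString();
            }
        }
    }
}
```

Only the "Reply" outcome is conclusive. Everything else still leaves the three failure causes indistinguishable in the general case.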

Pinging Google's DNS server on well-known port 53, the Wireshark trace looks as follows:

In this case we know the Google DNS service is listening and that it's ignoring us. The message displayed about the malformed packet is Wireshark attempting to interpret the payload as a DNS request because of the destination port. But the serialized form of "Probe" isn't a valid DNS request.
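For comparison, here's a minimal sketch of what a well-formed probe for this particular service could look like: a DNS query for an A record, laid out according to the DNS message format (a 12-byte header followed by one question). DnsProbe and BuildQuery are hypothetical names, and the sketch handles plain ASCII host names only:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DnsProbe
{
    // Hypothetical helper: serializes a standard DNS query for the A record of name.
    public static byte[] BuildQuery(string name, ushort id)
    {
        var packet = new List<byte>
        {
            (byte)(id >> 8), (byte)(id & 0xFF), // ID, echoed back in the response
            0x01, 0x00,                         // Flags: standard query, recursion desired
            0x00, 0x01,                         // QDCOUNT: one question
            0x00, 0x00,                         // ANCOUNT
            0x00, 0x00,                         // NSCOUNT
            0x00, 0x00                          // ARCOUNT
        };

        // The name is encoded as length-prefixed labels: 7 "example" 3 "com" 0
        foreach (var label in name.TrimEnd('.').Split('.'))
        {
            var bytes = Encoding.ASCII.GetBytes(label);
            if (bytes.Length == 0 || bytes.Length > 63)
                throw new ArgumentException("Each label must be 1 to 63 bytes long", "name");
            packet.Add((byte)bytes.Length);
            packet.AddRange(bytes);
        }
        packet.Add(0x00);                                       // Root label ends the name
        packet.AddRange(new byte[] { 0x00, 0x01, 0x00, 0x01 }); // QTYPE A, QCLASS IN

        return packet.ToArray();
    }
}
```

Sending these bytes instead of "Probe" gives the DNS service something it can answer, which illustrates the earlier point: a meaningful UDP ping requires knowledge of the application layer protocol at the other end.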

Switching to port 54, where no process is listening, we see that Wireshark no longer attempts to interpret the packet as anything but raw UDP. Still, there's no sign of whether the probe arrived at the destination:

Some UDP-based application layer protocols may actually respond to a malformed request. But because we generally can't distinguish between a lost probe, an ignored probe, and no service listening, UDP ping isn't particularly useful.

Scripted ping of many hosts on many ports

The following PowerShell script automates calling our ping endpoint for many hosts on many ports and generates a table with the results. The script triggers server-generated pings from any machine with access to the acme.net diagnostics controller. In one real-world example of troubleshooting Kerberos with delegation, the hosts were domain controllers and the ports were the ones required for Kerberos to operate:

$hosts = (
    "8.8.8.8",   <# Google DNS #>
    "google.com" <# Other Google server #>)

$tcpPorts = (
    53, <# DNS #>
    80  <# HTTP #>)
        
# UseDefaultCredentials required when the endpoint disallows anonymous access
$wc = New-Object System.Net.WebClient -Property @{ UseDefaultCredentials = $true }

foreach ($h in $hosts) {    
    Write-Host -NoNewLine "$($h):`t"
    foreach ($p in $tcpPorts) {
        $c = ""
        try {
            $u = "https://acme.net/service/diagnostics/tcpping?host=$($h)&port=$($p)"
            $s = $wc.DownloadString($u)
            if ($s -eq "True") { $c = "green" } else { $c = "yellow" }
        } catch {
            $c = "red"
        }
        Write-Host -NoNewLine -ForegroundColor ${c} "${p} "
    }
    Write-Host ""
}

Running the script generates a hosts-by-ports matrix in which the color of each element indicates the connection status:

%> .\test.ps1
8.8.8.8:    53 80
google.com: 53 80

Conclusion

While implementing TCP and UDP ping is simple, understanding how and why it works requires knowledge of the Internet protocol stack. Knowing what success and failure look like from Wireshark's point of view is useful in diagnosing a multitude of network issues, not only those explicitly related to socket setup. It allows us to treat what goes on at the lower network layers as less of a black box.