LJDT: Reliable Service / Connection Tests; netcat, not ping, is the correct tool
Me: "I cannot get to your site." Helpdesk: "Hmm, let me check. I can ping it, so it must be okay." Me: "No, pinging has nothing to do with access to the site." Helpdesk: "No, you're crazy, I can ping it so it's your problem. Goodbye."
If you have never had this type of exchange with others, you’re lucky. While ‘ping’ is a good (but not 100% even here) tool for determining whether traffic can go from one IP to another, it is a lousy tool for determining whether anything above layer three is working. The network layer, which is where IP lives, is not the one you care about in 99.999% of cases. Do you care that you can ping Google when your browser spins indefinitely, or gets a Connection Refused, while trying to pull up the page (not that this happens with Google much… maybe hotmail would be a better example)? No, of course not… all you really care about is that a service is not working, which shows up at layer four (not three), and ping knows nothing about that.
Another silly example: Can you ping your workstation? How about your coworker’s workstation? Sure, but does that mean they are running a web server that you can access? Of course not, though they may be. Testing for a service is much more important and relevant for most of us, but we’re too stuck on ping. In the IT world (vs. the end user world) this is even more critical because we test services that we set up on odd ports in locked-down environments that may, annoyingly, block ICMP packets (including pings).
So how do we test for services? In the “Linux Just Does That” (LJDT) series we’ll cover this exactly. Welcome the newest tool in your toolbox: netcat
If you Google for netcat you’ll find a lot of interesting information; if you install it on an inferior platform you may get errors, warnings, and general upheaval from your anti-malware/virus software stating that it is a virus. Misclassification of software is the specialty of anti-malware software, though, so either define an exception or, better yet, change to a real OS. If that proves to be too difficult then go to your SLES server (or openSUSE server, or just about any Linux platform out there) and either netcat will be installed or it can be trivially added with a command like one of the following:
> sudo zypper in netcat #sles 11
> sudo zypper in netcat-openbsd #openSUSE 12.x
Once installed we need to test it. First, try something simple that is almost certain to work: a test of something local to the system. Most Linux systems are listening locally on SMTP’s default port of TCP 25 (postfix/sendmail), so let’s test that first:
> netcat -zv 127.0.0.1 25
The output should look something like the following:
Connection to 127.0.0.1 25 port [tcp/smtp] succeeded!
What just happened exactly? Well we’re testing TCP so even if ping failed (for any reason) the test actually made a connection with the socket (127.0.0.1:25) in question and then reported on the success of that test. This is nice because when a client of yours (SMTP, HTTP, SSH, LDAP, etc.) tries to communicate with a server, this is exactly what it does from the start. It does not first ping the box (unless it is poorly-coded) because ICMP is completely irrelevant to its final goal of trying to connect, authenticate somehow, and then get or put data. TCP is used for all of that, and netcat is a TCP/UDP tool. Now that you know the tool works try something else:
> netcat -zv google.com 443
The results should be identical, other than the socket information that you entered differently. Note that when running netcat this way it is doing some resolving of host and port data for you. The port information comes from /etc/services, so that should be immediate in all cases, but the host data are resolved via the name service features of your OS. If, for example, you ping something like 192.168.0.1 (assuming that is a valid box on your network) you may notice that it cannot resolve that address to a valid DNS name, and netcat is the same. I mention this because netcat will stall for a couple of seconds while it tries to resolve the address to a DNS name. Go ahead… test netcat against something that cannot be reversed to a hostname.
The workaround is trivial; if you are going to test an IP address directly (even if it can be resolved to a DNS name) then add the ‘-n’ option. As with other commands (ss/netstat/etc.) this turns off resolution of the hostname, and of the port to a service name, which is not really a problem since you’re just trying to test connections; who cares what the OS thinks of the name of the service tested?
> netcat -znv 192.168.0.1 80
Connection to 192.168.0.1 80 port [tcp/*] succeeded!
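If you test more than one port at a time, the checks above are easy to wrap in a small shell function. What follows is only a sketch under a few assumptions: the names NC, is_ipv4, and check_ports are mine (not part of netcat), the three-second timeout via ‘-w 3’ is an arbitrary choice, and the pattern match for IP addresses is deliberately loose (it does not check that each number is 255 or less). It adds ‘-n’ only when the target looks like a literal IPv4 address, for the reason just described.

```shell
#!/bin/sh
# Assumption: the binary may be called nc, ncat, or netcat on your system.
NC=${NC:-netcat}

# is_ipv4 STRING: succeeds if STRING looks like a dotted-quad address.
# Loose check only; it does not validate the range of each number.
is_ipv4() {
    case $1 in
        *[!0-9.]*|*..*|.*|*.) return 1 ;;   # stray characters or misplaced dots
        *.*.*.*.*)            return 1 ;;   # too many dots
        *.*.*.*)              return 0 ;;   # exactly three dots
        *)                    return 1 ;;
    esac
}

# check_ports HOST PORT...
# Prints one line per TCP port and returns non-zero if any test failed.
check_ports() {
    host=$1
    shift
    flags=-z
    if is_ipv4 "$host"; then
        flags=-zn    # literal address: skip name resolution
    fi
    failed=0
    for port in "$@"; do
        # -z: connect without sending data; -w 3: give up after 3 seconds
        if "$NC" "$flags" -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "$host:$port succeeded"
        else
            echo "$host:$port failed"
            failed=1
        fi
    done
    return "$failed"
}
```

Something like ‘check_ports 192.168.0.1 22 80 443’ would then print one line per port. If your netcat binary is called ‘nc’, set NC=nc first.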
netcat can be used for a lot of other things such as a basic TCP or UDP client for anything. To enable UDP use the -u option; note that connection tests to a protocol that is, by definition, connectionless, are harder, but can work if the receiving side sends back ICMP messages when a port cannot be reached (SLES does not by default; you’ve been warned). As always, because UDP is, again by definition, unreliable, do not rely on it for anything that you care a lot about, or anything where the application layer is not implementing some kind of verification of data should those data be important.
What if you are not sure that your application (Apache’s httpd for example) is working properly? You know that it’s running, and you know that it should be listening for traffic, but you cannot connect from a client machine. You have even pulled out netcat on your openSUSE laptop and tried to reach it, but it just times out or maybe even fails entirely:
netcat: connect to 192.168.0.1 port 80 (tcp) failed: Connection refused
netcat: connect to 192.168.0.1 port 80 (tcp) failed: No route to host
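Those two messages point in different directions, so it helps to read them carefully. As a rough guide, here is a sketch that maps the text of the message to a likely cause. The helper name explain_nc_error is hypothetical, the exact wording of the messages varies between netcat implementations, and a firewall can be configured to produce any of these responses, so treat the hints as starting points only.

```shell
#!/bin/sh
# explain_nc_error MESSAGE
# Prints a hint about the likely cause of a failed connection test.
explain_nc_error() {
    case $1 in
        *"Connection refused"*)
            # The host sent back a reset: no listener, or a firewall rejecting with a reset
            echo "host answered, but nothing accepted the connection on that port" ;;
        *"No route to host"*)
            # The host (or a router or firewall in between) reported it unreachable
            echo "could not reach the host at all: routing problem or a firewall rejecting traffic" ;;
        *"timed out"*)
            # Nothing came back before the timeout expired
            echo "no answer: packets are probably being dropped silently, or the host is down" ;;
        *)
            echo "unrecognized message" ;;
    esac
}
```

Feeding it the first message above, for example, would print the “host answered” hint, which tells you the network path is fine and the listener is what needs attention.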
Even though netcat as a client can’t help you here, netcat is a basic TCP/UDP tool that can also act as a server. You may not know whether or not your TCP-enabled application (like Apache’s httpd) is working, but you can set up netcat to act as a server and then test the connection to it, thereby ruling out a problem with the application itself. If the connection test from the client to the netcat server does work, you know that your service is the problem; if not, you can probably focus on firewalls, network routes, or other network layer (layer three) problems. Doing this is almost as simple as the connection tests shown previously. The addition of the ‘-l’ (el, for listen) tells netcat to listen, and (for some implementations) the ‘-p’ option tells it which port to use (note: implementations of netcat vary on options/switches; check the manpage if these examples do not work for your distribution):
> netcat -l 8080
On my laptop nothing happened here, or nothing seemed to happen. Adding the ‘-v’ option to make it more verbose does not change things. On my SLES server (different netcat implementation, so ‘netcat -l -p 8080’ was used) the same bit of nothing happens, but when adding the ‘-v’ option I get data back as shown here:
listening on [any] 8080 ...
Okay, well how could I have known that before? Simple… ss (the replacement for netstat) shows listening sockets pretty easily:
> /usr/sbin/ss -planeto | grep 8080
LISTEN 0 1 *:8080 *:* users:(("netcat",10525,3)) uid:1000 ino:9561232 sk:ffff88009cbc82c0
The ‘ss’ (or netstat) output shows that port 8080 is listening on all IP addresses and that it’s doing so with my user (uid 1000) using the ‘netcat’ command. That’s pretty neat, but who cares? If I had stopped Apache httpd and instead used the command above to start netcat on the same port I could now test a connection to this socket from my client machine, or from the local machine, or from a server “nearby” logically (on the same network), using the first examples from above:
> netcat -zv 127.0.0.1 8080
Connection to 127.0.0.1 8080 port [tcp/http-alt] succeeded!
A couple of interesting things may have happened here. First, my output is from a test from the same machine that is running the netcat “server” so that I know TCP/IP is working properly. The next test, from a remote machine, may work or fail for various reasons. Because Linux has a history of secure implementations (and SUSE is no exception) the default implementation includes a firewall blocking incoming access from everybody for all ports, unless trivially excluded by the server administrator. If your connection test failed, especially if connections to other sockets on the same box work, or if ‘ping’ works to verify that the network (IP) layer is working properly, then the host’s firewall is likely blocking traffic. You can verify as much by checking the firewall’s log in the /var/log/firewall file, or by checking ‘sudo /usr/sbin/iptables -nvL | grep 8080’, or by doing any number of other things (checking in Yast directly via ‘sudo /sbin/yast lan’), but suffice it to say that you know for certain that some things work between client and server, but this one port does not. You know, fairly certainly, that the socket itself (port 8080 on this box) is the problem, not the application, because you have tried two applications and you also have tried a connection from the local box to the applications and those work.
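The reasoning in the last few paragraphs boils down to a small decision table. Here it is sketched as a shell function; diagnose is a made-up name, and the ‘ok’/‘fail’ arguments are just my convention for the results of the local and remote tests against the netcat listener, so treat it as a memory aid rather than a real diagnostic tool.

```shell
#!/bin/sh
# diagnose LOCAL_RESULT REMOTE_RESULT
# Each argument is "ok" or "fail": the outcome of a connection test against
# the stand-in netcat listener, run on the server itself and from a client.
diagnose() {
    case "$1/$2" in
        ok/ok)
            echo "netcat listener reachable from both sides; suspect the original application" ;;
        ok/fail)
            echo "listener works locally but not remotely; suspect the host firewall or network path" ;;
        fail/ok|fail/fail)
            echo "even the local test failed; check that the listener is running and bound to the port" ;;
        *)
            echo "usage: diagnose ok|fail ok|fail" >&2
            return 2 ;;
    esac
}
```

For the situation described above (local test succeeds, remote test fails) you would call ‘diagnose ok fail’ and be pointed at the firewall.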
Those of you watching the server side may have seen some interesting stuff; specifically, netcat ended when you did the connection test. The reason for this is that you did not tell it to persist beyond one connection, so when you tested the connection from a client, the server side responded and then exited normally. Some implementations of netcat (like on openSUSE) have the option to listen forever so that when the first connection ends the listener waits for another. For multiple connection tests this is nicer, though you could as easily do this with a while loop in bash with any implementation:
> while [ 1 ] ; do netcat -l 8080; sleep 1; done #functionally equivalent to the following command on openSUSE
> netcat -k -l 8080 #listen after the TCP connection breaks
Super geeks will note that this disconnection after a connection does not apply to UDP. Why? Because UDP is connectionless, so there is no start or end of a connection… just listening for data.
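If you would rather not leave a listener running forever on a box you are troubleshooting, a bounded variant of the loop above is easy to write. This is a sketch only: NC and listen_times are names I picked, it assumes the ‘netcat -l PORT’ form (implementations that want ‘-l -p PORT’ need a small change), and it leaves out the one-second sleep from the loop above.

```shell
#!/bin/sh
# Assumption: adjust to nc or ncat if that is what your system calls it.
NC=${NC:-netcat}

# listen_times COUNT PORT
# Restart a one-shot TCP listener COUNT times, then stop.
listen_times() {
    count=$1
    port=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each run blocks until one client connects and disconnects
        "$NC" -l "$port" || return 1
        i=$((i + 1))
    done
}
```

Running ‘listen_times 3 8080’ would accept three connection tests, one after another, and then return.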
If anybody out there still uses the super-ancient command ‘telnet’, netcat is the replacement for that ancient, insecure, troubleshooting-incapable command. Yes, I know you’ve used it for years on an inferior platform because that’s all that was available, but it never did the job well and does not do half of what the most-basic versions of netcat will do. Perhaps more significantly, it is no longer a default part of most systems, primarily because it is completely insecure. Telnet’s limitations include the fact that it was made to work for humans and is not really a networking tool. Ever tried to send binary data with telnet? Of course not… it’s a disaster waiting to happen, and besides that you do not know how to interpret binary data yourself (neither do I without a lot of help). netcat shines in this area too, so here’s a fun example for anybody still reading:
This example can all be done on a single machine with a couple of shells open, or can be done between multiple machines (watch out for those firewalls). We are going to send data back and forth between shells using netcat, starting with some nice basic text:
> netcat -l 8080 #first shell, setting up the "server"
> netcat 127.0.0.1 8080 #use whatever IP address is necessary to connect from this, the "client"
Hooray! Two blank screens with just the command you typed, and nothing else happening. This will sit here, indefinitely in many cases, and you’re done. Still, to add a little variety, poke at things with the ‘ss’ command to see your ESTABLISHED as well as LISTENing connections:
> /usr/sbin/ss -planeto | grep :8080
LISTEN 0 1 *:8080 *:* users:(("netcat",10665,3)) uid:1000 ino:9557715 sk:ffff88009cbc82c0
ESTAB 0 0 127.0.0.1:8080 127.0.0.1:46301 users:(("netcat",10665,4)) uid:1000 ino:9557716 sk:ffff880130232480
ESTAB 0 0 127.0.0.1:46301 127.0.0.1:8080 users:(("netcat",10713,3)) uid:1000 ino:9566805 sk:ffff880130233200
If you have an established connection then you can type messages between shells. Go ahead… type something in one shell and follow up with [Enter]. Check the other shell and see the data there. Now type in the other shell and press [Enter]. Back and forth you go. You’ve just created your own super-basic communication server and client. Sure, it’s not as pretty as instant messages, but it works. What if you want to send something else, though? Ever wanted to transfer a file from here to there when FTP wasn’t set up, HTTP wasn’t set up, and SSH was unusually inoperable? If the firewalls allow it, that’s pretty easy to do. I’ll do another test with localhost only, but changing IP addresses for your environment is just a matter of typing in something valid for you:
> netcat -l 8080 > ./newnetcat #Set up the server to write whatever it gets to 'newnetcat' in current working directory (CWD)
> cat `which netcat` | netcat 127.0.0.1 8080 #In another shell (or on another box if IPs are changed), send the 'netcat' command found first in my $PATH through, well, netcat, to the server
#note the use of backticks, not single-quotes, around the 'which' command
Okay, what did this do? If this was all executed on the same box then the following command should make it clear that you just copied a file from one shell to another using TCP:
> md5sum `which netcat` ./newnetcat
8769717be5f9d21a449bc4a46699078d /usr/bin/netcat
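If you would rather have the shell compare the checksums than eyeball them, the same check can be scripted. A minimal sketch, assuming md5sum is installed; same_file is a hypothetical helper name, and it only compares two files visible from the same machine:

```shell
#!/bin/sh
# same_file FILE_A FILE_B
# Succeeds (exit 0) when both files have the same MD5 checksum,
# fails with 1 when they differ, and with 2 when a file cannot be read.
same_file() {
    # Reading via stdin keeps the file name out of md5sum's output
    sum_a=$(md5sum < "$1") || return 2
    sum_b=$(md5sum < "$2") || return 2
    [ "$sum_a" = "$sum_b" ]
}
```

For the transfer above that would be ‘same_file "$(which netcat)" ./newnetcat && echo identical’.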
Sure enough, the two files are identical. If I make ‘newnetcat’ executable it will work just like the original file that I sent. Sending binary data isn’t hard, but it does not work with tools like telnet because telnet was designed for humans only. This particular test is maybe not helpful, but the same things can be done to help recover a box that is failing (I used it for this once on a laptop whose hard drive was semi-dead). Consider the following:
> netcat -l 8080 > ./my-backup.tar #write whatever comes in to the ./my-backup.tar file; run this on a working, healthy machine
> tar -cvf - /home/ab/important-stuff-directory | netcat 192.168.0.2 8080 #run this on a dying machine; '-f -' makes tar write the archive to standard output
The first command should be obvious by now since it is just listening for data and writing to a file. The second command, running on the failing system (booted from a recovery disk or whatever), is going to run ‘tar’ to grab things in my ‘important-stuff-directory’ and then send them out onto the wire thanks to netcat, specifically to 192.168.0.2 (which is the machine running the first, listening, netcat command) on port 8080 (where the first netcat command is bound). When completed I can view my data sent across the wire on the machine that just wrote the file for me:
> tar -tvf ./my-backup.tar
Okay, we’re off track at this point, but the idea is to show what a simple, powerful tool can do. The Linux/Unix world has a philosophy about modular design, and doing one thing very well. netcat, like most of the commands in the *nix world, adheres to that philosophy, and while it has grown up over time to add things like the ability to work with proxy servers or use Unix sockets, at the end of the day it is a very simple tool that can do very powerful things because of its simplicity.