My title “The network never lies” might seem a little naive. It might be better to say “the network lies the least.” That has been my experience to date. At my first real job I worked at a bank taking care of their Internet connections (this is going back 10 years or so). My manager there was probably my best manager, and by far the smartest to date if you count technical understanding, ability, and political experience. Time and time again, when solving a problem, he initially ignored the application logs and errors. He would look at them briefly, but if the cause wasn’t obvious in the first few minutes, he would pull out the network analyzer. He’d pull back some packets, take a look, and in a few minutes say “there is the problem.” Sure enough it was, and it was often completely different from what the application error messages or the logs were showing. He said to me one time while working on a problem, “The network never lies.”
Now I may also be a bit biased, as my background is networking and I’ve worked with networks for a long time: SNA, Novell, and IP. Regardless of the type of network, I find the same thing. Simply using Wireshark or another packet analyzer can save a great deal of time in finding the problem.
Case in point. A friend of mine has a law office with about 15 employees. Years ago she was in a bind with a really bad consulting firm, and being friends, she asked me to assist. I redesigned basically everything: the network, the servers, the applications, security, permissions, and so on. The office is completely paperless and has been for about 7 years now. Unfortunately, most legal applications require proprietary Microsoft servers and databases, so although there is some Linux emulation in the environment, it is minimal. Recently a large new server was purchased, and the whole environment was to be upgraded and moved onto it. The new environment includes VMware as well as the latest Microsoft products.
During my holidays, I have been leading the charge to get the new system set up properly: data migrated, backups working, security in place, etc. One of the first steps was to set up a new Active Directory instance, make it the primary domain controller, and have it take over the FSMO roles. Creating a new Active Directory server and connecting it to the existing Active Directory instance was trivial. Attempting to migrate the FSMO roles caused multiple failures with misleading error messages. Those errors led to hours of searching Google and the Microsoft support sites, and reading forum threads about the problems and their causes. Most of the suggested solutions did not work or addressed something that was not the actual problem. Most of the error messages presented by Microsoft were not even close to the real problem. Frustrated, I started Wireshark and captured a trace. Lo and behold, a DNS query for some long weird string was failing. The long string turned out to be the GUID. I entered it manually and presto, the FSMO roles migrated with no issues. Why a GUID? Why not just a server name? No, no, that would be too simple, let’s make it complex?!?!
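Had I known what to look for, I could have tested that lookup directly instead of finding it in a trace. Here is a minimal Python sketch of the idea, not the exact check I ran. It assumes, as I understand it, that each domain controller registers a DNS alias of the form GUID._msdcs.forest-root-domain. The function names, the GUID, and the domain below are all made up for illustration.

```python
import socket


def guid_record_name(dsa_guid: str, forest_root: str) -> str:
    """Build the GUID-based DNS name for a domain controller.

    Assumption: the alias has the form <guid>._msdcs.<forest root domain>.
    Braces around the GUID and a trailing dot on the domain are stripped.
    """
    guid = dsa_guid.strip("{}").lower()
    domain = forest_root.strip(".").lower()
    return f"{guid}._msdcs.{domain}"


def resolves(name: str, lookup=socket.getaddrinfo) -> bool:
    """Return True if the name resolves through the system resolver.

    The lookup parameter is only there so the function can be tested
    without touching a real DNS server.
    """
    try:
        lookup(name, None)
        return True
    except socket.gaierror:
        return False
```

Calling resolves(guid_record_name("{01234567-89AB-CDEF-0123-456789ABCDEF}", "example.local")) on a machine that uses the domain’s DNS servers would show whether that alias can be found. The GUID and domain here are placeholders, so substitute your own.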
In my line of work, software engineers constantly tell me things such as “Look at the logs”, “what is in the database”, “why do you need to know that?”, “you don’t need to see that information”, “you don’t need tcpdump”, and other similar lines. Of course I always disagree with them, which I suppose frustrates them. In my experience, looking at the network almost always finds the problem, or at least greatly reduces the time needed to solve it. Look at the network: that is the lesson from years ago that I am reminded of time and time again.
Now maybe I am a bit biased. My background is networking and security. I’ve always liked networking and my understanding of it is pretty good. I would suggest, however, that applications, subsystems, and kernels need to be smarter about logging errors, especially in the Microsoft world. They should always make it easy to turn on a debug mode, without having to go into the registry, flip a bit in hex, and reboot, or follow some other complex sequence of events. And why is everything in the Microsoft world so interdependent? DNS is required for Active Directory, and it has to be Microsoft DNS unless you are willing to do a lot of work to use a different DNS server. Microsoft Exchange requires the IIS web server to be running? It’s like one big monolithic, interdependent system design. I guess I am digressing, and this is a different topic for a later time.
The moral of my story: the network never lies (for the most part anyway), and Wireshark or another packet analyzer is a good friend when it comes to solving application problems.
