Hello my friend,
I came across the topic of today’s article by chance, while troubleshooting a very important platform in a service provider network. Nevertheless, I found that the results (or, more precisely, the reason for the failure) fit perfectly with what Project Quality Management is about.
Real case scenario
The root cause of the problem I was troubleshooting turned out to be bad physical cables (network engineers know how difficult it is to pin down such a problem). The overall architecture is very straightforward: there are several servers, and each has fully redundant connectivity to two routers (PE devices). The cables between the servers and the PEs were installed and the connections seemed to be up and running, but the application traffic running on top had far too many errors (even though the network interfaces showed no errors). Eventually, together with my colleagues, we found packet loss of up to 80% on some interfaces. The worst part was that some servers had packet loss on both interfaces, which effectively made their operation impossible. So we had to fix at least one of the two links to restore the operation of the platform. This leads to the following rule:
It’s better to have one non-redundant fully OK connection than two redundant connections with problems.
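To make the rule a bit more tangible, here is a minimal sketch under simplified, hypothetical assumptions: links fail independently, and traffic is spread evenly across all links (real per-flow hashing or active/standby setups behave differently). The classic up/down formula for parallel links cannot see packet loss at all, so a redundant pair always looks "more available" than a single link, while the share of traffic actually delivered tells the opposite story. The function names and the 99.9% link availability are illustrative only.

```python
def parallel_availability(link_availabilities: list[float]) -> float:
    """Up/down model: the service is up if at least one independent link is up."""
    prob_all_down = 1.0
    for a in link_availabilities:
        prob_all_down *= 1.0 - a
    return 1.0 - prob_all_down


def delivered_fraction(loss_rates: list[float]) -> float:
    """Share of packets delivered when traffic is split evenly across all links.
    Simplifying assumption: retransmissions are not counted."""
    return sum(1.0 - p for p in loss_rates) / len(loss_rates)


# Hypothetical: each link is "up" 99.9% of the time.
print(f"{parallel_availability([0.999, 0.999]):.6f}")  # 0.999999, the pair looks better
print(f"{parallel_availability([0.999]):.6f}")         # 0.999000

# What the application actually receives:
print(f"{delivered_fraction([0.8, 0.8]):.0%}")  # 20% with two links losing 80%
print(f"{delivered_fraction([0.0]):.0%}")       # 100% with one clean link
```

The up/down model rewards the redundant pair, yet in the scenario above it is the single clean link that keeps the platform working.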
Why is this about project management?
You may ask how this story relates to project management, since it’s about bad cables and network engineering. Well, you are probably right, but let me explain my point of view. In Project Quality Management there are two main terms: “quality” and “grade”. The PMBOK Guide says (chapter 8):
Quality and grade are not the same concepts. Quality as a delivered performance or result is “the degree to which a set of inherent characteristics fulfill requirements” (ISO 9000). Grade as a design intent is a category assigned to deliverables having the same functional use but different technical characteristics. The project manager and the project management team are responsible for managing the tradeoffs associated with delivering the required levels of both quality and grade.
So what are the quality requirements for the connectivity? There may be many different ones, such as speed, availability or cost. In our case we are talking about availability, which is usually expressed as the percentage of time the service is operational, for example: “The server must be available 99.995% of the time, 24 hours a day, 365 days a year”. Redundancy itself can’t be a requirement. It’s rather a means used to achieve the necessary requirement (i.e. the level of availability). Other means can be regular monitoring and checking of the link quality, or even planned maintenance to perform end-to-end cable measurements and connector cleaning.
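To see how strict such a requirement is, the availability percentage can be turned into a yearly downtime budget. A minimal sketch, assuming a 365-day year; the function name is illustrative:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Maximum downtime per year that still meets the availability target."""
    unavailability = (100.0 - availability_percent) / 100.0
    return unavailability * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.995):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
```

For 99.995% the budget is only about 26.3 minutes per year, which is why a link that is formally “up” but silently dropping packets can eat the whole budget without any interface ever going down.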
Many other examples could describe the same situation. The main idea is to stress that the requirements, and meeting them, are what really matter, whereas the means of achieving them are secondary.
From my previous experience, I remember that the relationship between packet loss and overall performance degradation isn’t linear: 10% packet loss, caused by a bad link or by traffic congestion on the interfaces, leads to roughly 50% performance degradation for network applications using TCP (such as web browsing, file transfer, database access and so on).
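One widely cited way to illustrate this non-linearity is the simplified TCP throughput model by Mathis et al., in which steady-state throughput is roughly (MSS / RTT) · (C / √p), where p is the packet loss rate and C ≈ 1.22. The sketch below is only an illustration: the model is intended for low loss rates and ignores timeouts and link capacity, so it is not meant to derive the 50% figure above, and the MSS and RTT values are hypothetical.

```python
import math

# Constant from the simplified Mathis model without delayed ACKs: sqrt(3/2) ~ 1.22
MATHIS_C = math.sqrt(3 / 2)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput (bits per second) for a loss rate."""
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must be between 0 and 1 (exclusive)")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


# Hypothetical path: 1460-byte MSS, 10 ms round-trip time.
for p in (0.001, 0.01, 0.04, 0.10):
    mbps = mathis_throughput_bps(1460, 0.010, p) / 1e6
    print(f"loss {p:.1%}: ~{mbps:.1f} Mbit/s")
```

Because throughput scales with 1/√p, quadrupling the loss rate (for example from 1% to 4%) halves the estimated throughput, whatever the link speed.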
When you develop your project, always put the requirements first. I don’t want to say that redundancy is bad; if I said that, I’d be totally mad. I just want to say that the requirements must be assessed as a whole, and the solution should be comprehensive as well, covering all relevant aspects. High quality may be difficult to achieve, but it’s certainly worth it, because besides money the benefits include customer satisfaction and a good reputation.