Code coverage is an indicator, not a goal

The typical message “out there” presents code coverage as something that would ideally be 100%. That part is fine. What is not fine are the assumptions and direct suggestions about how to accomplish it, stemming from the misconception that code coverage somehow indicates how good your testing is. While the two are related, it does not. Let’s uncover code coverage, disassemble it, and see what it really is.

Unit

We’ll begin with the “nice” case. The code coverage measurement indicates what portion of the desired and needed code is exercised. How much is exercised versus how much is missing? Consider the following code:

if (condition 1) {
    instructionA1();
    instructionA2();
    instructionA3();
    …
    instructionA161();
}
else if (condition 2) instructionB1();
else if (condition 3) instructionC1();
else if (condition 4) instructionD1();
else if (condition 5) instructionE1();
else if (condition 6) instructionF1();
else if (condition 7) instructionG1();
else if (condition 8) instructionH1();
else if (condition 9) instructionI1();
else /* condition 10 */ instructionJ1();

This example is designed to point out a well-known and non-controversial issue. Suppose that the tests only exercise “condition 1” and ignore the remaining nine conditions. That covers 10% of the branches. Note, however, that the branch being tested has the most instructions – 161 of them in addition to the condition check. If we also count each condition check as a single instruction, that is 162 instructions exercised out of 180 total, yielding 90%. So is this 10% coverage, 90% coverage, or neither, given that neither number takes real-world occurrence frequencies into account?
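The arithmetic above can be made explicit in a few lines. This is just a sketch of the two competing calculations for the same single test run, with the counts taken directly from the example:

```python
# The same single test run ("condition 1" only), counted two ways.

branches_total = 10           # condition 1 .. condition 10
branches_hit = 1              # only the first branch is exercised

# 161 instructions in the first branch, 1 in each of the other 9 branches,
# plus one instruction per condition check (10 checks in total).
instructions_total = 161 + 9 * 1 + 10    # 180
instructions_hit = 161 + 1               # first branch body plus its check

branch_coverage = branches_hit / branches_total              # 0.10 -> 10%
instruction_coverage = instructions_hit / instructions_total # 162/180 -> 90%
```

The same test suite reports 10% or 90% depending purely on what the tool counts, which is why comparing coverage numbers only makes sense when they are measured the same way.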

In some cases, branches are hidden: logic that conceptually has branches can be expressed as branchless code. Consider the following two approaches for a function returning the absolute value of a number:

  1. (x >= 0) ? x : -x

  2. x * sign(x)

Assume sign(x) is itself branchless and leverages bitwise operations on the sign bit to calculate its value. If sign(x) returns -1, 0, or +1 for the corresponding x, this works. If we’re wrong about that and it, for example, literally returns the sign bit extended, it will return -1 for negative values but 0 for everything else, including positives, and that wouldn’t work. If our test runs a single negative-value case, it will pass and exercise 100% of the instructions and 100% of the branches of implementation (2), though not of (1). If you feel this occurs only in trivial cases like this one, you’d be wrong. The more complex the cases get, the more chances there are that some single apparent branch doesn’t cover everything it is expected to and requires further subdivision one way or another.
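A minimal sketch of this trap, with hypothetical names. Python’s arithmetic right shift stands in for sign-bit extension; the point is that one negative-value test reaches every instruction of the defective branchless version and still passes:

```python
def sign_broken(x: int) -> int:
    # Hypothetical flawed "branchless" sign: just the sign bit, extended.
    # Yields -1 for negatives but 0 for zero AND positives.
    return x >> 63  # mimics a 64-bit arithmetic shift

def abs_branchy(x: int) -> int:
    return x if x >= 0 else -x      # implementation (1)

def abs_branchless(x: int) -> int:
    return x * sign_broken(x)       # implementation (2), defective

# A single negative-value test exercises 100% of (2)'s instructions
# and branches, and passes:
assert abs_branchless(-5) == 5
# ...yet the implementation is wrong for positives:
# abs_branchless(5) yields 0, not 5.
```

Full coverage of (2) here says nothing about correctness, while the same test covers only half the branches of (1), whose result is right for every input.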

At the very least, this should shake your belief that you can pick a target code coverage percentage to be content with (anywhere from 10% to 90% in that first example) or that there is any certainty associated with 100%. What we’re left with is that a greater number is better than a lesser one, but only if both are measured the same way.

Missing Coverage

What do we do to increase that coverage? A typical suggestion, and a reaction I have observed, is to identify the code (branches and/or instructions) that isn’t covered and write new tests that exercise it. While doing this, the following may happen:

  1. You cover new, previously uncovered yet very much needed cases.

  2. You cover new, previously uncovered cases that aren’t needed immediately but could be if we extrapolate the intent of the code.

  3. You cover new, previously uncovered cases that will never be exercised in production, ever, but you don’t realize that because you have constrained visibility. You’re covering dead code while forcing future enhancements to that code to obey artificial constraints imposed by the new tests.

  4. You discover that it is impossible to reach that case at all. You can’t decide if this is some kind of defect or dead code that could be removed.

  5. You remove the uncovered code, effectively increasing the coverage percentage while all tests continue to pass. You risk production failures if something does rely on that code.

Here’s a bonus question: is that code truly not covered, or is its coverage simply not measured because you only consider a subset of test types? Either way, it should be apparent that missing coverage doesn’t necessarily identify missing tests – it may identify excluded test types, or dead code that you shouldn’t have in the first place, let alone test. A blind chase of a greater code coverage percentage will take you down rabbit holes of needless complexity from which everyone suffers and nobody benefits, all while hiding dead code. That brings us to…
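Case (3) above deserves a concrete illustration. In this hypothetical sketch, production callers only ever send codes 0 and 1, but that fact isn’t visible from the code itself, so a coverage-driven test reaches the remaining line and thereby enforces it:

```python
def parse_status(code: int) -> str:
    # Hypothetical handler. In production only 0 and 1 ever arrive,
    # but nothing in this function reveals that constraint.
    if code == 0:
        return "ok"
    if code == 1:
        return "error"
    return "unknown"  # dead in production: callers never send other codes

# A test written purely to reach the last line now *enforces* it:
assert parse_status(42) == "unknown"
# Every future refactor must keep honoring inputs that never occur,
# purely to keep this test green.
```

The coverage report improves, yet the codebase is now more constrained than before, in service of behaviour nobody needs.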

False Coverage

The pursuit of ever greater code coverage, without consideration of the actual reasons for that coverage, ends up artificially inflating the percentage with code that should not exist in the first place. We’re talking about:

  1. Remnants of the past that are not in use anymore.

  2. Extrapolations that will never be used.

  3. Coded misunderstandings.

  4. Downright defects. Remember: to err is human.

Covering any of these turns unwanted behaviour into enforced behaviour, while you continually pay for that enforcement and its maintenance. Worse, those unwanted behaviours may end up being relied upon if someone takes “inspiration” from those blind coverage tests, creating unintended coupling. Such couplings make it harder to evolve the true intent because it is forced to remain compatible with the literal abuse.

Code Uncoverage

Code coverage is an indicator, a metric, and a tool for us to make decisions. It must not be viewed as a direct goal. It doesn’t tell us how good our tests are or how much of what we need or even expect is covered by those tests. Think about it: tests should be there to check if what we need from the code works well. In that sense, we should be aspiring to 100% “need coverage”. You may find “needs” expressed differently in your case … or not at all. They used to be referred to as “specifications” a while back, a practice I have seen essentially disappear.

Most often I see direct coding from largely ambiguous user stories and/or requirements, with an expectation that the same developers who write the code should also write the tests for it, and with very little opportunity for debate. The result is that all the misunderstandings, misconceptions, and other reasoning errors the developer is left with after having done the work remain coded as desired, enforced behaviour for the future. Yes, developers also discover some defects along the way and fix them, but that isn’t the only outcome.

What does that mean? It means that someone else is needed to cover the blind spots and errors of the code developer. Code reviews aren’t enough – if they were, we would not have needed any tests in the first place. But what do these other people base their testing on? It should not be influenced by the code developer, as they may end up convinced of the same misconceptions. These people should have different perspectives and, thus, different misconceptions too. Unless the misconceptions align, the outcome will be better testing. If tests fail because of differing perspectives, that indicates something needs to be corrected, usually by making the foundations – typically some form of specification – clearer.

Note that TDD (Test-Driven Development) does not address this concern. It guides developers to work in a certain way that improves some aspects at the expense of others. Whether this is suitable for your case is a separate decision. It does not directly address the coverage issues outlined here, as it does little more than flip the order of code and test development by the same person. In fact, it may yield inflated coverage numbers by achieving (near) 100% coverage with code that would never really be exercised in production.

Conclusion? Less than 100% coverage lets you uncover code you don’t need or don’t test. You need to investigate which of the two it is, not assume either. The part that is covered doesn’t tell you much either – all of it may be unneeded code, an enforced legacy. 100% may seem like the ideal, but even with passing tests it does not mean that you covered all the cases, due to hidden branches. You should absolutely look at code coverage, but you must interpret it well.
