An age-old problem when testing something is knowing when to stop. We’ve talked a lot about risk in previous entries, but some actual numbers are useful to keep the project managers happy. One measurement that is particularly popular for unit tests is code coverage, probably because you don’t care whether the whole thing works, just the individual units, and you can count those easily enough. For example, the entry criteria for the traditional test cycle might include a requirement that the developers have exercised at least 80% of the code. The theory is that, with detailed knowledge of the code, the developer can exercise all the error paths, which are tricky to reach unless you know exactly how. The trouble is that, with deadlines looming, measurements like code coverage can easily encourage people to test the easiest 80%: you end up with a developer who has tested every “getter” and “setter” method they can find, but hasn’t ventured into the bit of code that only runs when it’s a full moon.
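To make that failure mode concrete, here’s a minimal sketch (the `Account` class is hypothetical, not from any real project): the getter/setter tests push line coverage up quickly, while the one branch that actually needs a developer’s insider knowledge never runs.

```python
class Account:
    """A hypothetical unit with one easy-to-miss error path."""

    def __init__(self, balance=0):
        self._balance = balance

    def get_balance(self):           # trivially covered by a "getter" test
        return self._balance

    def set_balance(self, value):    # trivially covered by a "setter" test
        self._balance = value

    def withdraw(self, amount):
        if amount > self._balance:   # the "full moon" path: only taken
            raise ValueError("insufficient funds")  # on unusual input
        self._balance -= amount
        return self._balance

# The easiest 80%: tests that never touch the error branch.
acct = Account(100)
acct.set_balance(50)
assert acct.get_balance() == 50
assert acct.withdraw(20) == 30
# Line coverage is already high here, yet the ValueError branch has
# never executed once; the number looks good while the risky code
# remains completely untested.
```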
Of course, it’s good to have some unit testing done before functional testing begins, but the purpose should surely just be to check that the whole thing isn’t going to fall apart as soon as somebody looks at it funny. Perhaps a better place to measure code coverage is after the black box tests have been run. If the developers could do some basic tests on the product, to at least prove that it works sometimes, that would probably be good enough as a test entry measurement. Then we could measure how much of the code is covered by all the black box tests, which are trying to hit the problems that customers might hit. Armed with the knowledge of which bits of the code aren’t tested, and which we know are difficult to reach, we can start to use white box tests more effectively.
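The idea can be sketched with Python’s standard-library `sys.settrace` (real coverage tools like coverage.py do this job properly; the `feature` function here is a made-up stand-in for a unit): run the black box tests, record which lines executed, and whatever never ran becomes a target for white box tests.

```python
import sys

def feature(x):
    if x < 0:                              # relative line 1
        raise ValueError("negative input") # relative line 2: error path
    return x * 2                           # relative line 3

hit_lines = set()

def record(frame, event, arg):
    # Log each line executed inside feature(), relative to its def line.
    if event == "line" and frame.f_code is feature.__code__:
        hit_lines.add(frame.f_lineno - feature.__code__.co_firstlineno)
    return record

sys.settrace(record)
feature(5)            # the only "black box" test we happened to run
sys.settrace(None)

# Relative line 2 (the raise) never executed: exactly the gap a
# post-black-box coverage report would point white box tests at.
print(sorted(hit_lines))  # → [1, 3]
```

A real report would list these misses per file, so the white box effort can go straight to the code we know the black box tests can’t reach.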
Of course, you still need to know when that first wave of “bring up” tests is done, so that the more formal testing can begin. This might be hard to quantify, so this approach does imply that you might not get any concrete test measurements until later in the cycle. Then again, if the current measurement of unit test code coverage is actually providing false confidence, maybe that isn’t a bad thing.