Measuring “white box” testing
An age-old problem when testing something is knowing when to stop. We’ve talked a lot about risk in previous entries, but some actual numbers are useful to keep the project managers happy. One measurement that is particularly popular for unit tests is code coverage. This is probably because, at this stage, you don’t care whether the whole thing works, just the individual units, and you can count them easily enough. For example, the criteria for entry into the traditional test cycle might require that the developers have exercised at least 80% of the code. The theory here is that, with detailed knowledge of the code, the developer can exercise all the error paths that are tricky to reach unless you know exactly how. The trouble is that, with deadlines looming, measurements like code coverage can easily encourage people to test the easiest 80%. In this situation, you have a developer who has tested every “getter” and “setter” method they can find, but hasn’t ventured into the bit of code that only gets run when it’s a full moon.
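To put rough numbers on that, here is a minimal sketch of how the arithmetic can work out. The module, its function names and the line counts are all invented for illustration; the only point is that a line-coverage figure says nothing about which lines were left out.

```python
# Hypothetical module: (function name, executable lines, exercised by unit tests?)
# Every name and number here is made up for illustration.
FUNCTIONS = (
    [(f"get_field_{i}", 2, True) for i in range(5)]
    + [(f"set_field_{i}", 2, True) for i in range(5)]
    + [("handle_full_moon_case", 5, False)]
)


def line_coverage(functions):
    """Fraction of executable lines that the tests exercised."""
    total = sum(lines for _, lines, _ in functions)
    covered = sum(lines for _, lines, hit in functions if hit)
    return covered / total


def untested(functions):
    """Names of the functions that no test touched."""
    return [name for name, _, hit in functions if not hit]


if __name__ == "__main__":
    print(f"coverage: {line_coverage(FUNCTIONS):.0%}")
    print("untested:", untested(FUNCTIONS))
```

In this made-up case the 80% target is met by the ten trivial accessors alone, while the one function that carries the real risk is never run.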
Of course, it’s good to have some unit testing done before functional testing begins, but the purpose should surely be just to check that the whole thing isn’t going to fall apart as soon as somebody looks at it funny. Perhaps a better place to measure code coverage is after the black box tests have been run. If the developers could do some basic tests on the product, to at least prove that the product works sometimes, that would probably be good enough as a test entry measurement. Then, we could examine the amount of code that is covered by all the black box tests, which are trying to hit all the problems that customers might hit. Now, armed with the information about which bits of the code we know aren’t tested, and which we know are difficult to hit, we can start to use white box tests more effectively.
Of course, you still need to know when that first wave of “bring up” tests is done, so that the more formal testing can begin. This might be hard to quantify, so this approach does imply that you might not get any concrete test measurements until later in the cycle. Then again, if the current measurement of unit test code coverage is actually providing false confidence, maybe this isn’t a bad thing.

February 29th, 2008 at 16:36
As a developer, I’m going to have to disagree with you somewhat here. The reason we unit test is not to get the code good enough to test. It’s more the reverse: a well-thought-through set of tests creates an environment suitable for coding.
There are several reasons:
* Provides immediate feedback on code changes. A full unit test suite for a large product should take about 10 minutes. Individual test methods should run in around a tenth of a second.
* Forces you to write maintainable code (you have to write things in small chunks and think through dependencies and abstraction in order to write an effective unit test).
* Guards against regression. The specific logic paths that your code should take are run every time a change to the code is made. This lets people other than the authors make changes with confidence.
* Documents your low-level design (you should write and name your tests to describe the desired behavior of your code; Google behavior-driven development (BDD) for more on this).
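As a rough illustration of the last point, here is a minimal sketch of tests named after the behavior they describe, using Python’s standard unittest module. The Account class and its withdrawal rules are hypothetical, invented only to give the tests something to describe.

```python
import unittest


class Account:
    """Hypothetical class under test, invented for this example."""

    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class AccountWithdrawalBehavior(unittest.TestCase):
    # Each test name states one expected behavior, so the list of test
    # names reads as a short low-level design for withdraw().

    def test_withdrawing_less_than_the_balance_reduces_the_balance(self):
        account = Account(balance=100)
        account.withdraw(30)
        self.assertEqual(account.balance, 70)

    def test_withdrawing_more_than_the_balance_is_rejected(self):
        account = Account(balance=100)
        with self.assertRaises(ValueError):
            account.withdraw(150)
        # A rejected withdrawal must leave the balance untouched.
        self.assertEqual(account.balance, 100)
```

Saved in a file, these can be run with `python -m unittest`, and a failure report then names the behavior that broke rather than just a method.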
Also, you can’t wait until functional tests are done to apply unit tests. In my experience, if you wait until after the code is developed (and especially tested) to add unit tests, then it simply won’t get done. There is no need and no pressure to add them. The next work item will always take precedence. You have to do unit tests while, and even before, you code.
You are right that relying on coverage tools alone can be misleading. That’s why you have to use a variety of data points as quality indicators, such as code complexity, static analysis, peer reviews, and dependency analysis. A good tool for aggregating this data is xRadar (http://xradar.sourceforge.net/).
You are also right that unit testing alone is not good enough to guarantee quality. Higher-level component tests (testing larger units to ensure the interactions between classes work) and acceptance/functional testing (validating that overall flows work and meet the business requirements) are all vital pieces of the puzzle, as are system test and all the other varieties.
February 29th, 2008 at 17:14
I absolutely agree that there are a lot of additional benefits to unit testing. I was really trying to focus on how we measure when we’ve done enough. Of course, with a test-driven approach (where you write the test first), this is less of an issue, but not everyone does that. Either way, you often want a tangible way of measuring how much testing you’ve done, so that you can gauge the quality of the delivered code.
Your point about unit testing simply not getting done later in the cycle is a good one, but by making a conscious shift in project planning, and by having a good way of measuring the tests, I wonder if we could make it work.
I used the term “white box” testing as an alternative to “unit tests”, in an attempt to separate the two slightly. In an ideal world it might be possible to thoroughly unit test (and get the benefits you highlighted), and still have some time to focus on rounding out the code coverage later on with more white box tests. The trick is then deciding how we measure whether we’ve done enough in that first round of unit tests.
I also agree that additional metrics such as code complexity are useful. I haven’t seen xRadar before and will certainly take a look.
February 29th, 2008 at 19:03
I’ve also heard of teams evaluating stats like asserts per test and tests per method as ways of judging the adequacy of unit test cases.
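For what it’s worth, a stat like asserts per test is cheap to gather. The following is a minimal sketch for Python test code using the standard ast module; the counting rule (bare assert statements plus calls to methods whose names start with “assert”) and the sample source are my own assumptions, not something a particular team or tool uses.

```python
import ast

# Made-up test source to analyze; it is only parsed, never executed.
SAMPLE = '''
import unittest

class ParserTest(unittest.TestCase):
    def test_parses_empty_input(self):
        self.assertEqual(parse(""), [])

    def test_runs_without_checking_anything(self):
        parse("a,b")
'''


def asserts_per_test(source):
    """Map each test function name to the number of asserts it contains."""
    counts = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test"):
            count = 0
            for child in ast.walk(node):
                if isinstance(child, ast.Assert):
                    count += 1  # bare `assert ...` statement
                elif (
                    isinstance(child, ast.Call)
                    and isinstance(child.func, ast.Attribute)
                    and child.func.attr.startswith("assert")
                ):
                    count += 1  # e.g. self.assertEqual(...)
            counts[node.name] = count
    return counts


if __name__ == "__main__":
    for name, count in asserts_per_test(SAMPLE).items():
        print(name, count)
```

A test that scores zero still adds to the coverage number even though it checks nothing, which is exactly the kind of case this stat is meant to flag.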
Another aspect you can look at is coverage trends, especially if you are trying to build up test coverage on a legacy codebase. The principle is that, as you develop or change code, the test coverage (at the overall, component, and package levels) should never go down. If it does, then you are probably not adding enough coverage for the new code you are writing.
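One way to enforce that principle is a simple check in the build that compares the latest coverage figures against the previous ones. This is a minimal sketch: the scope names, the percentages and the idea of keeping the figures in plain dictionaries are all assumptions for illustration, and in practice the numbers would come from whatever coverage tool you already run.

```python
def coverage_regressions(previous, current, tolerance=0.0):
    """Return the scopes whose coverage dropped by more than `tolerance`.

    Both arguments map a scope name to a coverage percentage. The result
    maps each regressed scope to an (old, new) pair.
    """
    regressions = {}
    for scope, old in previous.items():
        new = current.get(scope)
        if new is None:
            continue  # scope was removed or renamed; nothing to compare
        if new < old - tolerance:
            regressions[scope] = (old, new)
    return regressions


# Hypothetical figures from two consecutive builds.
previous = {"overall": 62.0, "component:billing": 71.5, "package:billing.invoice": 80.0}
current = {"overall": 62.4, "component:billing": 70.1, "package:billing.invoice": 80.0}

if __name__ == "__main__":
    for scope, (old, new) in coverage_regressions(previous, current).items():
        print(f"{scope}: coverage fell from {old}% to {new}%")
```

Note that in these made-up figures the overall number went up while one component went down, which is why it helps to check each level separately rather than the total alone.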
I’ll also assert that I think it’s more important to do a review of unit tests than of code, especially if you have good unit test coverage and good output from metrics tools (static analysis, dependency analysis, etc.).
Reviews would focus on:
* testing the proper scenarios
* finding missing scenarios
* performing adequate asserts
If you were thinking agile and lean, you could move the low-level designs into the unit test javadoc, so your unit test reviews would be equivalent to low-level design reviews. This would eliminate a form of documentation and put the information in a place where it is more easily kept up to date (if it gets out of date, the test case will, in theory, fail).