Arthur Barr's posts

Your test is not a beautiful and unique snowflake

Friday, November 28th, 2008

I was hacking together a piece of Ruby code today.  The code was only a couple of hundred lines long, but I must have tested it 100 times. The test was as much of a hack as the code.  Code, test, code, test, code, test.  Fixed something?  Test it.  Added a (tiny) feature?  Change the test.  Why is it, then, that as part of a non-hacking team of uber-professionals, it might be *days* before the code gets tested properly?  In a waterfall process, it might be weeks.

Is it because, when we make testing someone’s sole job, the test itself becomes somehow “precious”, rather than the throw-away thing that might be more useful?  Or is it just a result of working in a team and not on your own?

Testing without limits

Thursday, March 6th, 2008

I started writing a comment to Ben Bakowski’s last post on “Stressful testing…”, but decided to turn it into a full post. I was thinking that a key difference between testing a “real world” object such as a car, and something like software, is that the limits aren’t always as clear. A car might have five seats, and might use roads where the speed limit is 70mph – these are concrete numbers, which allow you to make the distinction more easily. If I have a piece of software (especially a piece of middleware), then it’s tricky to know the limits. How many records can my database handle? How big can they be? How many users at once?

When testing middleware, I think you’re usually forced to ask yourself: what do I expect someone to do with this? It’s largely about guesswork and experience with existing customers. I think an interesting idea to explore is whether it is appropriate or advisable to set artificial limits on a product’s maximum load. Setting artificial limits could provide actual boundaries to the testing that needs to be done. I can think of a few “real world” analogies:

  • Putting a “maximum load” sticker on a lorry – ignore at your own risk.
  • Putting a fixed number of seats in the car – “we could fit eight seats in here, but we’ve only put in five, so only try and put five people in it”.
  • Fitting a speed limiter (i.e. a hard limit).

We could do the same in software: we could write out log messages when a particular level of load is exceeded; we could put in hard limits on the number of records in a database and the number of concurrent users. The key thing when doing this, I think, would be to provide reasonable limits which won’t impede your customers, but at least “frame” your testing. For those die-hard customers who absolutely can’t do without that ten millionth database record, you could always offer a special option to remove the limiters, subject to additional testing, or acceptance of the additional risk.
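As a rough illustration of the idea, here is a minimal Python sketch of a record store with a "sticker" (a logged warning above the tested level) and a "speed limiter" (a hard cap that can be removed on request). The names LimitedStore, LimitExceeded, soft_limit, hard_limit and limiter_removed are all invented for this example and do not come from any real product.

```python
import logging

logger = logging.getLogger("load_limits")


class LimitExceeded(Exception):
    """Raised when adding a record would go past the hard limit."""


class LimitedStore:
    """Toy record store with a soft (warning) limit and a hard limit."""

    def __init__(self, soft_limit, hard_limit, limiter_removed=False):
        if soft_limit > hard_limit:
            raise ValueError("soft_limit must not exceed hard_limit")
        self.soft_limit = soft_limit            # the level we have actually tested to
        self.hard_limit = hard_limit            # the "speed limiter"
        self.limiter_removed = limiter_removed  # hypothetical opt-out for die-hard customers
        self.records = []

    def add(self, record):
        count = len(self.records) + 1
        # Hard limit: refuse the record unless the customer has accepted the risk.
        if count > self.hard_limit and not self.limiter_removed:
            raise LimitExceeded(f"hard limit of {self.hard_limit} records reached")
        # Soft limit: carry on, but leave a trace in the log.
        if count > self.soft_limit:
            logger.warning("record count %d exceeds tested level %d",
                           count, self.soft_limit)
        self.records.append(record)
        return count
```

With soft_limit=2 and hard_limit=3, the third add() logs a warning and the fourth raises LimitExceeded, unless the store was created with limiter_removed=True. The same shape would apply to concurrent users; the point is only that the tested boundary becomes an explicit number in the product.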

What do you think?  Would this cause as many problems as it fixes?  Or would it enable you to focus on the testing that matters?

Measuring “white box” testing

Thursday, February 21st, 2008

An age-old problem when testing something is knowing when to stop. We’ve talked a lot about risk in previous entries, but some actual numbers are useful to keep the project managers happy. One measurement that is particularly popular for unit tests is code coverage. This is probably because you don’t care about whether the whole thing works, just the individual units, and you can count them easily enough. For example, the criteria for entry into the traditional test cycle might include that the developers should have exercised at least 80% of the code. The theory here is that, with detailed knowledge of the code, the developer can exercise all the error paths which are tricky to get to unless you know exactly how. The trouble is that, with deadlines looming, measurements like code coverage can easily encourage people to test the easiest 80% – in this situation, you have a developer who has tested every “getter” and “setter” method that they can find, but hasn’t ventured into the bit of code that only gets run when it’s a full moon.

Of course, it’s good to have some unit testing done before functional testing begins, but the purpose should surely be just to check that the whole thing isn’t going to fall apart as soon as somebody looks at it funny. Perhaps a better place to measure code coverage is after the black box tests have been run. If the developers could do some basic tests on the product, to at least prove that the product works sometimes, that would probably be good enough as a test entry measurement. Then, we could examine the amount of code that is covered by all the black box tests, which are trying to hit all the problems that customers might hit. Now, armed with the information about which bits of the code we know aren’t tested, and which we know are difficult to hit, we can start to use white box tests more effectively.
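To make that last step concrete, here is a small Python sketch that takes the set of things that could be covered and the set actually reached by the black box tests, and reports both the headline percentage and the list of what was missed. The function coverage_report and all of the data are hypothetical; a real project would feed in the output of whatever coverage tool it already uses.

```python
def coverage_report(executable, covered):
    """Return (percentage covered, sorted list of uncovered items)."""
    executable = set(executable)
    hit = executable & set(covered)
    # An empty product is trivially "fully covered"; this avoids dividing by zero.
    percent = 100.0 * len(hit) / len(executable) if executable else 100.0
    return percent, sorted(executable - hit)


# Hypothetical data: the functions in a product, and those reached by black box tests.
all_functions = {"get_name", "set_name", "open", "close", "recover_from_full_moon"}
black_box_hits = {"get_name", "set_name", "open", "close"}

percent, untested = coverage_report(all_functions, black_box_hits)
print(f"{percent:.0f}% covered; white box candidates: {untested}")
```

In this made-up case the headline figure is a respectable 80%, but the list is the useful part: it names the one function that nobody has reached, which is exactly where a white box test is worth writing.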

Of course, you still need to know when that first wave of “bring up” tests is done, so that the more formal testing can begin. This might be hard to quantify, so this approach does imply that you might not get any concrete test measurements until later in the cycle – of course, if the current measurement of unit test code coverage is actually providing false confidence, then maybe this isn’t a bad thing.

Who is responsible for quality?

Tuesday, January 29th, 2008

If a team of people produce something, whose fault is it if the quality is poor? After all, a tester’s job is, ostensibly, to find defects in a product. A poor quality product must be the result of the testers not doing well enough, right? After all, test teams are often referred to as “Quality Assurance” teams. Well, I’d say that the testers aren’t the ones making the quality low, and everyone involved in the project is responsible for the quality, including designers, developers, testers and managers. By discovering a defect, a tester is merely saying that a problem exists, and is highlighting a risk. I think the question we should be asking is: who is responsible for the risk?

Testing, testing, 123

Tuesday, January 22nd, 2008

Why do we test software? This may seem an obvious question: everyone has at some point experienced the frustration of error-prone, or “buggy”, software. As software becomes increasingly pervasive in modern life, simple frustrations can quickly become much more serious. Just as traditional engineering disciplines involve testing of generated artifacts to ensure fitness for purpose, safety and durability, the same requirements are true for software, even though software engineering may seem abstract to a traditional engineer.

We are a group of senior software testers working in the IBM Hursley software labs in the UK. We work on large middleware projects, across multiple testing disciplines. This is an unofficial collection of our thoughts and ideas on software testing, based around our philosophy of risk-based testing. We want to share our ideas, and expand them based on other people’s thoughts. We feel that a blog is an ideal format for this: please contribute.

Arthur, Ben, Jon, Richard and Scott