Posts Tagged ‘risk’

The Test Plan is dead?

Monday, March 4th, 2013

A friend just pointed me at an interesting article on Google’s Test analytics. It effectively replaces the existing ‘little used’ test plan with something more dynamic and representative of the outstanding risk in a project.

We have been doing some similar work around ‘Confidence maps’ to help demonstrate the quality of a deliverable.

Now we get into the discussion of the value of subjective v objective data. Our confidence maps use objective data such as test results, defect counts and code coverage as input, but ultimately the decision is made by a skilled professional who sifts the data and uses it for the quality assessment.

I think it helps to understand the difference between precise and accurate in this discussion.

Accuracy & Precision

In the fields of science, engineering, industry, and statistics, the accuracy[1] of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value. The precision[1] of a measurement system, also called reproducibility or repeatability, is the degree to which repeated measurements under unchanged conditions show the same results.[2]
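To make the distinction concrete, here is a minimal sketch in Python. The true value and both sets of measurements are invented purely for illustration: one set is tightly clustered but off target (precise, not accurate), the other is centred on the target but scattered (accurate, not precise).

```python
from statistics import mean, pstdev

TRUE_VALUE = 10.0  # hypothetical "actual" value of the quantity being measured

# Two made-up sets of repeated measurements of the same quantity.
precise_but_inaccurate = [12.0, 12.1, 11.9, 12.0]
accurate_but_imprecise = [8.0, 12.0, 9.0, 11.0]


def bias(samples, true_value):
    """Accuracy: how far the average measurement sits from the true value."""
    return abs(mean(samples) - true_value)


def spread(samples):
    """Precision: how tightly repeated measurements cluster together."""
    return pstdev(samples)


for label, samples in [
    ("precise but inaccurate", precise_but_inaccurate),
    ("accurate but imprecise", accurate_but_imprecise),
]:
    print(label, "bias:", bias(samples, TRUE_VALUE), "spread:", spread(samples))
```

A small spread says nothing about the bias, which is the trap with test counts and defect counts: they repeat reliably, but that alone does not make them close to the "true" quality.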

So things like defect numbers and test results are very precise, but they are not necessarily accurate measures of quality. (Think about a product that has 95% of its tests passing and 10 defects and another that has 50% passing and 1 defect – which has the higher quality?)
This is where we need the subjective analysis to determine quality. By taking the different bits of objective data (and the more of this we can get the better) we can form a subjective view of the product’s quality.

I like what Google are proposing – in fact I like anything that moves us away from building long-winded test plans and the traditional test metrics of counting tests and defects. Now we need to focus on how we turn precise data into accurate data. This is where the really skilled tester plays their part.


Reduce Risk and Increase Confidence

Friday, February 22nd, 2013

Just seen a new link to this book “Explore It!: Reduce Risk and Increase Confidence with Exploratory Testing” by Elisabeth Hendrickson.

Haven’t read it myself, but there’s a lot that attracts me to it.

Firstly, Reducing Risk and Increasing Confidence – that to me is what testing is all about. We get bogged down with the idea that test’s job is to find defects and ‘own quality’, when in reality testing is a tool for reducing risk. I posted here 4 years ago on this topic and I have seen teams make real progress in this area. We now talk about building confidence in our deliverables and are using ‘confidence maps’ to demonstrate this in a very visual way (we will post on this later).

Secondly, the exploratory bit is exciting. In Lisa Crispin’s ‘Agile Test Quadrant’ this testing fits into Quadrant 3 – the testing that critiques the deliverable from a business perspective. This testing is hard – not technically hard, but hard because most software engineers and test specialists don’t have the business background to make this testing really successful.

 

I’ll need to get reading!


Test your attitude to risk

Tuesday, May 11th, 2010

The Times has an interesting article on investors’ attitudes to financial risk tolerance. Data suggest that investors’ perceived and actual risk tolerances can markedly differ, affecting the suitability of their investment portfolios.

So why’s this relevant? Well, as testers we always make risk-based calls on what to test and to what extent. We therefore need to understand how our perception of our own risk tolerance maps to that of the business so that these decisions are in line with what the delivery, and business, needs.

For a short while The Times has teamed up with finametrica.com who provide an online risk tolerance questionnaire. It’s obviously geared towards financial risk, but it’s well worth a look to see how your own risk tolerance maps to that of the adult population.


Targeted Testing

Tuesday, January 19th, 2010

The final production build of the software is ready. The last full run of testing starts. After two weeks, a potentially serious problem is found and the product manager decides it has to be fixed. The fix is rushed in, unit tested and built into a new production build.

Now, there are only two days left before the product must be shipped to customers. The three week final test phase now goes “out the window” and the test team do the best they can to verify that the product still works without any serious regressions.

This is an example of “Targeted Testing” being applied. Every test team has to do it at some stage, because there simply isn’t enough time and resource to do an ideal job. The effectiveness of that final test of the shipped product depends on the skills of all those involved in the decisions affecting every aspect of the life cycle of that last change – and probably a great deal of luck.

“What I need is a list of specific unknown problems we will encounter.”
(“Dilbertism” from Lykes Lines Shipping)

Time constraints like this are actually happening throughout the whole product development cycle. All the tests in the official plan may be run in several test phases, but by the end of each phase, the product has moved on, and in an ideal world, all those tests (and some not even dreamed of yet) need to be re-run.

Wouldn’t it be great if there were some tools and techniques to help do Targeted Testing so that at any time in the lifecycle of the product, we could know exactly which test is most likely to find a possible problem? All the available tests would be dynamically ordered based on this ever changing likelihood of finding problems. That way, we could be sure that the best possible testing was being done at any point in time in whatever time is available.
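As a rough sketch of what “dynamically ordered” could mean in practice, the Python below ranks candidate tests by an estimated likelihood of finding a problem per minute of run time, then greedily fills whatever time is available. Everything here is an assumption made for illustration: the `Candidate` fields, the `change_boost` factor and the scoring formula are hypothetical, and a real scheme would need tuning against project history.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    minutes: float            # estimated run time
    past_failure_rate: float  # fraction of past runs that found a problem
    touches_change: bool      # does it exercise recently changed code?


def likelihood(test, change_boost=3.0):
    """Hypothetical score: past failure rate, boosted if the test hits changed code."""
    boost = change_boost if test.touches_change else 1.0
    return test.past_failure_rate * boost


def plan(tests, budget_minutes):
    """Greedy selection: best likelihood-per-minute first, skipping what won't fit."""
    ordered = sorted(tests, key=lambda t: likelihood(t) / t.minutes, reverse=True)
    chosen, used = [], 0.0
    for test in ordered:
        if used + test.minutes <= budget_minutes:
            chosen.append(test)
            used += test.minutes
    return chosen


# Made-up candidates for a two-hour window.
candidates = [
    Candidate("full_regression", 600, 0.30, True),
    Candidate("recovery_smoke", 30, 0.10, True),
    Candidate("ui_tour", 60, 0.05, False),
    Candidate("install_check", 20, 0.02, False),
]
for test in plan(candidates, budget_minutes=120):
    print(test.name)
```

Re-running `plan` whenever the code changes or a test result comes in is what would keep the ordering “ever changing”; the greedy pass is just the simplest way to respect a fixed time budget.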

(more…)


Snow risk

Friday, February 6th, 2009

Pickled onions and snow – not something you would usually put together (maybe for a snowman’s eyes?)
But all week, following the ‘worst snow storms for 20 years in the south of England’, I’ve been thinking about Rich’s post about Christmas (Did you remember the pickled onions?).

Every time I turn on the radio I hear someone complaining about how unprepared we are for the few inches of snow that started falling Sunday night. Apparently in the olden days buses never stopped, schools never closed and snow was cleared within minutes of it falling, and in Canada and Scotland they have 27 words for snow and no one has ever slipped up or missed a train.

Now I know it’s a pain not being able to travel and so on, but what are the alternatives? Should we have disused warehouses along the south coast converted into networks of snow plough storage areas? Should we turn the Isle of Wight into a massive salt mountain? If it snows like this in another 20 years then we will be well prepared, but is it worth having fleets of snow ploughs sat idle in the interim? And how will the islanders feel?

I think it all comes back to risk and how poor people are at understanding what it means. People say they understand what it means to take a risk, but they mostly don’t like the consequences when a risk goes the wrong way.

For the last 20 years councils across the south have taken a risk that we won’t get a prolonged period of snow. For 19 of those years the risk has paid off and we, as tax payers, have saved a fortune. This year the gamble didn’t pay off and we are very quick to point blame at those dastardly politicians who made the wrong risk call.

Testers are all too familiar with this situation. Risk is great until it all goes wrong. Maybe the real problem is that the councils who are taking the risk aren’t publicizing it well enough? It’s certainly something we don’t do well in test.

Anyway, I look forward to pictures in the local papers 15 years hence of unused snow ploughs gently rusting in the salty breeze coming in over the Solent…


Risk v Confidence

Friday, January 30th, 2009

I’ve had a really exciting couple of weeks working with one of our System Test teams to define a better way of measuring test progress and product quality. For too long I’ve been fed up with the traditional test tracking metrics where we measure passes and fails or effort remaining. Historically, these measures seem to be used just because they are simple to gather, the assumption being that all you have to do is define what test cases need to be run, then track them until they all pass. There are two major flaws in this. Firstly, it’s a big assumption that the original test plan contains everything it needs to. Secondly, it is rare for any test plan to execute smoothly, and at some stage in the project the project manager realises that the passes and fails aren’t telling them anything and starts asking questions like “Just tell me what works and what doesn’t”. Invariably this is either impossible to determine or requires a lot of effort from the test team, at which point the simple solution is ‘Test team, work harder!’

I’ve failed miserably so far at trying to convince project teams that they should be looking at the outstanding risk in a project, rather than test case results. But I think I have finally realised why. People don’t like talking about risk. It sounds like something bad and most project teams don’t want to be associated with something bad.

The breakthrough we had this week came when my colleague Russell Finn came up with the idea of measuring the ‘confidence’ we have in the product or system rather than the outstanding risk. Now you could argue that confidence is just the inverse of risk in this case, but I think it has a much more positive spin on it.

We had been challenged by our lead engineer, Brian Cope, to redefine how we represented our status and, with the help of system test leaders Eileen Dreyer and Chris Osbourn, we set about rethinking everything we do in terms of status reporting.

What we decided to show was effectively two columns of data. One showing areas of the product that we had high confidence in and one showing the backlog of areas we currently have low confidence in (or if you like the risky areas). Now, from a very simplistic view, we can answer the question ‘What works and what doesn’t?’ or at least have a good stab at it.

The next step was to work out a way of quantifying the ‘confidence’. Fortunately, this was relatively simple as we piggybacked on a piece of work that Russell had already done, where he had defined a ‘taxonomy’ for the system under test. This taxonomy split the system into its important parts, from a capability viewpoint. With this taxonomy we were able to prioritise and apply relative weightings for each area using ‘Planning Poker’ (http://www.planningpoker.com/). A quick Friday afternoon game involving Jon Isaac, Russell, Brian and me, and we had a pretty good view of the system with each area given a number of ‘story points’. (We have since done a sanity check with other members of our department and so far our estimates are holding up.)

We could then chart the confidence in the system using a couple of pictures. The first showing the confidence in different areas (and their relative weighting), the second, showing the overall system. We decided to add a third ‘state’ to show areas of risk that we would be mitigating in the current iteration.
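The data behind such a pair of charts can be sketched in a few lines of Python. This is only an illustration under assumptions: the area names, the story-point numbers and the three state labels (`high`, `mitigating`, `backlog`) are made up here for a fictitious system, and the roll-up is a plain sum of story points rather than anything more sophisticated.

```python
# Hypothetical taxonomy: story points per area, split across the three states.
confidence_map = {
    "recovery":     {"high": 8, "mitigating": 5, "backlog": 8},
    "performance":  {"high": 0, "mitigating": 5, "backlog": 8},
    "connectivity": {"high": 0, "mitigating": 0, "backlog": 5},
}


def overall(cmap):
    """Roll the per-area story points up into system-wide percentages."""
    totals = {"high": 0, "mitigating": 0, "backlog": 0}
    for states in cmap.values():
        for state, points in states.items():
            totals[state] += points
    grand_total = sum(totals.values())
    return {state: round(100 * pts / grand_total, 1) for state, pts in totals.items()}


# First picture: one bar per area, its length set by the area's total weight.
for area, states in confidence_map.items():
    print(area, sum(states.values()), states)

# Second picture: the system as a whole.
print(overall(confidence_map))
```

Because the weights are story points, a heavily weighted area that is still in the backlog drags the overall picture down far more than a lightly weighted one, which is the behaviour the weighting is meant to give.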

N.B. The data shown here is for a fictitious system, but imagine that it is a system that is highly valued for its ability to recover from failures and outages and has a high expectation on performance.

Once we have this picture we can view automated test cases as tools that help us build our confidence in the system. Other tools included ‘manual testing’, ad-hoc testing, code reviews, code coverage metrics and ‘tester gut feel’. These other tools are not used in traditional tracking and can be a valuable source of information for determining the quality of the product. If these things feel a bit hokey now then spend a second or two thinking about what a traditional test status showing 54% pass actually means.

Riding on the back of another piece of work, where all the existing test cases had been ‘tagged’ to show which areas of the taxonomy they exercised, we held a review with the test team to weight each test case by area. Note that a test case can cover more than one area and would be weighted independently for each area. (For instance, a test case might be very highly rated in the recovery area but do a small amount of connectivity; the test case would then be weighted in both areas appropriately.)
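One way to picture per-area weighting is the sketch below. The test case names, weights and results are hypothetical, and the “fraction of weight passing” figure is an assumption chosen for illustration; it would only ever be one input alongside manual testing, reviews and tester judgement, not the confidence score itself.

```python
# Hypothetical test cases, each weighted independently per taxonomy area.
test_cases = {
    "tc_failover":   {"weights": {"recovery": 5, "connectivity": 1}, "passed": True},
    "tc_reconnect":  {"weights": {"connectivity": 3}, "passed": True},
    "tc_power_loss": {"weights": {"recovery": 3}, "passed": False},
}


def evidence_by_area(cases):
    """Fraction of each area's total test weight that is currently passing."""
    total, passing = {}, {}
    for case in cases.values():
        for area, weight in case["weights"].items():
            total[area] = total.get(area, 0) + weight
            if case["passed"]:
                passing[area] = passing.get(area, 0) + weight
    return {area: passing.get(area, 0) / total[area] for area in total}


print(evidence_by_area(test_cases))
```

Here `tc_failover` counts heavily towards recovery and only lightly towards connectivity, so a failure in it would dent the recovery picture much more than the connectivity one.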

The following charts show the quality of the system during the early iterations. The highly weighted (and therefore most important areas) are being mitigated first (in true agile fashion) and we can see that a portion of recovery is now showing high confidence, a portion is being mitigated in this iteration and the rest is still outstanding in the backlog. Clearly the system is not suitable for shipping at this point.

As the iterations proceed we can see the backlog reduce and the confidence rise.

Finally we reach the last iteration and a decision must be made on whether we can ship or not. It looks like we have a small amount of risk in recovery, performance load and stress and a high confidence in everything else.

So, do we ship it or not?
The decision is still a tough one, but I’m sure that this sort of information will be far more useful than the traditional method where at this point we would be claiming 98% attempted and 94% successful!

I think this is a radical new way of thinking about product quality and will make a huge difference in how we do business.

I’d appreciate any thoughts and ideas on how this could be improved.


Risk in other industries

Wednesday, January 21st, 2009

There was a very interesting programme on BBC2 on how risk is managed (or rather mismanaged) in the financial markets. The City Uncovered with Evan Davis is well worth watching – even if just to understand how human nature sacrifices risk in the pursuit of performance.

It’s well summed up by Evan Davis’ closing statement: “If you think you’ve got risk licked – you haven’t”.


Isn’t it all just risk?

Wednesday, January 14th, 2009

Over the last few days, Ben and I (this is a joint post) have been trying to reach agreement on our understanding of risk. Ultimately we want to identify some new and effective methods to articulate the risks we may identify. The discussions were held at lunchtime, and OK, things got a bit silly. Nevertheless, we think there is a lesson to be learned somewhere in the example below. As such, please do add a comment if you can find one… We’ve also submitted this as an idea to the BBC’s Genius programme.

Objective: Introduce a consolidated risk gauge to simplify the (human : machine) interface. The merits of such a device are illustrated in the motor-car example below.

Modern cars have a bewildering array of dials and warning lights on a dashboard – but are they really necessary?

Consider just one of these dials: the speedometer. Does a driver really care what his absolute speed, based on centuries-old units and the period of the Earth’s rotation, is? The answer is no: a driver simply wishes to know he will get to his destination without incident: be it crashing, getting a speeding ticket or missing his appointment.

It is therefore proposed that the speedometer is replaced by a risk dial – which interprets prevailing driving conditions, speed limits and navigation plans – e.g., using existing GPS technology – to calculate a risk metric. For example, if the driver exceeds a speed limit on a road, the risk gauge will go up, as speeding tickets are more likely. The driver can then elect to change his driving style to reduce this risk.

This proposal can be extended further. Consider the fuel gauge. The driver does not care about how much is in the tank per se – he simply wishes to understand the risk of running out of fuel on his journey. This dial could therefore be replaced too, by one measuring this risk. Note that the speedometer and fuel gauges have been simplified to share a common unit – one of risk – and hence can share the same gauge! By extension of the same argument, all dials and warning lights can be incorporated into one single dial of “consolidated risk” – thus addressing the complexity of modern car dashboards.

Missed a car service? Risk increases. Parked in a dodgy area? Risk increases. Such a metric would help encourage drivers to minimise risk, and even find alternate transport methods. This conveniently brings the proposal onto its zenith: consider how much more pleasant and minimalist a Jumbo Jet’s cockpit would be if there was just a single risk gauge.


Testing without limits

Thursday, March 6th, 2008

I started writing a comment to Ben Bakowski’s last post on “Stressful testing…“, but decided to turn it into a full post. I was thinking that a key difference between testing a “real world” object such as a car, and something like software, is that the limits aren’t always as clear. A car might have five seats, and might use roads where the speed limit is 70mph – these are concrete numbers, which allow you to make the distinction more easily. If I have a piece of software (especially a piece of middleware), then it’s tricky to know the limits. How many records can my database handle? How big can they be? How many users at once?

When testing middleware, I think you’re usually forced to ask yourself: what do I expect someone to do with this? It’s largely about guesswork and experience with existing customers. I think that an interesting idea to explore is whether it is appropriate or advisable to set artificial limits on a product’s maximum load. Setting artificial limits could provide actual boundaries to the testing that needs to be done. I can think of a few “real world” analogies:

  • Putting a “maximum load” sticker on a lorry – ignore at your own risk.
  • Putting a fixed number of seats in the car – “we could fit eight seats in here, but we’ve only put in five, so only try and put five people in it”.
  • Fitting a speed limiter (i.e. a hard limit).

We could do the same in software: we could write out log messages when a particular level of load is exceeded; we could put in hard limits to the number of records in a database and the number of concurrent users. The key thing when doing this, I think, would be to provide reasonable limits which won’t impede your customers, but at least “frame” your testing. For those die-hard customers who absolutely can’t do without that ten millionth database record, you could always offer a special option to remove the limiters, subject to additional testing, or acceptance of the additional risk.
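A minimal Python sketch of the idea follows. The `RecordStore` class, its default limits and the `enforce_hard_limit` switch are all hypothetical names invented for this example: a soft limit writes a log message (the “maximum load” sticker), a hard limit refuses the work (the speed limiter), and the switch is the “special option” for customers who accept the extra risk.

```python
import logging

logger = logging.getLogger("limits")


class RecordLimitExceeded(Exception):
    """Raised when the hard limit on stored records would be exceeded."""


class RecordStore:
    def __init__(self, soft_limit=1_000_000, hard_limit=10_000_000,
                 enforce_hard_limit=True):
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.enforce_hard_limit = enforce_hard_limit  # the "remove limiter" option
        self.records = []

    def add(self, record):
        # Hard limit: refuse outright, like a speed limiter.
        if self.enforce_hard_limit and len(self.records) >= self.hard_limit:
            raise RecordLimitExceeded(f"hard limit of {self.hard_limit} records reached")
        self.records.append(record)
        # Soft limit: carry on, but warn that we are beyond the tested range.
        if len(self.records) > self.soft_limit:
            logger.warning("record count %d is above the tested limit of %d",
                           len(self.records), self.soft_limit)


store = RecordStore(soft_limit=2, hard_limit=3)
for i in range(3):
    store.add({"id": i})  # the third add logs a warning
```

The test effort is then framed by the two numbers: everything up to the soft limit is tested thoroughly, the band between the limits is tested for graceful behaviour, and nothing beyond the hard limit can happen unless the limiter is switched off.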

What do you think?  Would this cause as many problems as it fixes?  Or would it enable you to focus on the testing that matters?


If it’s not working, try something different!

Thursday, January 31st, 2008

If you are familiar with the BBC’s Little Britain comedy series, you will have probably seen David Walliams performing a Mr Mann sketch. Mr Mann is a very demanding customer: he has ludicrously specific requirements and no ability to compromise. Much to the annoyance of shop keeper Roy, Mr Mann is prepared to simply ‘wait’ – presumably forever – until he gets what he wants.

As Software Engineers, the very nature of our job means that we work with very specific sets of requirements. As software testers we have a tendency to blend these together into complex test scenarios with the sorts of prerequisites and demands that would make even the likes of Mr Mann seem easy to please. Sadly, when we don’t get all the pieces we need in one go, the temptation to wait for the rest of them to arrive can become overwhelming.

It is important not to forget that each and every test scenario is designed for good reason and will no doubt reveal something fundamental about the item under test. However, we must keep this end goal in context. If we can’t proceed with Scenario S as is, we must challenge ourselves and re-examine our test tactics to keep things moving. This is vital if we are to keep lowering the risk to the parent project.

OK, so this approach won’t always pay off. It’s a bit like trying to avoid a traffic jam – sometimes you just can’t. It is also important that we don’t become engrossed with spinning plates or distracted with investigating the trivial low-value areas.

Some time ago I had a track driving lesson, and the instructor told me “Whatever you’re doing, it isn’t working. Try something different – anything will do!”. He certainly had a way of emphasizing this message, and I didn’t much appreciate the feedback at the time, but maybe he had a valuable point?
