Jon Tilt's posts

Testing v Checking

Friday, January 15th, 2010

Had a great meeting this morning with a number of senior testers from Hursley.
The main theme of the discussion was around an article found by Russell Finn, ‘Testing v Checking’ – where basically checking is doing what you are told and testing is doing what you feel should be done (verification v validation might be a better way of putting it).
We have evidence from one of our major products that 50% of our field-reported problems are things no one had considered (not designed, coded or tested for). So whilst we are pretty good at the verification side of things, there’s a lot we miss because we are not testing to real stakeholder needs.
A solution we discussed was around exploratory testing where the emphasis is on learning about a system and improving it through testing.

We also discussed quality metrics and how they scale. Traditional methods of counting defects per 1000 lines of code are fundamentally flawed because they don’t consider the impact on the end user. One defect in a 100-line application might sound good, but what if that code is distributed across a million devices and each one needs an update?
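
To make the point concrete, here is a rough Python sketch. The two applications, their install counts and the 'devices needing an update' impact measure are all invented for illustration; they are assumptions, not figures from any of our products.

```python
# Hypothetical comparison: defect density versus install-base impact.
# All numbers are made up for illustration.

def defects_per_kloc(defects, lines_of_code):
    """Traditional metric: defects per 1000 lines of code."""
    return defects * 1000 / lines_of_code


def devices_needing_update(defects, installed_devices):
    """Crude impact proxy (an assumption): if any defect ships,
    every installed device needs an update."""
    return installed_devices if defects > 0 else 0


APPS = {
    "widely_deployed": {"defects": 1, "lines_of_code": 100, "devices": 1_000_000},
    "barely_deployed": {"defects": 1, "lines_of_code": 100, "devices": 10},
}

for name, app in APPS.items():
    density = defects_per_kloc(app["defects"], app["lines_of_code"])
    impact = devices_needing_update(app["defects"], app["devices"])
    print(f"{name}: {density:.1f} defects/KLOC, {impact} devices to update")
```

Both applications score exactly the same on defects per 1000 lines, yet one of them means a million updates and the other ten. The density number simply cannot see that difference.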

Perhaps a better method would be ‘mean time to defect’ – some sort of measure around the time it takes an average end user to encounter a new defect.
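
One possible way to calculate this, sketched in Python. We did not settle on a definition in the meeting, so the formula here (total hours of use divided by the number of new defects encountered) is my assumption, and the record layout and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hypothetical per-user record gathered from the field."""
    user: str
    hours_used: float
    new_defects_hit: int


def mean_time_to_defect(records):
    """Assumed definition: total usage hours / total new defects encountered.

    Returns None when nobody has hit a defect yet, since the measure
    is undefined in that case.
    """
    total_hours = sum(r.hours_used for r in records)
    total_defects = sum(r.new_defects_hit for r in records)
    if total_defects == 0:
        return None
    return total_hours / total_defects


SAMPLE = [
    UsageRecord("alice", 40.0, 1),
    UsageRecord("bob", 100.0, 0),
    UsageRecord("carol", 60.0, 3),
]

print(mean_time_to_defect(SAMPLE))
```

Users who never hit a defect still count towards the hours, which is the point: a product that most people use for a long time without trouble should score well.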

Final thoughts were around the combative nature of a tester raising a defect. Each time we raise a defect we are effectively telling the coder their baby is ugly. Is there a better way of doing this to get testers and coders cooperating more? One suggestion was to change the term ‘defect’ to ‘opportunity for improvement’ – not ideal I know, but I like the sentiment. Comment if you have any better ideas.

Negotiating for Quality Software

Wednesday, January 6th, 2010

Just seen this article by Tim Riley, Director of QA at Mozilla: Negotiating for Quality Software. I very much like the pragmatic approach. We talk a lot about how testing fits into the agile way of delivering software, and this article encapsulates how it can be done. The key is that everything is negotiable: engage your stakeholders to find out what is really important and use that to guide your testing.

TestingBlue – First year review

Monday, February 23rd, 2009

Our blog has now been running for just over a year and I thought it was about time for a bit of reflection.
The chart above, brought to you by the power of Google Analytics, shows our first-year stats (visits per month). After a good start to the year our enthusiasm waned, and you can clearly see our readership dropping off between April and September.

Fortunately our change of strategy later in the year seems to have built up the interest again.

No accounting for taste

Friday, February 13th, 2009

I am now mere moments away from finally switching off my Windows box and making the iMac our primary home machine.

The last hurdle was cleared this week when I found an accounting package for the family treasurer to use in place of the copy of Quicken 2002 we have been running for many years on the PC.

The reason I took so long in finding a proper replacement was that I’d failed to listen to my stakeholder (my wife, Liz) on what her actual requirements were for the new package. I had investigated all sorts of shiny new software, from the free GnuCash (that took me half a day to build) through to some fancy stuff from Igg Software called iBank. Everything I looked at seemed to do the job (at least the trial versions allowed me to import the data) and looked good.

However, each time I ran the stakeholder demo I got the response ‘it’s too complicated’, ‘I don’t want to learn a new package’ or ‘I just want it to look and work like the old Quicken’.

Now Quicken isn’t the most fantastic piece of software, but it does a job, Liz knows how to use it and we’ve got several years’ worth of transactions in it. It satisfies my stakeholder’s needs. So here I was trying to find something whizzy and new, when all Liz actually wanted was to have Quicken run on the iMac.

Unfortunately there isn’t a Mac version of Quicken (there was one, but they stopped it in the UK a while back and they’re promising one later this year), but in my searches I stumbled across an offering from CodeWeavers called CrossOver Mac that claims to allow Windows programs (albeit a specially selected list) to run on a Mac. So last weekend I downloaded the trial version, installed Quicken 2002, rebuilt all my transactions from a backup file and hey presto! We were in business.

After running the Mac and PC in parallel for a few days my stakeholder made the call to make the switch permanent. So we are now fully ‘Mac’ed’ and I have a delighted treasurer.

Lesson learned: listen to what your stakeholder really wants and don’t assume you know better. Sometimes the simplest solutions are the best.

Now the only thing I have to crack is convincing my eldest that the Mac’s parental control really is a good thing!

Snow risk

Friday, February 6th, 2009

Pickled onions and snow – not something you would usually put together (maybe for a snowman’s eyes?)
But all week, following the ‘worst snow storms for 20 years in the south of England’, I’ve been thinking about Rich’s post about Christmas (Did you remember the pickled onions?).

Every time I turn on the radio I hear someone complaining about how unprepared we are for the few inches of snow that started falling on Sunday night. Apparently in the olden days buses never stopped, schools never closed and snow was cleared within minutes of it falling, and in Canada and Scotland they have 27 words for snow and no one has ever slipped up or missed a train.

Now I know it’s a pain not being able to travel and so on, but what are the alternatives? Should we have disused warehouses along the south coast converted into networks of snow plough storage areas? Should we turn the Isle of Wight into a massive salt mountain? If it snows like this in another 20 years then we will be well prepared, but is it worth having fleets of snow ploughs sat idle in the interim? And how will the islanders feel?

I think it all comes back to risk and how bad people are at understanding what it means. People say they understand what it means to take a risk, but they mostly don’t like the consequences when a risk goes the wrong way.

For the last 20 years councils across the south have taken a risk that we won’t get a prolonged period of snow. For 19 of those years the risk has paid off and we, as taxpayers, have saved a fortune. This year the gamble didn’t pay off and we are very quick to point the blame at those dastardly politicians who made the wrong risk call.

Testers are all too familiar with this situation. Risk is great until it all goes wrong. Maybe the real problem is that the councils who are taking the risk aren’t publicizing it well enough? It’s certainly something we don’t do well in test.

Anyway, I look forward to pictures in the local papers 15 years hence of unused snow ploughs gently rusting in the salty breeze coming in over the Solent…

Risk v Confidence

Friday, January 30th, 2009

I’ve had a really exciting couple of weeks working with one of our System Test teams to define a better way of measuring test progress and product quality. For too long I’ve been fed up with the traditional test tracking metrics where we measure passes and fails or effort remaining. Historically, these measures seem to be used just because they are simple to gather. The assumption is that all you have to do is define what test cases need to be run, then track them until they all pass. There are two major flaws in this. Firstly, it’s a big assumption that the original test plan contains everything it needs to. Secondly, it is rare for any test plan to execute smoothly, and at some stage in the project the project manager realises that the passes and fails aren’t telling them anything and starts asking questions like “Just tell me what works and what doesn’t”. Invariably this is either impossible to determine or requires a lot of effort from the test team. At which point the simple solution is ‘Test team, work harder!’

I’ve failed miserably so far at trying to convince project teams that they should be looking at the outstanding risk in a project, rather than test case results. But I think I have finally realised why. People don’t like talking about risk. It sounds like something bad and most project teams don’t want to be associated with something bad.

The breakthrough we had this week came when my colleague Russell Finn came up with the idea of measuring the ‘confidence’ we have in the product or system rather than the outstanding risk. Now you could argue that confidence is just the inverse of risk in this case, but I think it has a much more positive spin on it.

We had been challenged by our lead engineer, Brian Cope, to redefine how we represented our status, and with the help of system test leaders Eileen Dreyer and Chris Osbourn we set about rethinking everything we do in terms of status reporting.

What we decided to show was effectively two columns of data. One showing areas of the product that we had high confidence in and one showing the backlog of areas we currently have low confidence in (or if you like the risky areas). Now, from a very simplistic view, we can answer the question ‘What works and what doesn’t?’ or at least have a good stab at it.

The next step was to work out a way of quantifying the ‘confidence’. Fortunately, this was relatively simple as we piggybacked on a piece of work that Russell had already done, where he had defined a ‘taxonomy’ for the system under test. This taxonomy split the system into its important parts, from a capability viewpoint. With this taxonomy we were able to prioritise and apply relative weightings for each area using ‘Planning Poker’ (http://www.planningpoker.com/). A quick Friday afternoon game involving Jon Isaac, Russell, Brian and me and we had a pretty good view of the system, with each area given a number of ‘story points’. (We have since done a sanity check with other members of our department and so far our estimates are holding up.)

We could then chart the confidence in the system using a couple of pictures. The first shows the confidence in different areas (and their relative weighting); the second shows the overall system. We decided to add a third ‘state’ to show areas of risk that we would be mitigating in the current iteration.
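
For anyone who wants to play with the idea, here is a minimal Python sketch of the data behind the two pictures. The area names echo the fictitious system, but the story points, the state names and the way a single area is split across states are all my own illustrative assumptions rather than our real tooling.

```python
from collections import defaultdict
from enum import Enum


class State(Enum):
    HIGH_CONFIDENCE = "high confidence"
    THIS_ITERATION = "being mitigated this iteration"
    BACKLOG = "low confidence (backlog)"


# (area, story points, state). The points are invented here, standing
# in for the Planning Poker estimates.
SLICES = [
    ("recovery", 8, State.HIGH_CONFIDENCE),
    ("recovery", 5, State.THIS_ITERATION),
    ("recovery", 8, State.BACKLOG),
    ("performance", 13, State.BACKLOG),
    ("connectivity", 3, State.HIGH_CONFIDENCE),
]


def points_by_area(slices):
    """First picture: story points per area, split by state."""
    chart = defaultdict(lambda: {state: 0 for state in State})
    for area, points, state in slices:
        chart[area][state] += points
    return dict(chart)


def overall_share(slices):
    """Second picture: fraction of all story points in each state."""
    totals = {state: 0 for state in State}
    for _area, points, state in slices:
        totals[state] += points
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {state: 0.0 for state in State}
    return {state: totals[state] / grand_total for state in State}


for state, share in overall_share(SLICES).items():
    print(f"{state.value}: {share:.0%}")
```

Because everything is weighted by story points, moving a big area such as performance out of the backlog shifts the overall picture far more than moving a small one, which is exactly the behaviour we wanted.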

N.B. The data shown here is for a fictitious system, but imagine that it is a system that is highly valued for its ability to recover from failures and outages and has a high expectation on performance.

Once we have this picture we can view automated test cases as tools that help us build our confidence in the system. Other tools include ‘manual testing’, ad-hoc testing, code reviews, code coverage metrics and ‘tester gut feel’. These other tools are not used in traditional tracking, yet they can be a valuable source of information for determining the quality of the product. If these things feel a bit hokey now then spend a second or two thinking about what a traditional test status showing 54% pass actually means.

Here we rode on the back of another piece of work, where all the existing test cases had been ‘tagged’ to show which areas of the taxonomy they exercised. We held a review with the test team to weight each test case by area. Note that a test case can cover more than one area and would be weighted independently for each area. (For instance, a test case might be very highly rated in the recovery area but do a small amount of connectivity; this would mean that the test case would be weighted in both areas appropriately.)
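
A small Python sketch of what the per-area weighting might look like. The test case names, the weights and the rule for turning passes into a score (passed weight divided by total weight for the area) are hypothetical; in practice the automated results are only one input alongside the other tools mentioned above.

```python
# Each test case carries an independent weight for every area it is
# tagged with. Names and weights are invented for illustration.
TEST_CASES = {
    "tc_failover_restart": {"recovery": 5, "connectivity": 1},
    "tc_reconnect_after_outage": {"recovery": 3, "connectivity": 3},
    "tc_basic_connect": {"connectivity": 2},
}

PASSED = {"tc_failover_restart", "tc_basic_connect"}


def area_scores(test_cases, passed):
    """Assumed rule: score = weight of passing tests / total weight, per area."""
    total = {}
    earned = {}
    for name, weights in test_cases.items():
        for area, weight in weights.items():
            total[area] = total.get(area, 0) + weight
            if name in passed:
                earned[area] = earned.get(area, 0) + weight
    return {area: earned.get(area, 0) / total[area] for area in total}


for area, score in area_scores(TEST_CASES, PASSED).items():
    print(f"{area}: {score:.0%} of weighted tests passing")
```

Note how the failover test counts heavily towards recovery but only lightly towards connectivity, even though it is a single pass.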

The following charts show the quality of the system during the early iterations. The highly weighted (and therefore most important) areas are being mitigated first (in true agile fashion) and we can see that a portion of recovery is now showing high confidence, a portion is being mitigated in this iteration and the rest is still outstanding in the backlog. Clearly the system is not suitable for shipping at this point.

As the iterations proceed we can see the backlog reduce and the confidence rise.

Finally we reach the last iteration and a decision must be made on whether we can ship or not. It looks like we have a small amount of risk in recovery, performance load and stress and a high confidence in everything else.

So, do we ship it or not?
The decision is still a tough one, but I’m sure that this sort of information will be far more useful than the traditional method where at this point we would be claiming 98% attempted and 94% successful!

I think this is a radical new way of thinking about product quality and will make a huge difference in how we do business.

I’d appreciate any thoughts and ideas on how this could be improved.

Dangerous coding errors revealed

Friday, January 23rd, 2009

I was sent this link this week by fellow Test Architect, Alasdair Paton.

Dangerous coding errors revealed

25 of the most dangerous bugs in software as defined by the US National Security Agency (NSA). The question Alasdair posed was how many of these had I found? I also wondered how many we actually go looking for?
I could see about 6 or 7 that we see regularly and actively look for.

Thoughts?

‘Consumability’ testing

Monday, January 12th, 2009

I finally got ‘Mac’ed’ up at the weekend and bought my first iMac. A thing of beauty! I’ve seen a lot of them recently, but have never set one up, so I thought it was the ideal opportunity to do some real ‘out of the box’ ‘Consumability’ testing.

First up, the whole lot comes in one box that’s easy to carry out of the shop. I got home, opened it up and decided to get as far as I could without reading the manuals. First out of the box comes the wireless keyboard, then the power cable, then the machine itself. Lastly a long white box with some books, the mouse and some sort of remote control.

So the machine goes onto the table, the power lead can only go in one place and there’s only one power button, which I pressed. A few seconds later a screen comes up telling me to sort the mouse out – simple enough.
Then the next screen tells me to get the keyboard ready. Easy – except I put the batteries in the wrong way and had to refer to the booklet to check (actually on closer inspection the keyboard had a little diagram on it telling me which way they went – so let’s put it down to a user error).

Once the keyboard was on everything went smoothly – it asked me to run a few tests to check it was connected properly, then off it went setting up the machine.

So my out of the box experience and consumability testing have scored pretty high. Next step, the ‘11 year old test’. Could my daughter sync her phone up with the iMac using the Bluetooth connection? Of course she could. It did take her a couple of minutes to work out which folder she needed to connect to, but otherwise easy peasy!

Now onto some load testing……

‘Top Gear’ Performance testing

Wednesday, January 7th, 2009

Just been having a chat with a new grad who is working in one of our System Test teams.
We were talking about load, stress and performance testing and I was trying to articulate the reasons behind having a workload with peaks and troughs during the day (in the same way that a customer would) rather than our usual approach to load, which is to run at a constant high level for a long period. However, I was struggling with a justification as to why this might be a good idea and how it would find a different class of defects.

Then I remembered the ‘Top Gear’ episode when Hammond was trying to drive a Formula 1 car around a track. Apparently F1 cars are designed to go flat out – all the time – and are pretty good at this. What Hammond discovered to his horror was that when you don’t drive them flat out then you can get into all kinds of trouble. Cornering was a particular problem. Flat out, the tyres are warm and stick to the track, the brakes are responsive and the downforce of the car helps it round. Anything less than flat out and, as Hammond found, you are all over the place.

So in this case if your system is the car and you only use Lewis Hamilton to test it then there are a lot of defects that will get missed. Fortunately few people drive F1 cars slowly on the road, but this would be a severe problem if the system under test was something like a Mini.

So, moral of the day: remember that for every Lewis Hamilton tester you have, you need at least one Hammond.
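
For the record, here is a rough Python sketch of the two workload shapes we were discussing: the usual flat-out run and a day with peaks and troughs. The cosine curve, the function names and the request rates are assumptions made for illustration, not a description of our actual load tooling.

```python
import math


def constant_profile(peak_rate, hours=24):
    """The 'Lewis Hamilton' workload: flat out for the whole run."""
    return [peak_rate] * hours


def daily_profile(peak_rate, trough_rate, hours=24):
    """The 'Hammond' workload: quiet at hour 0, busiest halfway through.

    Uses a cosine wave as a stand-in for a real customer's daily pattern.
    """
    profile = []
    for hour in range(hours):
        # shape runs from 0.0 (trough) up to 1.0 (peak) and back down.
        shape = (1 - math.cos(2 * math.pi * hour / hours)) / 2
        profile.append(round(trough_rate + (peak_rate - trough_rate) * shape))
    return profile


flat_out = constant_profile(1000)
customer_day = daily_profile(1000, 50)

for hour, (flat, varied) in enumerate(zip(flat_out, customer_day)):
    print(f"{hour:02d}:00  constant={flat:4d}/s  daily={varied:4d}/s")
```

Feeding the second list to a load driver would exercise the ramp-ups, ramp-downs and idle periods that the constant run never touches, which is where the Hammond class of defects tends to hide.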

Ubuntu convert

Friday, December 19th, 2008

After weeks of bullying from my pod mate Russell, followed by a great two-pronged sales job from Scott and Dan, I thought I should at least have a look at the new version of Ubuntu.

So armed with Russ’ 8.10 CD I gave it a whirl, booting from the CD without installing it properly. I tried it first on my old T42 Thinkpad and it was up and running without any intervention. OK, I’m partially impressed.

Now the real test, would it cope with my 5 year old Evesham desktop?

Simple – it went on a treat, recognised all the bits (external hard drive, wired internet) – I’m now very impressed.

Plan is to buy a Mac in the new year and then convert the Windows box to Ubuntu. My daughter is not impressed as Linux doesn’t have MSN (at least that’s what I’m telling her – and will keep it like that as I think MSN has been responsible for 90% of the viruses on our box).

So count me as a new Ubuntu fan…..

Happy Christmas to both our readers. See you all again in 2009.