James Dixon’s Blog

James Dixon’s thoughts on commercial open source and open source business intelligence

Quality Assurance

This is the department where the largest difference exists between the COSS and proprietary models.

During my years of proprietary development I used to see claims about the quality of open source software but did not necessarily see why this should be the case. After spending a few years in open source I have come to believe in this completely and have some theories that justify it.

When people talk about software quality they typically think primarily of bugs that are defects in the implementation of a feature. There are two other kinds of defects in software: requirements defects and design defects. Defects of these other types are often more damaging than implementation defects.

Proprietary software vendors use several methods to improve quality: regression testing, manual testing, beta programs, usability studies and acceptance testing. This work is managed and/or performed by internal Quality Assurance (QA) teams.

I will show that an open source model is more effective in achieving higher quality of requirements, design, and implementation than can be achieved with these methods.

First let’s consider the fundamental differences in the overall processes followed by proprietary and open source models:

  • In proprietary software the vendor attempts to attain their desired quality level before the software reaches the consumer (customer) by using various resources and the techniques listed above. This is done in a period that occurs mainly after the software has been written and before it is released or made ‘Generally Available’ (GA). Once the software is ‘GA’ the number of people that have access to it is vastly increased and so the majority of the feedback from real users of the software occurs after GA.
  • In an open source model the ‘early and often’ and openness principles are used to put not only the software but also the requirements and the design in the hands of the consumer (community) as early as possible and to update it often. The feedback occurs early and often as well and the improvements are built into the process early on.

In my description of the proprietary process above I used the phrase ‘desired quality level’. This might seem odd, and so it should. After all, the desired quality level, from a consumer’s perspective, ought to be ‘perfect’. Unfortunately for all concerned this desired quality level turns out to be ‘less than perfect’.

I remember the day I first learned this lesson. Our first start-up was our first foray into packaged, off-the-shelf software. Prior to that we had much more experience with service-focused and custom software than with shrink-wrapped software. We executed to a strategy we based on Moore’s Chasm Theory and were acquired by an established vendor as per our plan. During the due diligence performed by the acquiring company they reviewed our tools and procedures and it resulted in an interaction with their engineering director that I have never forgotten.

‘We examined the data in your defect tracking system and want to know where your bugs are’.
‘We fixed them.’ I replied – with considerable confusion.
‘Stop doing that,’ she said, ‘it takes too long, it’s too expensive, and you’ll never be competitive.’

How embarrassing! We were working under the delusion that bugs were bad and had been fixing them. What total amateurs we must be. It was not until recently, after spending considerable time in an open source model, that I had much further insight into this. This was not an isolated, one-off experience. This is the standard for proprietary software development.

The Cost of Quality – Proprietary Software

Let’s examine the costs associated with achieving a given quality level with a proprietary development model. If we examine each cost and the factors affecting it some interesting conclusions can be drawn.

Before we get into the costs we need to understand the workings of the Quality Assurance team in a typical (at least in my experience) proprietary software company.

QA teams are part of a process that results in the release of software. In this process product managers interpret the requirements of prospective customers, actual customers, and sales and marketing departments, and document them for the software developers. The developers interpret the requirements as described by product managers and implement them. QA engineers then ask ‘how do I test this?’ and create artificial data to test the documented use cases of the feature. The best they can do is to ensure that the quality of the implementation is sufficient to use the feature as it was described by the product manager. They are not the ultimate consumer of the feature and so it is practically impossible for them to verify the quality of the requirements and the quality of the design. Their inability to effectively test the quality of the requirements and the design has several implications.

  • Requirement opportunity miss. It is frequently the case that a feature is designed to solve a specific requirement and it turns out to be random how suitable that feature is for solving unforeseen requirements or use cases that vary slightly from the documented cases. I can recall many instances of meetings with product managers and account reps and sales reps to discuss a customer’s request that a newly released feature be modified so that it truly solves their need. This is typically logged as an enhancement request although in reality it is usually a defect of the original feature requirements.
  • Blocked use case. Bugs that are not found prior to release are frequently associated with unforeseen use cases. Some of these bugs are obscure and harder to find but many of them are fairly glaring once the path to reproducing them is known. The result is that the customer is unable to use the feature to solve their specific problem because their use case was not anticipated and consequently not tested during the QA process.
  • Patching. The result of the issues above is that patches to the feature are required almost as soon as it has been released. This is very commonplace. Creating and releasing patches is an expensive and resource intensive exercise that lowers the productivity of product managers, developers and QA teams.

Don’t get me wrong, QA engineers are hard-working professionals, however the position they are put in by the software vendors is not enviable. Internal QA teams fight a losing battle in which there is a high price for failure and little glory for success. As a consequence it is often hard to motivate internal QA teams. The most common way for proprietary organizations to motivate people to test their software is to offer them financial incentives (salary, benefits, bonuses etc.). The organization is attempting to align the motivations of their staff with the motivations of the organization using money. As we will see below this is not nearly as effective as having real, lasting, direct alignment of motivation.

The costs involved in software quality can be broken down into different categories.

Cost of Finding Defects

The factors involved in finding defects are:

  • Finding defects is harder than fixing them (in most cases).
  • Attempting to find all the defects without involving the consumer is very hard.
  • The fewer defects there are the harder they are to find.

The consumer and their usage of the software is an important factor in this cost. In order to find defects ‘pre-consumer’ a software vendor must attempt to predict the use cases that all their customers have and then create the data and environment required to test those many use cases. Philosophically this is redundant work as the consumers already have the use cases, the environment, and the data. The vendor has to bear this redundant effort because they are attempting to protect the consumer from this process.

Proprietary software vendors use beta programs to mitigate these issues. Beta programs typically involve giving a limited number of customers access to the new software before it is released. Beta programs are not always highly effective and are often omitted because of this. Public beta programs are much more extensive than closed ones and can get better results but are not common because they require more time, more management, and are often undesirable from a licensing perspective.

The net result is that the cost of finding defects before the consumer receives the software goes up as the number of defects decreases because it takes longer and longer to find them.

Cost of Fixing Defects

Most software engineers will attest to these facts:

  • Implementation defects are often harder to find than to fix (see above).
  • Design and requirements bugs are harder to fix the longer they exist.
  • Overlapping defects that interact with each other are hard to diagnose and fix.
  • A defect is much easier to fix if a reliable reproduction path exists.
  • The fewer defects there are the easier they are to fix.

The cost of fixing a defect does not usually depend on whether it was found by a developer, QA engineer, or a consumer, as long as it can be reproduced in a controlled environment.
The overall result is that the cost of fixing defects decreases as the number of defects goes down.

Cost of Patching Software

If defects are found and fixed, the fixes then need to be provided to the consumers. Creating, testing, and releasing patches is not cheap. Obviously if a defect is found before the software is released a patch is not needed. This cost is a function of the ability to find defects before release and is most heavily influenced by the methods used to find defects. In a proprietary model the overall result is that the cost of releasing patches decreases as the number of defects goes down.

Sales Cost

If it costs a lot of time and money to find and fix defects and provide high quality software this expense needs to be recouped from the customers. An increase in the price of the software will result in lost sales, which is another cost. Budgets are constrained these days. Ask yourself this question: if there was a competitive product to Microsoft Office that had no additional features, but had no bugs, and cost 5 times more, would you buy it? The answer is usually ‘no, I’ll put up with the bugs’ so the consumer is to blame for this one. Consumers of software frequently choose low price over high quality.

On the other end of the quality scale, if software is of such low quality that a demo does not work, or implementations are difficult, or many patches need to be issued, then sales will be lost, legal issues may arise and customers won’t renew their maintenance agreements. This cost becomes dramatic as quality decreases below an acceptable level.

If we plot all of these costs for different quality levels we see an interesting result.

QA Cost for Proprietary Models

The vertical axis shows relative costs.

The top line represents the total cost of achieving a given quality level under a model which attempts to find and fix the defects ‘pre-consumer’. The lowest point of this line is the optimal (cheapest) quality level under this model. It is clearly apparent that this optimal quality level is not at the zero defect point.

The increasing sales cost as you get closer to a zero defect level is caused exclusively by the increased cost in finding the defects (not in fixing them). If the cost of finding defects did not increase as quality improves the sales cost would not increase.
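A toy model makes the shape of this curve concrete. The sketch below is illustrative only: every functional form and constant in it is an assumption chosen to match the trends described above (finding gets more expensive as defects get scarcer, fixing and patching get cheaper, and lost sales grow sharply at low quality). None of it is measured data.

```python
# Toy model of the 'pre-consumer' QA cost curve.
# All formulas and constants are illustrative assumptions, not measurements.

def finding_cost(remaining):
    # Assumed: the fewer defects remain, the harder the next one is to find.
    return 100.0 / (remaining + 1)

def fixing_cost(remaining):
    # Assumed: the cost of fixing falls as the number of defects goes down.
    return 0.5 * remaining

def patching_cost(remaining):
    # Assumed: fewer escaped defects means fewer patches to release.
    return 0.3 * remaining

def low_quality_sales_cost(remaining):
    # Assumed: lost sales grow sharply once quality drops below an acceptable level.
    return 0.02 * remaining ** 2

def total_cost(remaining):
    return (finding_cost(remaining) + fixing_cost(remaining)
            + patching_cost(remaining) + low_quality_sales_cost(remaining))

def optimal_defect_level(max_defects=100):
    # The cheapest quality level is the defect count with the lowest total cost.
    return min(range(max_defects + 1), key=total_cost)

if __name__ == "__main__":
    best = optimal_defect_level()
    print("cheapest defect level:", best)
    print("cost at that level:", round(total_cost(best), 2))
    print("cost at zero defects:", round(total_cost(0), 2))
```

With these made-up numbers the cheapest point sits at a non-zero defect count, and the cost at zero defects is dominated by the finding term, which is the pattern the chart describes.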

Many people might have a reaction that the cost of perfect quality cannot really be the barrier that I present here. For those people I offer the following additional evidence.

Triage

When a customer reports a defect to a software vendor the case is discussed in a meeting that typically involves various product managers, engineers, and account managers. In this meeting a decision is made about when, and if, the defect will be fixed. The factors that are used to determine this include the nature and severity of the defect, which customer reported the issue (and how important that customer is to the vendor), whether a work-around exists for the defect, and whether other customers are also likely to encounter the defect. Often the defect tracking system stores severity and priority separately or stores a ‘customer severity’ or ‘customer priority’ to allow the vendor to set a severity or priority of their own. When placed in context of the chart above the implicit intent of this meeting is to decide whether the fixing of this defect moves the current quality level closer to, or further away from, the optimal level.
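The factors weighed in such a meeting can be sketched as a simple scoring rule. This is a hypothetical illustration, not how any particular vendor does it: the `DefectReport` fields, the 1 to 5 scales, the weights, and the thresholds are all assumptions invented for the example.

```python
from dataclasses import dataclass

@dataclass
class DefectReport:
    severity: int               # 1 (cosmetic) to 5 (critical); hypothetical scale
    customer_importance: int    # 1 to 5; how important the reporter is to the vendor
    has_workaround: bool        # a work-around lowers the urgency
    likely_affects_others: bool # other customers are likely to hit it too

def triage_score(report):
    # Hypothetical weights: severity counts double, a work-around subtracts,
    # and broad exposure adds.
    score = 2 * report.severity + report.customer_importance
    if report.has_workaround:
        score -= 3
    if report.likely_affects_others:
        score += 3
    return score

def triage_decision(report, fix_threshold=12, defer_threshold=7):
    # Thresholds are assumptions; in practice they encode where the vendor
    # believes its optimal quality level sits.
    score = triage_score(report)
    if score >= fix_threshold:
        return "fix now"
    if score >= defer_threshold:
        return "defer"
    return "won't fix"

if __name__ == "__main__":
    report = DefectReport(severity=3, customer_importance=3,
                          has_workaround=False, likely_affects_others=False)
    print(triage_decision(report))
```

Note that the score depends on who reported the defect as well as on the defect itself, which is why a tracking system often keeps the customer's severity separate from the vendor's.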

Shuttle Orbiter Flight Computer

As further proof let’s examine the costs of a piece of software that has proven itself to be of very high quality: the software running on the flight computer of the Space Shuttle orbiter. There have been two tragic accidents in the long history of the Space Shuttle program and other lesser incidents but none of them has been caused by a failure of this software. This is a prime example of software that must be of the highest quality before it is used by the consumer (the crew).

To date there have been 21 releases of this software and fewer than 20 defects have eluded the QA process and been present during a mission. None of these defects has caused operational problems.

To my knowledge this software is approximately 500k lines of code. I am assuming that of those 500k lines of code maybe 10% of those are modified on an annual basis, presumably as other operational systems on the orbiter are upgraded. So now we are dealing with 50k lines of modified code a year. It takes 280 engineers and an annual budget of $350 million to assure the quality of these code changes. If the same level of quality had been applied to Windows XP it would have cost Microsoft over $200 billion to develop it. Microsoft has a $7 billion per year budget for R&D so it would have taken them over 25 years to develop Windows XP. This is clearly not a reasonable option for a commercial software vendor.
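The arithmetic behind these figures can be laid out explicitly. The sketch below uses only the numbers quoted above. The Windows XP line count is not stated in the text, so it is back-calculated from the $200 billion figure rather than assumed.

```python
# Figures quoted in the text above.
TOTAL_LINES = 500_000                # approximate size of the flight software
ANNUAL_CHANGE_PERCENT = 10           # the text's assumption about yearly churn
ANNUAL_QA_BUDGET = 350_000_000       # dollars per year
XP_COST_CLAIM = 200_000_000_000      # "over $200 billion"
MICROSOFT_RD_BUDGET = 7_000_000_000  # dollars per year

# Lines of code modified per year.
modified_lines_per_year = TOTAL_LINES * ANNUAL_CHANGE_PERCENT // 100

# Cost of assuring quality, per modified line.
cost_per_line = ANNUAL_QA_BUDGET / modified_lines_per_year

# Size of Windows XP implied by the $200 billion claim at that rate (derived, not sourced).
implied_xp_lines = XP_COST_CLAIM / cost_per_line

# Years of Microsoft's whole R&D budget needed to pay for it.
years_of_rd = XP_COST_CLAIM / MICROSOFT_RD_BUDGET

if __name__ == "__main__":
    print(f"modified lines per year: {modified_lines_per_year:,}")
    print(f"cost per modified line:  ${cost_per_line:,.0f}")
    print(f"implied XP size:         {implied_xp_lines:,.0f} lines")
    print(f"years of R&D budget:     {years_of_rd:.1f}")
```

At roughly $7,000 per modified line, the $200 billion figure corresponds to a code base of a little under 29 million lines, and dividing by the R&D budget gives a figure above the 25 years quoted.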

In 2002 the United States Department of Commerce’s National Institute of Standards and Technology estimated that software defects cost the US economy $60 billion a year (http://www.nist.gov/public_affairs/releases/n02-10.htm). This sounds like a lot of money until you consider that getting Microsoft Windows™ alone to zero defects would cost the US economy significantly more than this. In this study they also estimated that earlier and more effective identification and removal of software defects would save the United States over $20 billion a year.

The Consumer Experience

For consumers the process of reporting defects and applying patches is largely negative. They have paid money for the software and so expect it (unreasonably as it turns out) to be free of defects. They are forced to report the defect and wait to hear when, or if, it will be fixed. They don’t like having to wait and they don’t like having to apply the patch (which might introduce other problems since it contains other changes not relevant to their problem).

Summary

I’m not trying to discredit the quality of proprietary software other than to say that the challenges of achieving very high quality levels are a natural consequence of their attempt to achieve that quality level prior to the consumer being involved.

The Cost of Quality – Open Source

Now let’s consider these same costs under a pure open source model. As mentioned above, in the open source model the requirements, design, and software are offered to a community of consumers (software developers, IT personnel, etc.) in a usually transparent way. The design is made public when the first draft is ready and critique is welcomed. The software is made available as it is written.

Cost of Finding Defects

The ‘burden’ of finding defects falls to the community. After all, they have the use cases and the data. Since they are getting the software for free they tend to expect that it might have some defects in it. I have ‘burden’ in quotes because it is not really a burden at all. In the majority of cases they are going about their business as they normally would. If they encounter a defect of design or implementation it is in their interest to help the developers of the open source project fix it.

The community is self-motivated to participate in testing open source software.

Since the software is available as it is written it is possible for defects to be fixed and made available to the community very quickly.

It is a critical difference between these models that the consumer is testing the software in the process of trying to solve a real world problem. The consumer has a real use case that might not be quite what the designer intended, is running the software on a platform and configuration that is probably different from the designer, and has real data. The testers of the software are consumers going about their business. It is a consequence of their natural actions that leads to defects being reported. This is in stark contrast to the motivation and situation of an internal QA team.

With a reasonably successful open source project there will be thousands of consumers in the community. In a proprietary environment you will find a ratio of developers to testers that ranges from 1:1 in very enlightened organizations to 5:1 in those less focused on quality. In an open source model the ratio may be 1:1000 or considerably higher. You can think of it as ‘massively parallel testing’. It is true that these testers are not managed and directed to test the features that the developers think should be tested on any particular day. It is also true that many of those testers end up testing the same things that other testers do. It is an inefficient and redundant model if you look at it in these terms. But you need to factor in that these testers are not testing for the sake of testing. They are going about their job as they otherwise would and only if they hit a roadblock do they even become conscious of their role. Every community member with a unique use case and unique data (almost all of them) has a potential contribution to make.

An open source project with a small community will probably not have enough members to ensure that the software gets enough use cases and testing to attain a high quality level.
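A back-of-the-envelope probability shows why community size matters so much. The sketch below assumes, purely for illustration, that each user independently has the same small chance of exercising the use case that triggers a given defect. Both the independence and the value of `P_PER_USER` are assumptions, not data.

```python
def chance_defect_is_hit(users, p_per_user):
    """Probability that at least one of `users` independent users hits the defect."""
    return 1.0 - (1.0 - p_per_user) ** users

# Hypothetical: one user in a thousand has the use case that exposes the defect.
P_PER_USER = 0.001

if __name__ == "__main__":
    groups = [("internal QA team", 5), ("small community", 50), ("large community", 5000)]
    for label, users in groups:
        print(f"{label:>16}: {chance_defect_is_hit(users, P_PER_USER):.1%}")
```

Under these assumptions a handful of testers will almost never stumble on an unusual use case, while a community in the thousands almost certainly will, even though no individual is testing deliberately.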

The members of an open source community will often provide peer review of a feature design before it has begun development. I have been both the donor and recipient in exchanges like this and it has been productive for both sides every time. By publishing a design spec to the community the designer often will get different types of feedback. Questions that start like this ‘That feature sounds great, how would I use it to do….’ are really clarifications of requirements that may introduce the designer to a new use case and to a defect in the original understanding of the requirements. Questions like this ‘I like the API, however I notice that I have to provide a File object in many cases which is tough since my input data is in a database’ highlight design defects. I have been involved with multiple projects where the quality of the design as it went into development was significantly superior to its early drafts because feedback like this was incorporated.

By making the design available to the community prior to development design defects are identified much earlier than with a proprietary model.

What is the real cost of finding defects this way? The total cost (in terms of time) accumulated across all the participants is potentially very high. The average cost to each participant is very low and they have no way of collecting payment anyhow. The cost to the developer for this effort is very low. The real cost to the developers amounts to the cost of reviewing defects that are reported and collaborating with community members to get reproduction paths. If the quality is low there is likely to be a lot of duplicated effort on the part of the developers as they try to sort through all of the redundant incoming feedback. This cost therefore increases with lower quality.

There is often no schedule for the process of finding and fixing defects. The software is ready when it’s ready and not before. This can lengthen the software development process considerably. This process is a better reflection of reality. You cannot predict how long it is going to take to find and fix the bugs in a piece of software. In order to do that you would need to know what all the bugs are. In the proprietary model the main testing and fixing process has a set duration.

Cost of Fixing Defects

If you agree that many bugs are harder to find than to fix you will understand how the community helps to reduce this burden. Members of the community may also be sufficiently technical to be able to fix the bug themselves and contribute the fix to the open source project reducing the cost still further.
The advantages of this model in the early identification of requirements and design defects and the usage of real use-cases in the testing cannot be over-stressed.

Cost of Patching Software

This is where the ‘early and often’ principle starts to make a major difference. Since software and designs are made available as they are produced, identification of defects by the community can start immediately. If the developers heed that feedback and adjust their design or implementation a second feedback iteration can begin. These iterations are part of the normal open source process; they are often regular and planned rather than done in a hurried ‘reaction’ mode. Open source projects, however, still issue patches to fix defects and so there is a cost here.

Adoption Cost

There is no ‘sales cost’ to speak of in the open source model however there is a cost associated with a very low quality level as community members can simply abandon the project and adopt another one if the quality level is unacceptable. As the community diminishes the developers will have to bear the testing burden themselves.

QA Cost for Open Source

You can see from the chart that the optimal quality level under this model is theoretically zero defects. I say ‘theoretically’ because in reality, in general usage, if there is a defect that no-one encounters or does not cause anyone enough annoyance to cause them to report it the defect will probably remain unfixed. If we constrain ourselves to only considering defects that the consumers care about then the ‘zero’ defect target is attainable.

Here are some examples to back up these statements. In March 2006 Coverity, sponsored by the Department of Homeland Security, used their static code analysis tools to perform an analysis of a collection of the most widely used open source projects. Static analysis examines source code for suspect design patterns and known causes of memory leaks and security breaches etc. The results were made available on the web and within 7 days 900 defects had been fixed by the communities of the projects analyzed. Within two weeks the Samba (Linux-Windows network connectivity) community had fixed all 218 issues originally identified by the analysis: from a static code analysis perspective Samba (and several other projects analyzed) achieved a zero-defect level. The rate at which these defects were fixed averaged less than 12 minutes per bug. To date a number of projects have reached 0 statically-identified defects, including Samba, OpenLDAP and Python. Over 5,600 bugs have been fixed across all projects. The fact that the average time to fix defects was 12 minutes supports two things:

  • That defects are easier to fix than to find
  • That communities enable massively parallel participation. I suspect that very few of the 5,600 defects were fixed in under 12 minutes however the number of defects being fixed in parallel by different community members brings the average time down dramatically.
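The 12 minute figure follows directly from the first-week numbers, and the same arithmetic shows how much parallelism it implies. In the sketch below the two-hour effort per fix is a hypothetical value chosen for illustration. The 7 days and 900 defects come from the text.

```python
DAYS = 7
DEFECTS_FIXED = 900

# Wall-clock minutes in the first week, and the average interval between fixes.
minutes_elapsed = DAYS * 24 * 60
avg_minutes_per_fix = minutes_elapsed / DEFECTS_FIXED

def implied_parallel_fixers(effort_minutes_per_fix, avg_minutes_between_fixes):
    """How many people must be fixing around the clock, on average, to sustain
    the observed rate if each fix really takes `effort_minutes_per_fix`."""
    return effort_minutes_per_fix / avg_minutes_between_fixes

if __name__ == "__main__":
    print(f"average wall-clock time per fix: {avg_minutes_per_fix:.1f} minutes")
    # Hypothetical: suppose a typical fix takes two hours of one person's time.
    fixers = implied_parallel_fixers(120, avg_minutes_per_fix)
    print(f"implied round-the-clock fixers: {fixers:.1f}")
```

Since nobody works around the clock, the number of distinct contributors needed to sustain that rate would be considerably higher than the figure this prints.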

This model without a doubt solves the problem of earlier and more effective defect resolution identified by the National Institute of Standards and Technology in their 2002 report.

The Consumer Experience

For consumers this overall process is much more positive than in the proprietary model. They have gotten something for nothing and so do not expect it to be perfect. It is in their best interest to report any defects they find and they are typically happy to work with the developers or administrators to reproduce the defect and test the fix for it. The community member also gains recognition for their contribution when they find and report a bug. Paradoxically the expectation of lower quality is unfounded as the zero-defect level is realistically attainable under this model.

The Cost of Quality – Commercial Open Source Software

Now let’s consider these same costs under a professional open source model. As you would expect this is a hybrid of the two models above but it is not ‘fixed’ at a specific point between them. The behavior of the COSS company determines how their model compares with the proprietary and open source models.

I have shown above that it is the principles of transparency and ‘early and often’ that give significant advantages to the open source model. It is natural therefore that the largest factors in the software’s quality are commitments by the COSS company to the transparency and ‘early and often’ principles.

Cost of Finding Defects

To be any kind of COSS company at all they must have their source code (and usually compiled binaries) available for the community to download. This enables the community to use the software for their purposes and they will find bugs in the natural course of trying to use it in their particular situation.

The community needs to be able to communicate about any defects encountered so the COSS company needs to provide public forums and tools for reporting and tracking defects. Unless the COSS company does this implementation defects cannot be reported by the community and the benefits of the model are vastly reduced. This requires a commitment to transparency.

Making the source code available gives more technical members of the community the ability to comment on design defects but it is really the availability of roadmap and design documents as early as possible that enables this. This requires a commitment to the ‘early’ part of ‘early and often’.

The COSS company needs to refresh the source code and binaries on a regular basis. If this is not done the speed of the iterative feedback loop is greatly reduced. This requires a commitment to the ‘often’ part of ‘early and often’. Enabling this to happen requires using different tools and/or policies than are used by a proprietary company.

If the COSS company is committed to the transparency and ‘early and often’ principles of open source they will be able to have all the advantages that come from an open source model.

If the COSS company has any functionality that they only offer as part of their whole product to customers this software will be proprietary and will not have any of these advantages.

As mentioned above only those members of the community that are ‘in phase’ with the COSS company at that time are likely to contribute. A large enough community ensures that there will always be enough members in-phase at any one point. Making a longer window for release candidates increases the % of the community who will be in-phase.

Cost of Fixing Defects

This is the same as for the open source model except that it can be considerably quicker as the COSS company has full-time engineers whereas an open source project may have a collection of part-time engineers.

Cost of Patching Software

COSS companies do patches to fix defects and so there is a cost here.

Adoption and Sales Costs

Both these costs apply to COSS companies. If the open source model is used effectively for finding bugs then the sales cost at the low defect end of the scale does not exist and the only costs are those associated with high defect levels.

QA Cost of COSS

The optimal chart for a COSS company follows the pattern of the open source model, not that of the proprietary model. This assumes a real commitment to the open source principles. If the COSS company does not have a sound commitment or does not implement the necessary infrastructure the chart above will follow the proprietary model.
In the COSS model the open source nature of the development process means that defects are found and fixed earlier but the existence of a schedule does limit the effectiveness of the community because that schedule might not align with their natural behavior during that period.

Written by James

May 29, 2009 at 9:56 pm

2 Responses

  1. Most testers find themselves outnumbered by devs. In my instance it’s almost 10 to 1. (The wanted proportion is a spent discussion I’d like to stave off in this post.) Instead, I would like to bitch about a problem I’ve found as I amass more projects to test. Assuming my ten devs are distributed between five projects (or app modules), each dev must attend merely the Feature Review/Design group meetings for the project they are answerable for. Nevertheless, the tester must pay heed to all five. Do you see a problem here? Let’s do the mathematics for a 40 hour workweek. If each project’s Feature Review/Design meetings eat up eight hours per week, every dev will have 32 hours left to write code. Each tester is left with ZERO hours to run code! The preceding scenario is not that much of an exaggeration for my team. The tester has no choice but to cut some of these meetings just to wedge in a little testing. The tester is required to “stay in the know” about all projects (and how those projects integrate with each other), while the dev can often focus on a lone project. I think the foregoing problem is an oversight of many managers. I doubt it gets acknowledged because the testers’ time is being nickel and dimed away. Yet most testers and managers will tell you, ‘It’s a no-brainer! The tester should attend all design inspections and feature walkthroughs; testing should initiate as early as possible.’ I concur. But it is an irrational anticipation if you staff your team like this.

    Luther Sarkodie

    February 22, 2010 at 9:09 pm

  2. […] Designers, Developers, and QA aren’t the only stake holders.  In the end, it’s also the end users and how effective the product becomes.  When they are able to speak up in the beginning, the product I believe tends to be in a better quality state.  I am not the only one thinking this way, see James Dixon’s blog about open source quality assurance. […]

