Archive for June 2008
A newly published report from Harvard Business Review revisits the theory of the ‘long tail’ using recent data. The report is here. This prompted me to think about the ‘long tail’ of the BI market.
First I think there are several complications with their data and findings:
- In some cases the top 10% of products represents 100,000 titles. Most brick-and-mortar stores can stock fewer than 10,000. For these stores the ‘blockbusters’ referred to by the researchers would themselves be considered a long tail by the retailers. The explosion of online inventory means that looking at the top 10% of products is not useful any more; it’s the comparison between the top 0.5% and the other 99.5% that needs to be examined. In many cases the revenue from the top 1% of titles was less than 20% of the total, with the 99% long tail taking in over 80%. I’d say that’s significant.
- Some of the comparisons look at data from 2000 to 2005. That is a long time on the web. The consumer demographic has changed during that period. The products available have changed during that period. Buying habits have changed during that period. As a result those data sets are hard to compare at a summary level.
- Most importantly for the BI market the data studied is extremely skewed by ‘new releases’. Most of the data analyzed is from retailers of books, music, and movies. The purchase and rental of new releases in these markets is a very large proportion of the annual sales. Most industries, including the BI market, do not have this situation.
Let’s look at the BI market from the geographical perspective.
- The remaining independent vendors in our space (for example Actuate and Information Builders) have offices in about 20 countries. The proprietary BI companies cannot be profitable in the other countries, so they have no presence there and provide no offerings in those markets. While the mega-vendors like IBM, Oracle, and Microsoft have a presence in more countries than this, I doubt that they are able to offer implementation services for their BI products in all of them.
- Like Actuate and IBI, Pentaho (the company) is focusing sales efforts on the same obvious markets. We do, however, have something to offer in other countries. At Pentaho we currently have registered community members from 155 countries. The countries typically covered by these proprietary companies account for, demographically, 50% or less of our community base. Since people do not have to register in order to download the software, the total usage is probably higher. While we do not get direct revenue from most of these countries we still get value from them: these community members provide use cases, bug reports, bug fixes, feature ideas, translations, documentation and documentation fixes, platform testing, scalability data points, usability feedback etc. These contributions enable us to produce better software, faster than we would otherwise be able to do. This better/sooner software is the basis of the subscription that we sell in the mainstream markets. Based on data from the CIA World Factbook, the countries with known Pentaho community members account for over 90% of the world’s population.
People talk about ‘BI for the masses’ and ‘BI everywhere’. I think 155 countries is a good approximation to ‘everywhere’ at least compared to the proprietary vendors. It seems that open source business intelligence software is able to meet the needs of, and get value from, the long tail of the business intelligence market.
Dana Blankenhorn at ZDNet has posted a blog about Sun’s open source strategy.
He seems to think that a publicly traded company such as Sun should allow product development to be driven organically by a community. In contrast the core development team of most open source projects don’t allow the project direction to be dictated by their community. Influenced? Yes (at their discretion). Directed? No.
My response to him is:
I totally agree that more transparency would be a good thing for Sun and its products and projects, and more openness would be the next big step. Beyond that I don’t agree with much that you wrote here.
The community demands a major say in the development direction or it won’t follow.
I don’t think this is true. The ‘community’ likes useful software and will contribute to the projects that provide it.
I agree that Java developers get frustrated with Sun and database developers get frustrated with MySQL. These (often vocal) developers represent 0.1% or less of the overall community. Surveys have shown that the majority of open source projects are run by a small (1-5) group of administrators/developers. Projects that have larger development teams are not large because it is easy to join the dev team; in fact they can be very hard to get into. Try contributing to the Linux kernel or getting the Linux kernel team to bend to your will and see how far you get. Try directing an Apache project to do your bidding; it’s like trying to knit fog. Of course there are a few notable exceptions, but they are exceptions only. The core development teams of open source projects make the decisions about the direction, not their communities. The core team can decide to listen to, or ignore, the opinions of their non-development community. It is only when the non-development community (often 1000s of times bigger than the development community) becomes frustrated because it is repeatedly ignored, and the development community becomes fragmented, that the potential of a fork arises (a rare occurrence). Even in organic open source projects the non-development community can voice their opinions and objections, but they do not direct the development of the project. The closest the community often gets is being able to vote on which features and bugs they think are most important.
The above being the case, I don’t see why you want to hobble Sun’s strategy, product managers, and development teams by holding them to a standard that is barely even discernible among organic open source projects.
As you say there is no ‘I’ in team, but there is also no ‘$’ in team, and Sun ultimately has to answer to the market.
Matt Aslett at the 451 Group wrote a blog entry last week about the Beekeeper Model I developed.
He proposes that the analogy can be extended using the notion of ‘wild’ hives and blending to encompass commercial open source companies that are not of the single-vendor variety.
I’m glad he found the paper interesting. I think the analogy can be extended and applied as he proposes. In fact most commercial open source offerings are themselves blended. For example the JBoss, Pentaho, Alfresco, and Hyperic offerings include numerous Apache libraries so they are a mixture of ‘managed’ and ‘wild’ honey.
Maybe all this leads to alternatives to the ‘organic open source’ and ‘inorganic open source’ labels. I’m ok with the ‘organic’ term but ‘inorganic’ doesn’t mean much to me. Maybe we should use ‘wild open source’ and ‘domestic open source’ instead?
Matt Asay blogs about a new Forrester report that shows a lower level of open source adoption than he thinks is realistic:
I think it all depends on who takes the survey.
If it is IT guys: they know that open source is being used all over the place and will give you the ‘truth’ about usage of open source.
If it is CIOs: they are often blind to the adoption of open source within their own company; Sun’s Schwartz has blogged about at least one example of this. I have also heard of a CIO who thought that his company was getting Tomcat from the ‘Apache Company’ and was surprised to learn that there was none. CIOs are also more risk averse and less educated about open source.
It also depends on how you ask the question. If you ask me if expanding my investment in video games for the Wii is a priority for me, my answer is no. If you ask me if I expect my investment in video games for the Wii to increase, my answer is yes. It’s not a priority for me, but I see it as inevitable.
From these results I would predict that the decision-makers taking this survey were mainly CIOs. If this is the case, Forrester’s mistake is in surveying the weak link in the open source adoption chain. I say that CIOs are the weak link because they are less educated about open source than the IT community and are largely unaware of how deep and wide open source adoption already is within their organization. They think that they should be making decisions about the adoption of open source but don’t realize that they are too late. They need to be doing audits and putting governance in place. Otherwise the ‘C’ in CIO is more likely to mean ‘Canute’ than ‘Chief’.
Forrester’s report does highlight a perception issue that open source has amongst certain communities. This provides open source advocates a clear target to shoot at. Upon hearing about Bernard Golden’s upcoming report at OSCON on Open Source in the Enterprise, someone asked me if I thought this was old news, generally accepted already, and not worth reporting on. Forrester’s survey shows that open source advocates need more facts and reports at their disposal. I am looking forward to his report, although none of the people who really need to hear it (CIOs) are likely to be at OSCON.
One of the projects on CodePlex, Sandcastle, has been un-published for not providing its source code. I don’t know what Sandcastle had published in the way of binaries etc so I’m not going to comment on the Sandcastle case. But after looking at CodePlex’s requirements for projects hosted there I have some questions.
These are the criteria that are listed on the CodePlex wiki for projects that are hosted there
What are the requirements for hosting a project on CodePlex?
Your project must meet the following criteria:
- You must choose a license for your project (license resources: Open Source License page on Wikipedia)
- It must be an ongoing project (no “abandoned” projects)
- It must have source code (no non-software projects)
It is certainly good that there are not a lot of criteria. However the vagueness of these terms is problematic. Here are my thoughts:
1. You must choose a license for your project
The requirements state that their resource is a wikipedia page (http://en.wikipedia.org/wiki/Open_source_licenses). This is refreshing from the standpoint that it implies that CodePlex’s view of acceptable licenses seems to be ‘whatever wikipedia says today’. But what happens when the wikipedia page changes? If a license that was listed is dropped, do any projects using that license get removed from CodePlex? If not, what is to stop me from adding my own funky license to the wikipedia page, creating my project on CodePlex, and then using wikipedia revision history to prove my project’s right to be on CodePlex?
In reality what is on the wikipedia page does not seem to matter. The wikipedia page lists the GPL as a license (but does not list specific versions), and the wikipedia page for the GPL lists all three versions of the GPL. So it would seem from this that GPL V3 is an allowed license for CodePlex. However these comments indicate that GPL V3 is not accepted.
weimingzhi wrote Nov 19 2007 at 11:33 PM
just to make sure… am I allowed to use the GPLv3 on this site?
jwanagel wrote Dec 20 2007 at 1:05 AM
The site currently doesn’t have support for the GPLv3 license.
*jwanagel is a Senior Development Lead for CodePlex.com.
CodePlex needs to publish a clear list of allowable licenses in their FAQ.
2. It must be an ongoing project
The comment trail on the page verifies that ‘completed’ projects are ok, but not abandoned ones.
jwanagel wrote May 11 2007 at 7:31 PM
Abandoned means the project has no coordinators involved in the project, or there is no user activity (messages, issue tracker entries, downloads). A completed project is fine if there is still someone who will be the coordinator for the project on CodePlex.
Being able to identify dead and dying projects would be great. Sourceforge provides their ranking system which, despite some deficiencies, is generally indicative of the activity of a project. It would be nice if Sourceforge let you use that as a criterion when doing searches.
Does jwanagel mean no messages or no tracker entries or downloads? Or do they mean no messages and no tracker entries and no downloads? By ‘no tracker entries’ do they mean no new entries or no edits either? What time period are we talking about? A week, a month or a year?
I’m not trying to be nit-picky, but the history and stewardship of some projects are rocky to say the least, with long dormant periods and changes of administrators. I’d hate to see projects thrown off CodePlex that were ahead of their time and destined for greatness in a year or two.
If CodePlex is going to have a requirement such as this I think they need to clearly communicate the intent of the requirement. The current wording is insufficient. Without this we cannot gauge whether a given situation does or does not meet the spirit of their intent.
3. It must have source code
Their clarification in parentheses says “no non-software projects”. What if a project is just starting and only has specifications and design documents so far? I suspect that CodePlex’s intent would be to let this project in, but the requirements as stated seem not to. An example I use when talking about the benefits of peer review in open source projects is the pre-source-code days of the Eclipse BIRT project. The specification for BIRT was posted months before the seed code was released. Lots of people participated in discussions about the specs and the specs improved significantly during the pre-seed period. Not allowing projects in the design and specification phase to be included on CodePlex would be detrimental, but their current wording seems to prohibit this.
The basic question posed is:
‘Are we still in the “surfeit of great choices” stage, or (with enterprise interest and acceptance) are we swinging more towards standardization? Is fragmentation a real problem, or just part of the cycle? In other words: Which way is the open source market going?’
Rod Johnson of SpringSource commented on this:
‘If there’s a new entrant into a category dominated by closed source vendors, developers (at least) typically welcome it. If there’s a new entrant into a category with at least one strong open source player, developers often react negatively, questioning why the new alternative is necessary.’
I believe fragmentation in open source will always follow a different pattern from the traditional one.
Most people recognize that having a single open source option is not good, so a small number (2-5) of contenders is a good thing. You also have to accept that you need open source options for each standard and/or technology base. This leads to a natural and healthy array of options.
Let’s take workflow engines as an example. There are a number of current standards for defining workflows (BPEL, XPDL, Wf-XML, etc). Each of these formats has strengths and weaknesses and they appeal to different segments within the workflow domain. Then you need to consider that integrating a C++ workflow engine into a Java application is not ideal. Developers can be very religious about the languages they use. From a community perspective an XPDL-based C++ workflow engine is not a competing alternative to a BPEL-based Java one. Assuming that at least 4 technologies are common (say C, C++, Java, PHP), we now have room for 24 different open source projects (3 standards x 4 technologies x 2+ options) before the space is even minimally populated, let alone ‘fragmented’. While it is true that SOA and ESBs make integrating different technologies easier, that does not mean that an IT department wants to maintain and develop in multiple languages.
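The option-space arithmetic above can be sketched as follows. The standards, languages, and per-niche count come from the paragraph; the enumeration itself is just an illustration, not data about any real project census:

```python
from itertools import product

# Each workflow standard paired with each implementation language forms a
# niche; projects in different niches are not competing alternatives.
standards = ["BPEL", "XPDL", "Wf-XML"]
languages = ["C", "C++", "Java", "PHP"]
options_per_niche = 2  # a healthy minimum of contenders per niche

niches = list(product(standards, languages))
room_for_projects = len(niches) * options_per_niche

print(len(niches))        # 12 distinct standard/language niches
print(room_for_projects)  # 24 projects before the space is even crowded
```

The point of the sketch is that what looks like two dozen ‘overlapping’ projects from a high-level functional view collapses to at most a couple of genuine alternatives once standard and technology fit are applied.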
If you only look at high-level functionality, open source does seem to be fragmented and have too many overlapping projects. But in a technology-led decision most of the open source options are filtered out at the start based on ‘fit’ or ‘suitability’ of the underlying technology. What is left is a reasonable short-list. I think this will always be the case with open source.
The question of how many commercial open source companies any domain will sustain is entirely different.
The open source analysts at the 451 Group, Raven Zachary, Matt Aslett, and Jay Lyman, have started bi-weekly podcasts. They offer an interesting perspective on open source and commercial open source, and I have enjoyed the ones they have made so far. They have a pragmatic and balanced opinion on most things. I like the fact that they are only 30 minutes long; the most recent FLOSS Weekly is 1:24 long and I haven’t found the time to listen to it yet.
There were a few comments made in the May 30th podcast that I do not agree with completely. These comments could have been a ‘slip of the tongue’ or may not have come out as intended, so I’m not holding them against them.
Jay Lyman (I think) said that ‘MySQL wants to widen its development community in order to have a larger opportunity to monetize’. To my mind this is the wrong model and not what MySQL is trying to do at all. The MySQL developer community is the one group of people who least need any product or service from MySQL.
Let’s say it became possible to physically copy a car and transport it anywhere in the world for near zero cost. Let’s say your business model was to sell upgrade installations, oil changes, tire rotations etc to people who took these ‘free’ cars. Your target market in this case is not auto-enthusiasts who have a garage full of tools and who love to tinker with engines. Your target market is people who do not have the time, the knowledge, or the inclination to get their hands dirty maintaining their car. Your target market is people who know little to nothing about cars and are happy to give you money so they can remain blissfully ignorant.
The market of users is far larger than the technically proficient population and each user is more likely to pay for a service. The other major issue is that the developers are individuals and the potential customers are organizations. Monetizing the developers directly is actually almost impossible. The developers don’t have any budget, but their manager, or manager’s manager might. In the last 10 years I personally have been monetized once for $40: I bought documentation for JFreeChart but never expensed it.
Raven Zachary (again, I think) said that some legal teams have banned the GPL for internal usage in companies but that this was an education issue that should resolve itself over time. The GPL is a viral license that was not created to meet the needs of businesses. The most popular dual license model amongst commercial open source vendors at the moment is the GPL/commercial dual license. These licenses are chosen to provide options at either end of the spectrum: a very pro-open source option and a very pro-business option. The GPL is not deliberately anti-business, but it is anti-intellectual-property, and for certain usages in many businesses this equates to the same thing. In the absence of very careful governance and auditing, corporate lawyers are taking steps to protect their companies from a credible and unquantified risk. I would say that governance, auditing, and control are what these companies need, not the education of their lawyers.