James Dixon’s Blog

James Dixon’s thoughts on commercial open source and open source business intelligence

Archive for the ‘commercial open source’ Category

SAS under pressure from Pentaho

Computer Business Review recently reported on an interview with SAS CEO Jim Goodnight in an article titled SAS CEO says CEP, open source and cloud have "limited" appeal.

Firstly, welcome to the party, Jim.

This article is good news for Pentaho. Here's why. For proprietary software companies, the first rule of marketing against open source competitors is 'don't mention them unless you absolutely have to'. Microsoft discovered this over 12 years ago, and since then has not managed to create an effective anti-open source marketing campaign (read about the Halloween Documents here and here for more info). That's with 10 years to think about it and over $10 billion to spend. So SAS must be under some pressure, otherwise Jim Goodnight (the CEO no less) would be keeping his mouth shut. Hence the title of this post.

What this means is that SAS has moved from the Ignorance phase to the Ridicule phase of battling open source; they only have Fighting and Losing to go. See Red Hat's Truth Happens video for an entertaining description of the phases.

There are a couple of other interesting points in the article.

Jim Goodnight says of open source BI:

We haven’t noticed that (open source BI) a lot. Most of our companies need industrial strength software that has been tested, put through every possible scenario or failure to make sure everything works correctly.

He states that SAS does not come across companies like Pentaho in their part of the BI market. Interestingly, he picks on quality and the extent of the testing environment, which is actually a strength of open source development over proprietary development. He implies that his customers have additional requirements that other BI consumers might not have. He seems to be saying that SAS has enterprise products and enterprise customers, leading us to conclude that open source BI is only a competitive threat in the small and medium-sized business (SMB) market. There are a couple of problems with this for SAS. Firstly, they list their customers on their web site, and it's very easy to cross-reference that list against the companies we know have downloaded and installed Pentaho's software – a quick comparison shows the overlap is higher than 30%. Ouch. So much for SAS's customers not having an interest in open source BI. You can check the Pentaho forums and see the activity of SAS customers integrating with Pentaho, and (oh no!) migrating from SAS to Pentaho. Google Trends shows us how interested people are in Jim Goodnight's company vs ours.

The reason SAS has not noticed open source BI is that those projects have gone off the radar and their sales reps don't even hear about them. They don't see because they're not looking. Meanwhile, the SAS sales reps almost certainly have a deck of slides to position SAS against open source alternatives. If they are anything like the slide decks of the other proprietary BI vendors, they are amusingly inaccurate FUD. Send me a copy if you have one – I'd enjoy it.

Jim Goodnight's viewpoint on open source BI is also interesting because it shows where the 'Open Source Tide' is. Here is the rising of the tide:

  • Operating system vendors (Microsoft, 1990s) say open source is for hobbyists.
  • A few years later, database vendors say open source is fine for your OS but not for your databases.
  • A few years after that, middleware vendors say open source is fine for OS and databases, but not for middleware.
  • A few years after that, application vendors say open source is fine for the rest of the stack, but not for applications.

At this point in the tide we are into sub-markets. Jim Goodnight is implying that open source is suitable for SMB BI, but not for SAS's kind of BI. My favorite Open Source Tide article is a Forbes one titled A Fatal Flaw for Open Source. In it, the claim is essentially that open source is ok for everything except multi-tenant hosted applications. Wow. The tide is getting pretty high. Not much ground left to stand on. Of course, the Forbes interview is with the CEO of a company that provides hosting services. The common theme here is that these CEOs are telling us 'open source is fine for everything except what we sell'. Predictable really. But they can never really tell you why – how can SAS tell us open source BI is not enterprise-ready when they support Linux, MySQL, PostgreSQL, etc.?

Here is another thought. A few years ago, weren't the established BI vendors saying their avenues for growth were going to be the SMB market and emerging markets? Now the CEO of SAS is saying he's comfy staying in the enterprise zone. Hmm, so emerging markets are the fallback? SAS has customers in 45 countries. Pentaho has installations in over 180 countries. Good luck entering those geographies with expensive proprietary software when open source BI is already the incumbent.

With open source presenting challenges in both SMB and emerging markets, you might expect the growth of BI for the old-guard companies to be rather flat – oh look:

  • Business Objects (SAP): -0.2% market share growth last year
  • Hyperion (Oracle): 2.3% market share growth last year
  • SAS: 2.7% market share growth last year
  • MicroStrategy: -6.4% product license revenue growth Q3 2010 from Q3 2009
  • Pentaho is on target for 150% growth this year

One final comment: I’ve worked for proprietary BI companies including Hyperion. If you’re going to pick on open source, don’t pick on quality.

Written by James

November 1, 2010 at 6:53 pm

SaaS and Open Source?

In a recent Forbes interview Treb Ryan, CEO of OpSource, somewhat bashes open source: http://www.forbes.com/2010/06/14/google-yahoo-software-technology-cio-network-open-source.html

Ryan makes some good points about the benefits of a multi-tenant architecture, but I feel he's leaving out some important details.

Did OpSource write their own operating system, servers, middleware, and databases? They would be foolish to.
Did OpSource go with expensive proprietary software for those pieces? Probably not – with their business model they'd want to stay away from those license fees – and the OpSource website runs on Red Hat Linux and the Apache HTTP Server.

If they are smart, OpSource will, like all the other SaaS companies, use open source at every opportunity they can. And somehow this is a fatal flaw for open source?

Ryan is just doing a little open source bashing because it's the thing that scares him the most. If SaaS companies can build multi-tenant apps on an open source base, then so can open source developers. He knows this. He's just enjoying OpSource's window of opportunity. But he joins a list of chief executives that have banded together over the years to tell a most amusing story. Bill Gates kicks it off in 2001:

“We think of Linux as a competitor in the student and hobbyist market, but I really don’t think in the commercial market we’ll see it in any significant way.”

2001: He was saying that open source is ok for a hobby, just not for your operating system.
2003: A few years after that, as open source databases started to appear, we heard the CEOs of database companies telling us that open source is ok for your operating system, but not for your database.
2005: Then the executives of middleware companies told us that open source was fine for your operating system and databases, but not for your middleware.
2008: After that we heard application companies telling us that open source is great for your operating system, databases, and middleware, but you don't want to use an open source application.
2010: Now Ryan is telling us that open source is fine for everything including applications, just not multi-tenant applications.

These CEOs have painted themselves into a very small corner over the years. Looks to me like Ryan is the latest one holding the brush. The question is: who, if anyone, can he pass the brush to when multi-tenant open source applications appear?

Written by James

June 23, 2010 at 5:04 pm

Pentaho and IBM Hadoop Announcements

Last week, on the same day, both Pentaho and IBM made announcements about Hadoop support. There are several interesting things about this:

  • IBM’s announcement is a validation of Hadoop’s functionality, scalability and maturity. Good news.
  • Hadoop, being Java, will run on AIX and on IBM hardware. In fact, Hadoop hurts the big-iron vendors. Hadoop also, to some extent, competes with IBM's existing database offerings. But their announcement was made by their professional services group, not by their hardware or AIX groups. For IBM this is a services play.
  • IBM announced their own distro of Hadoop. This requires a significant development, packaging, testing, and support investment for IBM. They are going ‘all in’, to use a poker term. The exact motivation behind this has yet to be revealed. They are offering their own tools and extensions to Hadoop, which is fair enough, but this is possible without providing their own full distro. Only time will show how they are maintaining their internal fork or branch of Hadoop and whether any generic code contributions make it out of Big Blue into the Hadoop projects.
  • IBM is making a play for Big Data, which, in conjunction with their cloud/grid initiatives, makes perfect sense. When it comes to cloud computing, the cost of renting hardware is gradually converging with the price of electricity. But with the rise of the cloud, an existing problem is compounded. Web-based applications generate a wealth of event-based data. This data is hard enough to analyze when you have it on-premise, and it quickly eclipses the size of the transactional data. When this data is generated in a cloud environment, the problem is worse: you don't even have the data locally, and moving it will cost you. IBM is attempting a land-grab: cloud + Hadoop + IBM services (with or without IBM hardware, OS, and databases). They are recognizing the fact that running apps in the cloud and storing data in the cloud are easy; analyzing that data is harder and therefore more valuable.

Pentaho's announcement was similar in some ways and different in others:

  • Like IBM, we recognize the needs and opportunities.
  • Technology-wise, Pentaho has a suite of tools, engines, and products that are much better suited for Hadoop integration, being pure Java and designed to be embedded.
  • Pentaho has no plans to release our own distro of Hadoop. Any changes we make to Hadoop, Hive, etc. will be contributed to Apache.
  • And lastly, but no less importantly, Pentaho announced first. 😉

When it comes to other players:

  • Microsoft is apparently making Hadoop ready for Azure, but Hadoop is currently not recommended for production use on Windows. It will be interesting to see how these facts resolve themselves.
  • Oracle/Sun has the ability to read from the Hadoop file system and has a proprietary Map/Reduce capability, but no compelling Hadoop support yet. In direct conflict with the scale-out mentality of Hadoop, Larry Ellison talked up Oracle's new hardware in a recent Wired interview:

The machine costs more than $1 million, stands over 6 feet tall, is two feet wide and weighs a full ton. It is capable of storing vast quantities of data, allowing businesses to analyze information at lightening fast speeds or instantly process commercial transactions.

  • HP, Dell etc are probably picking up some business providing the commodity hardware for Hadoop installations, but don’t yet have a discernible vision.

Interesting times…

Written by James

May 27, 2010 at 3:33 am

Pentaho and Hadoop: Big Data + Big ETL + Big BI = Big Deal

Earlier today Pentaho announced support for Hadoop – read about it here.

There are many reasons we are doing this:

  • Hadoop lacks graphical design tools – Pentaho provides pluggable design tools.
  • Hadoop is Java – Pentaho's technologies are Java.
  • Hadoop needs embedded ETL – Pentaho Data Integration is easy to embed (see the sketch after this list).
  • Pentaho's open source model enables us to provide technology with great price/performance.
  • Hadoop lacks visualization tools – Pentaho has those.
  • Pentaho provides a full suite of ETL, Reporting, Dashboards, Slice 'n' Dice Analysis, and Predictive Analytics/Machine Learning.
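Since embedding comes up in the list above, here is a rough sketch of what embedding a PDI transformation from Java looks like. It is based on the PDI 4.x API as commonly documented; the transformation file name is hypothetical and error handling is kept to a minimum.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

// Sketch only: run a PDI transformation from plain Java code.
// The .ktr file name is hypothetical.
public class EmbeddedPdiSketch {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();                          // initialize the Kettle runtime
        TransMeta meta = new TransMeta("load_events.ktr"); // load a transformation definition
        Trans trans = new Trans(meta);
        trans.execute(null);                               // no command-line parameters
        trans.waitUntilFinished();
        if (trans.getErrors() > 0) {
            System.err.println("Transformation finished with errors");
        }
    }
}
```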

The thing is, taken in combination, Pentaho is the only technology that satisfies all of these points.

You can see a few of the upcoming integration points in the demo video. The ones shown in the video are only a few of the many integration points we are going to deliver.

Most recently I’ve been working on integrating the Pentaho suite with the Hive database. This enables desktop and web-based reporting, integration with the Pentaho BI platform components, and integration with Pentaho Data Integration. Between these use cases, hundreds of different components and transformation steps can be combined in thousands of different ways with Hive data. I had to make some modifications to the Hive JDBC driver and we’ll be working with the Hive community to get these changes contributed. These changes are the minimal changes required to get some of the Pentaho technologies working with Hive. Currently the changes are in a local branch of the Hive codebase. More specifically they are a ‘SHort-term Rapid-Iteration Minimal Patch’ fork – a SHRIMP Fork.
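For readers who have not used the Hive JDBC driver, here is a minimal sketch of the kind of access the reporting integration relies on. It assumes a Hive server on the default port (10000) and the 2010-era driver class name; the table in the query is purely hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch only: query Hive over JDBC the way a reporting tool would.
// Assumes the Hive JDBC driver and its Hadoop dependencies are on the classpath.
public class HiveJdbcSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = conn.createStatement();
        // 'web_events' is a hypothetical table, used only for illustration.
        ResultSet rs = stmt.executeQuery(
                "SELECT event_type, COUNT(*) FROM web_events GROUP BY event_type");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}
```

Anything that can speak JDBC – a report designer, a transformation step – can get at Hive data through this interface.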

Technically, I think the most interesting Hive-related feature so far is the ability to call an ETL process within a SQL statement (as a Hive UDF). This enables all kinds of complex processing and data manipulation within a Hive SQL statement.
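To make the UDF mechanism concrete, here is a minimal sketch of a Hive UDF – not Pentaho's actual implementation. The class name, the placeholder 'transformation', and the table in the usage comments are all hypothetical; a real version would hand each value to an embedded ETL process and return its output.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Sketch only: the Hive UDF plumbing, with a trivial stand-in where an
// embedded ETL process would go.
public class EtlUdfSketch extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Placeholder for invoking an embedded ETL transformation on the value.
        String transformed = input.toString().trim().toUpperCase();
        return new Text(transformed);
    }
}

// Hypothetical usage from Hive SQL, for illustration only:
//   ADD JAR etl-udf-sketch.jar;
//   CREATE TEMPORARY FUNCTION run_etl AS 'EtlUdfSketch';
//   SELECT run_etl(raw_field) FROM web_events;
```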

There are many more Hadoop-related ETL and BI features and tools to come from Pentaho.  It’s gonna be a big summer.

Written by James

May 19, 2010 at 7:49 am

Pentaho Shines in Business Intelligence Market Study

The results of The Wisdom of Crowds – Business Intelligence Market Study from Howard Dresner show Pentaho in an excellent light. Overall, Pentaho is rated 2nd out of all BI vendors, and 1st out of the vendors with an open source offering.

Richard Daley, Pentaho's CEO, has blogged about the results: http://blog.pentaho.com/2010/05/18/you-can-run-but-you-cant-hide/

Next Monday (May 24th) Howard Dresner will present the results of the study in a webinar.

Written by James

May 18, 2010 at 4:21 pm

Excellent book – Pentaho 3.2 Data Integration: Beginner's Guide

Packt Publishing has released their book on PDI – Pentaho 3.2 Data Integration: Beginner's Guide

The book was written by María Carina Roldán, a valued member of the PDI community. She does a great job of fulfilling the promise of the book’s title – it is a great introduction and ‘getting started’ book for Pentaho Data Integration. I have worked with PDI for the last few years, so I’m hardly a beginner, but I still learned useful techniques and tips from the book.

The book starts with basic information about the tool and installation instructions, then, chapter by chapter, it introduces features and techniques that build upon each other. By the end of the book you'll know how to perform all the standard ETL operations using Pentaho Data Integration, and then some. There are tutorials and exercises throughout the book that really help you understand the tool and its features.

If your organization uses Informatica, Ab Initio, IBM/Ascential Data Stage, Business Objects Data Integrator, or Cognos Decision Stream, this $45 book could lead to savings of hundreds of thousands of dollars with Pentaho Data Integration, and make you a hero in the process. If your organization can’t afford those products, this $45 book could lead to solutions at a fraction of their cost, and make you a hero in the process.

Almost everything in the book applies just as well to PDI V4.0 as it does to V3.2.

Disclosure: I was one of the reviewers of the book during its writing so I might be a little biased – but I was not paid for that effort, nor am I being paid to review the finished book.

Written by James

May 7, 2010 at 8:56 pm

Gartner and Intelligent Enterprise on the costs of open source BI systems

A good article by Doug Henschen (I got it right this time, Doug) reviews part of a presentation from Gartner's BI Summit last week. Here is Doug's first statement:

The five-year cost of a typical, 500-seat BI deployment ranges from about $150,000 for an open-source system to just over $1 million for the full suite from SAP BusinessObjects, IBM Cognos or Oracle. In between these two extremes are Microsoft, pure-play BI vendors ($522,000 to $674,000) and software-as-a-service (SaaS) BI vendors ($582,000).

The full article is here: http://intelligent-enterprise.informationweek.com/blog/archives/2010/04/gartners_take_o.html

Written by James

April 22, 2010 at 3:09 pm

Free/Open Source Software Global Maturity Matrix (FOSS GloMM)

I have stated a few times that the Free Software Foundation (FSF) and its advocates don't have a vision of the future that I find viable. I have read statements that the best custodians of FOSS are tiny consulting companies, and that Microsoft and Oracle should be barred from participating in FOSS. I don't see how the software needs of the world can be met by tiny services companies, or how we can magically make the existing market players disappear. But I can't complain about their vision without providing a vision of my own. So here it is.

The Vision

I subscribe to the theory that a vision is a dream + a plan.

The Dream

Twenty years from now, across the globe, every individual, business, organization, and government entity will have FOSS suitable for all their needs. That is not to say that there will be no proprietary software any more – there certainly will be for the next 20 years – just that any and all normal requirements can be met with FOSS.

In this future the notion of intellectual property will still exist, as will software patents (unfortunately). In this future, any software or services company, of any size, whether local or global, has the opportunity to participate in the FOSS realm.

We will reach this goal incrementally via an evolution of FOSS software, an evolution of the existing market players, and the creation of new market players.

The Plan

1 – Establish Metrics

Here is my proposal for assessing the state of the dream. By country, we score each software domain in terms of how well FOSS provides suitable solutions that are:

  • OSI approved.
  • Localized.
  • Compliant with all local regulations (accessibility, domain-specific legal requirements etc).
  • Stable, usable, and documented.
  • Available on multiple platforms (at least 2).
  • Available from 3 separate projects (different code-bases) or, failing that, 3 different distros.

Any software under an OSI license is eligible for inclusion – no matter the size of the project or the business model of the provider.

We also score the software with regards to how well it supports all the needs (including support, training, and professional services) of:

  • Micro organizations (1-9 people)
  • Small organizations (10-99 people)
  • Medium organizations (100-250 people)
  • Large organizations (> 250 people)

The domains assessed could be (I’m sure there are many more we can add):

Operating System and Middleware
1. Operating System (OS)
2. Database (RDBMS)
3. HTTP and Application Servers
4. Network Management and Monitoring
5. Enterprise service bus (ESB), message queue (MQ)
6. Email
7. Instant messaging
8. Calendaring

Horizontal Applications
9. Customer Relationship Management (CRM)
10. Locally-compliant Enterprise Resource Planning (ERP)
11. Content Management Systems (CMS), knowledge base
12. Call center, case tracking
13. Ecommerce
14. Online meeting and conferencing
15. Voice over IP (VoIP)
16. Collaboration – forums, wikis, etc.
17. Reporting, analysis and Business Intelligence (BI)
18. Online training
19. Financial, Budgeting and Planning, including public sector
20. Distribution

Desktop Applications
21. Word processing
22. Spreadsheet
23. Presentation
24. Graphics editors
25. Printing tools
26. Software and web tools (compilers, editors, etc.)

Vertical Applications
27. Healthcare
28. Education
29. Government
30. Agriculture
31. Insurance
32. Retail, including Point of Sale
33. Telecoms
34. Petrochemical
35. Pharmaceutical
36. Travel and hospitality
37. Engineering, Manufacturing, Construction
38. Textiles

Obviously, within each of these vertical domains there are multiple applications. Scoring here will be tricky.

System Integrators
39. The local availability of systems integrators that can implement FOSS stacks and solutions.

Scoring is done per country and per domain, on a scale from 0 to 9. A score of 0 indicates there is no FOSS option for that domain and geography. A score of 9 indicates the existence of three different FOSS options that meet the needs of large organizations. We can color code by range: red = 2 or less, yellow = 3 to 6, green = 7 or more.
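Purely as an illustration of the scoring and color coding described above (the class and method names are mine, not part of any existing tool), the mapping might look like this:

```java
// Sketch only: map a 0-9 GloMM score to the proposed color bands.
public class GlommScore {
    enum Rating { RED, YELLOW, GREEN }

    static Rating rate(int score) {
        if (score < 0 || score > 9) {
            throw new IllegalArgumentException("score must be 0-9");
        }
        if (score <= 2) return Rating.RED;    // red = 2 or less
        if (score <= 6) return Rating.YELLOW; // yellow = 3 to 6
        return Rating.GREEN;                  // green = 7 or more
    }

    public static void main(String[] args) {
        System.out.println(rate(9)); // GREEN
        System.out.println(rate(4)); // YELLOW
        System.out.println(rate(1)); // RED
    }
}
```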

2 – Census

We need to find out how close we are to achieving the dream. Volunteer organizations and sponsoring organizations score each domain for a single country, providing notes about the FOSS packages and service options assessed. The results of the census are publicly available at all times. Academia and analysts could provide much of this data.

3 – Close Gaps

Based on the results of the census, sponsoring organizations provide resources and guidance to help close the gaps. Sponsoring organizations will have many different motivations:

  • Wanting a larger local and global market for their services and support offerings.
  • Wanting software that is more accessible.
  • Wanting more FOSS options in their country or domain.
  • Wanting to sell add-ons and extensions.

Or we just allow the natural progress of FOSS to gradually populate the GloMM in a natural – 'Game of Life' / Brownian motion – kind of way.

4 – Repeat Steps 2 and 3

As time goes by, we repeat steps 2 and 3. The matrix fills out gradually, becoming greener and greener.

Until…

5 – Declare Victory

In my opinion FOSS has won when, and only when, the entire sheet (7000-8000 cells) is lit up in green. At this point the value of FOSS will be clear to everyone. Maybe attitudes towards intellectual property will change then. But we can’t expect them to change before we get close to this point.

Summary

As this gradual global evolution occurs, the existing market players will have to adapt to new market conditions. What they do, and how well they do it, is up to them – but they are welcome to participate. Just because Oracle is now the ultimate custodian of MySQL does not mean that MySQL should not be listed as one of the FOSS databases. Microsoft, IBM, SAP, and Oracle should be accepted into this evolution – whether they survive it is up to them and the global and local markets, not up to anything else.

Any organization that produces FOSS, or localizations, or documentation, or provides services or support for FOSS is deemed to be a friend of GloMM – no matter what their size, history, or business model.

So that’s my vision. I have a defined goal, a way of measuring progress, mechanisms for getting there, and ways for existing market players to participate. I claim it to be reasonable, rational, and viable.

I have no resources at my disposal to execute on any of this. It's just a vision. If only I had a dream + a plan + resources 🙂

Written by James

April 21, 2010 at 1:37 pm

Matt Aslett on the Nuxeo vs Pentaho debate

Matt Aslett at the 451 Group has weighed in on the recent postings and comments between myself and the folks at Nuxeo, in a post – Let he who is without proprietary features cast the first stone

Matt raises some good points in a well-balanced approach. He comes down on the side of Nuxeo not being open core, but not pure-play either:

My own feeling is that Nuxeo’s approach is not open core, since the original definition of open core concerned proprietary products. However, the existence of Nuxeo Studio means that Nuxeo is clearly not 100% open source.

But he doesn’t have a term for what their model is, and is proposing adding a new one to their list:

For that reason, I have come to believe that we need to add a new revenue trigger category to our open source business strategy model, that makes a clear distinction between support subscriptions for 100% open source code, and value-add subscriptions that offer additional hosted services.

As yet, no one has been able to answer my question – if Pentaho stopped offering on-premise deployments and only provided a SaaS offering, would we no longer be open core? Maybe the answer is this new category that Matt mentions.

It looks to me like this model is close to an open core model – just hosted, not on-premise. But the term ‘open core’ does not specify how the non-open parts are deployed, just that some code is open source and some is not. So should we just create sub-categories of open core?

  • Open core (on-premise), and open core (hosted)
  • Local open core, and remote open core?

Written by James

April 8, 2010 at 6:08 pm

Pentaho listed as a top 10 open source business application

Runar Lie, founder of Office123, posted his list of the top 10 open source applications for businesses. Nice to see Pentaho on the list.

https://www.office123.net/uhelp/index.php/blog/2010/01/31/8-the-top-10-open-source-business-applications?lang=en

Written by James

April 8, 2010 at 5:32 pm