James Dixon’s Blog

James Dixon’s thoughts on commercial open source and open source business intelligence

Archive for May 2010

Apple App Store vs FSF


There was a post a few days ago on the FSF site about the GNU Go app situation:

http://www.fsf.org/news/blogs/licensing/more-about-the-app-store-gpl-enforcement

Basically, the GNU Go app is licensed under the GPL, which is not compatible with the terms of service of Apple’s App Store. This is the wrap-up of the FSF post:

That’s the problem in a nutshell: Apple’s Terms of Service impose restrictive limits on use and distribution for any software distributed through the App Store, and the GPL doesn’t allow that. This specific case involves other issues, but this is the one that’s most unique and deserves explanation.

We would’ve liked to see Apple do the right thing and remove these limits, but it looks like that’s not going to happen. Apple has removed GNU Go from the App Store, continuing their longstanding habit of preventing users from doing anything that Apple doesn’t want them to do. As we said in our initial announcement, this is disappointing but unsurprising; Apple made this choice a long time ago. We just need to make sure everybody else gets the message: if you value your independence and creativity, you should be aware that Apple doesn’t. Take your computing elsewhere.

I am a firm believer in FOSS, but this is nonsense from the FSF. This position is micro-focused and blind to the larger picture. The App Store provides both free (zero-cost) and paid downloads. App developers are able to provide open source or proprietary apps. Developers have the freedom to choose. What the FSF wants Apple to do is remove the mechanisms in place that protect the distribution of proprietary apps. My guess is that proprietary apps account for over 98% of all iPhone and iPad apps. The FSF wants Apple to abandon those developers (>98%) in favor of the developers who believe in the Free Software philosophy (<2%).

The FSF’s position is that, by rights, software is fundamentally free and that intellectual property is bad. The FSF doesn’t like the fact that people create proprietary software. They don’t even like shareware. They think anyone creating software should give it away for free and provide the source code too. The FSF values your creativity – they value it so much they think you don’t have the right to own your creation.

Written by James

May 28, 2010 at 5:33 pm

Posted in Apple, licenses, open source

Pentaho and IBM Hadoop Announcements


Last week, on the same day, both Pentaho and IBM made announcements about Hadoop support. There are several interesting things about this:

  • IBM’s announcement is a validation of Hadoop’s functionality, scalability and maturity. Good news.
  • Hadoop, being Java, will run on AIX and on IBM hardware. But Hadoop is designed to scale out across clusters of commodity machines, so in fact it hurts the big-iron vendors. Hadoop also, to some extent, competes with IBM’s existing database offerings. But their announcement was made by their professional services group, not by their hardware or AIX groups. For IBM this is a services play.
  • IBM announced their own distro of Hadoop. This requires a significant development, packaging, testing, and support investment for IBM. They are going ‘all in’, to use a poker term. The exact motivation behind this has yet to be revealed. They are offering their own tools and extensions to Hadoop, which is fair enough, but this is possible without providing their own full distro. Only time will tell how they maintain their internal fork or branch of Hadoop, and whether any generic code contributions make it out of Big Blue into the Hadoop projects.
  • IBM is making a play for Big Data, which, in conjunction with their cloud/grid initiatives, makes perfect sense. When it comes to cloud computing, the cost of renting hardware is gradually converging with the price of electricity. But with the rise of the cloud, an existing problem is compounded. Web-based applications generate a wealth of event-based data. This data is hard enough to analyze when you have it on-premise, and it quickly eclipses the size of the transactional data. When this data is generated in a cloud environment, the problem is worse: you don’t even have the data locally, and moving it will cost you. IBM is attempting a land-grab: cloud + Hadoop + IBM services (with or without IBM hardware, OS, and databases). They recognize that running apps in the cloud and storing data in the cloud are easy, but analyzing that data is harder, and therefore more valuable.

Pentaho’s announcement was similar in some ways, different in others:

  • Like IBM, we recognize the needs and opportunities.
  • Technology-wise, Pentaho has a suite of tools, engines and products that are much better suited for Hadoop integration, being pure Java and designed to be embedded.
  • Pentaho has no plans to release our own distro of Hadoop. Any changes we make to Hadoop, Hive, etc. will be contributed to Apache.
  • And lastly, but no less importantly, Pentaho announced first. 😉

When it comes to other players:

  • Microsoft is apparently making Hadoop ready for Azure, but Hadoop is currently not recommended for production use on Windows. It will be interesting to see how these facts resolve themselves.
  • Oracle/Sun has the ability to read from the Hadoop file system and has a proprietary Map/Reduce capability, but no compelling Hadoop support yet. In direct conflict with the scale-out mentality of Hadoop, in a recent Wired interview Larry Ellison talked about Oracle’s new hardware:

The machine costs more than $1 million, stands over 6 feet tall, is two feet wide and weighs a full ton. It is capable of storing vast quantities of data, allowing businesses to analyze information at lightening fast speeds or instantly process commercial transactions.

  • HP, Dell, etc. are probably picking up some business providing the commodity hardware for Hadoop installations, but don’t yet have a discernible vision.

Interesting times…

Written by James

May 27, 2010 at 3:33 am

Deploying a web server on a smart card


Apparently you can run a Java HTTP servlet engine on a smart card: http://java.sun.com/developer/technicalArticles/javacard/javacard-servlets/

Certainly it’s an interesting capability. I just can’t think of anything useful I could do with it.

Written by James

May 21, 2010 at 4:45 pm

Posted in Uncategorized

EMC’s Dan Hushon on Pentaho and Hadoop


Dan Hushon, a Senior Director at EMC’s CTO office, has blogged about our Hadoop announcement: ETL & Hadoop/Map-Reduce… a match made in Orlando!

Dan has been at EMC for a number of years and knows a lot about data. He is dead on when he talks about the metadata and dimensionality of Map/Reduce and NoSQL data stores. These environments are rich in data, but the metadata can be very sparse or non-existent. This makes reporting and analysis of the data harder.

Written by James

May 20, 2010 at 4:04 am

What Agile BI is not…


I have a late entry for Pentaho’s ‘What is Agile BI?’ competition.

Agile BI is everything that is wrong with this: ‘Baker Hughes Deploys SAP BusinessObjects Solutions in Less Than One Year’

This is the title of a session that was run yesterday at the SAP/SapphireNow conference in Orlando. That’s old-school, right there.

Thanks to @markmadsen and @wherescape on Twitter for giving me the twit-up on this.

Written by James

May 19, 2010 at 5:28 pm

Pentaho and Hadoop: Big Data + Big ETL + Big BI = Big Deal


Earlier today Pentaho announced support for Hadoop – read about it here.

There are many reasons we are doing this:

  • Hadoop lacks graphical design tools – Pentaho provides pluggable design tools.
  • Hadoop is Java – Pentaho’s technologies are Java.
  • Hadoop needs embedded ETL – Pentaho Data Integration is easy to embed.
  • Pentaho’s open source model enables us to provide technology with great price/performance.
  • Hadoop lacks visualization tools – Pentaho has those.
  • Pentaho provides a full suite of ETL, Reporting, Dashboards, Slice ‘n’ Dice Analysis, and Predictive Analytics/Machine Learning.

The thing is, taking all of these in combination, Pentaho is the only technology that satisfies every one of these points.

You can see a few of the upcoming integration points in the demo video. The ones shown in the video are only a few of the many integration points we are going to deliver.

Most recently I’ve been working on integrating the Pentaho suite with the Hive database. This enables desktop and web-based reporting, integration with the Pentaho BI platform components, and integration with Pentaho Data Integration. Between these use cases, hundreds of different components and transformation steps can be combined in thousands of different ways with Hive data. I had to make some modifications to the Hive JDBC driver and we’ll be working with the Hive community to get these changes contributed. These changes are the minimal changes required to get some of the Pentaho technologies working with Hive. Currently the changes are in a local branch of the Hive codebase. More specifically they are a ‘SHort-term Rapid-Iteration Minimal Patch’ fork – a SHRIMP Fork.

Technically, I think the most interesting Hive-related feature so far is the ability to call an ETL process within a SQL statement (as a Hive UDF). This enables all kinds of complex processing and data manipulation within a Hive SQL statement.

There are many more Hadoop-related ETL and BI features and tools to come from Pentaho. It’s gonna be a big summer.

Written by James

May 19, 2010 at 7:49 am

Pentaho Shines in Business Intelligence Market Study


The results of Howard Dresner’s Wisdom of Crowds – Business Intelligence Market Study show Pentaho in an excellent light. Overall, Pentaho is rated 2nd out of all BI vendors, and 1st out of the vendors with an open source offering.

Richard Daley, Pentaho’s CEO, has blogged about the results: http://blog.pentaho.com/2010/05/18/you-can-run-but-you-cant-hide/

Next Monday (May 24th) Howard Dresner will present the results of the study in a webinar.

Written by James

May 18, 2010 at 4:21 pm

Book Review: IT’s Hidden Face


IT’s hidden face: Everything you always wanted to know about Information Technology. A look behind the scenes
By Claude Roeltgen

Available at Amazon.com

The author sent me a copy of this book to review some time ago. I have read it twice now.

Over his long career the author, a CIO, has clearly learned many things about IT, and taken hard knocks along the way. In this book he sets out to describe, to a non-IT audience, why IT is so complicated and costly. He succeeds admirably. The book progresses logically and clearly, with many anecdotes and personal stories to illustrate the points being made. The overall effect is compelling and comprehensive.

This is an excellent book for all C-level executives, and great for all MBA students. Given the C-level audience, the author might benefit from releasing a condensed version for those with limited time on their hands.

I have never been in IT myself; I have always been a developer/architect/CTO for software vendors. Mr Roeltgen clearly has a special disdain for software vendors, and I don’t blame him. I was fascinated and entertained to read about struggles with software and software vendors from an IT perspective.

At one point he talks about ‘cheatware’ – his term for demos by software vendors that include functionality that is fake or hard-wired for the purposes of the demo. The software vendors call this ‘demoware’, and it is still routinely used by proprietary software vendors.

The author likens the combined IT infrastructure, systems, and applications to an ecosystem or biotope. He stresses the point that no two companies have the exact same environment, and that most of the cost and complexity results from this situation.

Overall, it is an excellent book.

Written by James

May 13, 2010 at 5:18 pm

Web Posting Response Assessment


Not sure whether, or how, to respond to an article, blog, or comment you see online? Let this decision tree help you.

This web-based decision tree is adapted from the ‘U.S. Air Force Web Posting Response Assessment’ that was created by the Air Force Public Affairs Agency and put into the public domain. Click here to view the original (the chart is the last page of the PDF).

Click Here To Start

Written by James

May 11, 2010 at 2:26 am

Posted in Uncategorized

Evaluate: Balance


Is the post positive and/or balanced?

YES NO

Click here to go to the start of the decision tree

Written by James

May 11, 2010 at 2:11 am

Posted in Uncategorized