James Dixon’s Blog

James Dixon’s thoughts on commercial open source and open source business intelligence

Archive for the ‘Business Intelligence’ Category

12 Days of Visualizations – Sun Burst

Today we are launching our 12 Days of Visualizations program: http://events.pentaho.com/12days-of-Big-Data-Visualizations.html

We are going to release a few new visualizations every week over the holiday period. You can drop these visualizations into a Pentaho BA server and they will appear on the charting menu in Analyzer.

The first one that we are releasing is a Sun Burst chart. This chart is based on the Protovis sun burst chart - http://mbostock.github.com/protovis/ex/sunburst.html

The Sun Burst chart we created can be used in a couple of ways. Firstly, it can be used as a multi-level pie chart. This sun burst shows how the sales in three territories break down into sales of product lines within those territories, and then how product line sales compare by year:

Screen Shot 2012-12-13 at 3.41.40 PM

This effect is achieved by using a color gradient for the outer ring that is based on the chart palette color of the inner rings, and by sorting the segments in each ring into descending order. When you compare the sun burst above with the pie chart below, you can see how much more information the sun burst provides.
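To make the sorting and shading idea concrete, here is a minimal sketch in Python. It is not the code behind the Pentaho visualization: the function names, the blend-toward-white rule, and the `max_lighten` parameter are all assumptions chosen for illustration.

```python
def lighten(rgb, amount):
    """Blend an (r, g, b) color toward white; amount is between 0 and 1."""
    return tuple(round(c + (255 - c) * amount) for c in rgb)


def outer_ring_colors(parent_rgb, child_values, max_lighten=0.6):
    """Sort one parent's children into descending order and shade each
    child from the parent's palette color (hypothetical rule: the largest
    child keeps the parent color, the smallest gets the lightest shade)."""
    ordered = sorted(child_values.items(), key=lambda kv: kv[1], reverse=True)
    steps = max(len(ordered) - 1, 1)  # avoid dividing by zero for one child
    return [
        (name, value, lighten(parent_rgb, max_lighten * i / steps))
        for i, (name, value) in enumerate(ordered)
    ]


# Made-up sales figures for one product line, keyed by year.
for name, value, color in outer_ring_colors(
    (0, 0, 255), {"2003": 10, "2004": 30, "2005": 20}
):
    print(name, value, color)
```

Each outer segment inherits its hue from the segment it belongs to, so the eye can follow a territory or product line outward through the rings.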

pie

You can choose to use a common color gradient on the outer ring so that it is easier to compare the items on that ring. In this example a blue gradient has been used for the outer ring. Regardless of which territory a city is in, the shade of blue it is colored in can be used to compare it with other cities.

Screen Shot 2012-12-13 at 3.43.10 PM

In this chart a red/yellow/green gradient has been used. Here the levels of the chart are year, quarter, and month so the data has not been sorted. The data for this chart is overtime costs so the gradient has been reversed to show larger overtime costs in red, and smaller ones in green.
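As a rough illustration of what reversing the gradient means, the sketch below maps a value onto a three-stop gradient. The stop colors, the function names, and the `reverse` flag are assumptions for this example and are not taken from the chart's actual implementation.

```python
GREEN, YELLOW, RED = (0, 128, 0), (255, 255, 0), (255, 0, 0)


def lerp(a, b, t):
    """Linearly interpolate between two (r, g, b) colors; t is in 0..1."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def gradient_color(value, lo, hi, reverse=False):
    """Map value in [lo, hi] onto a red/yellow/green gradient.

    By default small values are red and large values are green (good for
    sales). With reverse=True large values are red (good for costs).
    """
    t = 0.5 if hi == lo else (value - lo) / (hi - lo)
    stops = [RED, YELLOW, GREEN]
    if reverse:
        stops = stops[::-1]
    if t <= 0.5:
        return lerp(stops[0], stops[1], t * 2)
    return lerp(stops[1], stops[2], (t - 0.5) * 2)


# Made-up overtime costs: the largest cost should come out red.
print(gradient_color(100, 0, 100, reverse=True))
```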

Screen Shot 2012-12-13 at 3.46.11 PM

You can find out more about this chart here: http://wiki.pentaho.com/display/COM/Sunburst

Written by James

December 14, 2012 at 5:39 pm

5 Out Of 6 Developers Are Using Open Source

ZDNet reports on a Forrester survey that finds 5 out of 6 developers are using or deploying open source.

http://www.zdnet.com/five-out-of-six-developers-now-using-or-deploying-open-source-7000008499/?s_cid=rSINGLE

In the survey they found that 7% of developers are using open source business intelligence tools such as Pentaho.

The United States Department of Labor states that, in 2010, there were 913,100 software developers in the USA alone.

http://www.bls.gov/ooh/Computer-and-Information-Technology/Software-developers.htm

7% of 913,100 means about 64,000 developers using open source business intelligence software. Nice.
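The arithmetic is easy to check. This small Python sketch just multiplies the two figures quoted above; the variable names are mine, and it assumes the survey's 7% applies evenly to the US developer population, which the survey itself does not claim.

```python
# Figures quoted in the post: BLS developer count for 2010 and the
# survey's share of developers using open source BI tools.
developers_2010 = 913_100
share_percent = 7

# Integer math avoids floating point rounding noise.
estimate = developers_2010 * share_percent // 100

print(f"{estimate:,}")  # 63,917, which rounds to about 64,000
```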

Written by James

December 10, 2012 at 2:25 pm

I Never Owned Any Software To Begin With

My thoughts on the whole Emily White/stealing music topic:

http://www.npr.org/blogs/allsongs/2012/06/16/154863819/i-never-owned-any-music-to-begin-with

http://thetrichordist.wordpress.com/2012/06/18/letter-to-emily-white-at-npr-all-songs-considered/

When she says she only bought 15 albums, I think she is talking about physical CDs. I think she did buy some of her music online. But she clearly states that she ripped music from the radio station and swapped mix CDs with her friends, and she makes it sound like she thinks this is not stealing.

Don’t Blame iTunes

Many people who complain about artists’ income blame Apple and iTunes. Yes, iTunes propagated the old economic splits and percentages into the digital world. But Apple did not create those splits; they were agreed upon in contracts between the labels/producers and the artists. What iTunes did was to provide an alternative digital distribution medium to Napster. Apple saved artists from the prospect of getting no revenue at all. People who attack and boycott iTunes thinking that they are helping artists are deluded.

It’s Not Just Music

This whole debate also extends to movies, books, news commentary, and software – anything that can be digitally copied. In each of these arenas, the players and the economic distribution are different, but the consequence of not paying is the same. If we all behaved this way, ultimately, there would be no books or movies either. So how does this relate to proprietary software, open source software, and free software?

Proprietary Software

Just like companies that publish books, music, movies etc, proprietary software companies were the gatekeepers. They decided what software was created and made available. When the hardware and software become available at the consumer level, independent producers spring up. This happened with freeware for PCs. The internet enables the distribution of the software, and provides methods of collecting payment. The costs of creating books, music, and movies have dropped dramatically because of the hardware and software now available. But if no-one pays for the content created, the proprietary software companies will go out of business.

Non-Proprietary

Open source and free software are other ways of creating and distributing software, the difference being that these rely on software (source and binaries) being easy to copy. Don’t steal Microsoft’s BI software and use it without permission. Use our open source BI software – we want you to.

Free Software

Free software requires that the software, and all software that is built upon it, be ‘free’. In this case ‘free’ means you can freely modify it, distribute it, and build upon it, and you give others those same rights. You can still charge for the software, but it makes no sense to (given the rights you give to your ‘customer’).

The ideals of the Free Software Foundation (FSF) are based on the notion that when you think of something or invent something, it belongs to the world; you don’t own it. This is a wonderful idea, however most of the world, including many industries, jobs, and professions, is based on the opposite principle – if you create it, you own it. To my mind I have fewer rights under the FSF view of the world: I don’t have the right to my own ideas.

Because of the freedoms that the Free Software Foundation believes in, they are against Digital Rights Management (DRM) software. DRM tries to protect the rights of artists, producers, and distributors of artistic content. In order to protect these rights, software is needed that is proprietary. If the DRM source code was open, it would make it easy for hackers to decode the content and remove the copy protection. So the Free Software Foundation is taking up the fight against DRM, calling it ‘Digital Restrictions Management’ (http://www.fsf.org/campaigns/drm.html). They call it this because, they say, DRM takes away your right to steal other people’s inventions. If you support DRM-free software, you are choosing to fight against musicians, authors, and actors.

Open Source

The Open Source movement takes a pragmatic approach on this topic. When you have an idea, it is yours. You can choose to do whatever you want with your invention. If it is a software invention, and you choose to put it into open source, that’s great. If you choose not to, that’s fine too, because it is yours. Open Source allows hybrid models – where a producer can decide to put some of their software into open source but not all of it (open core or freemium model). This model enables a software producer to provide something of value to people who would not have paid for anything anyway (this includes geographies and economies where the producer would not sell anyway). These people are willing participants and contributors in other ways. The producer also gets to sell whatever software products it wants.

Doomsday?

For some creative areas, if no-one pays for any content anymore, the creators will disappear eventually, and there will be no more content. But what happens if no-one pays for software anymore?

Proprietary software dies eventually, unless its vendors switch to service models.

The majority of people contributing to open source/free software today are IT developers. There are two main types here: developers creating, extending, or fixing software in the course of getting their projects finished, and sponsored contributors. IT is where the majority of software developers are today, so IT/enterprise/business software is safe.

The software that would be most at risk would be software that is created by smaller software companies, particularly software that has large up-front development costs. Games. The first, and maybe only, software segment to die would be the big-budget, realistic, immersive, loud video games. Who cares most about these games? The same demographic that is stealing all the music.

I say let Generation OMG copy and steal everything they want. All the really cool and fun careers will evaporate. Lots of the stuff they love (movies, music, games) will disappear. After they have spent a decade texting each other about how sucky everything is, they will grow up and have to re-create these industries. Hopefully with better economic structures than the current ones.

leave a comment »

I’m at the MongoNYC conference in New York today, where Pentaho is a sponsor. 10gen has done a great job with this event, which has drawn 1,000 attendees.

We just announced a strategic partnership between 10gen and Pentaho. From a technical perspective the integration between MongoDB and Pentaho means:

  • No Big Silos. Data silos are bad. Big ones are no better. Our MongoDB ETL connectors for reading and writing data mean you can integrate your MongoDB data store with the rest of your data architecture (relational databases, hosted applications, custom applications, etc).
  • Live reporting. We can provide desktop and web-based reports directly on MongoDB data.
  • Staging. We can provide trending and historical analysis by staging snapshots of MongoDB aggregations in a column store.

I’m looking forward to working with 10gen to integrate some of their new aggregation capabilities into Pentaho.

Written by James

May 23, 2012 at 2:22 pm

Pentaho and DataStax

We announced a strategic partnership with DataStax today: http://www.pentaho.com/press-room/releases/datastax-and-pentaho-jointly-deliver-complete-analytics-solution-for-apache-cassandra/

DataStax provides products and services for Cassandra, the popular Apache NoSQL database. We are releasing our first round of Cassandra integration in our next major release and you can download it today (see below).

Our Cassandra integration includes open source data integration steps to read from, and write to, Cassandra. So you can integrate Cassandra into your data architecture using Pentaho Data Integration/Kettle and avoid creating a Big Silo – all with a nice drag/drop graphical UI. Since our tools are integrated, you can create desktop and web-based reports directly on top of Cassandra. You can also use our tools to extract and aggregate data into a datamart for interactive exploration and analysis. We are demoing these capabilities at the Strata conference in Santa Clara this week.

Links

Written by James

February 28, 2012 at 4:07 pm

Pentaho’s Big Data Release

This week at Pentaho we announced a major Big Data release, including:

  • Open sourcing of our big data code
  • Moving Pentaho Data Integration to the Apache license
  • Support for HBase, Cassandra, MongoDB, Hadapt
  • And numerous functionality and performance improvements

What does this mean for the Big Data market, for Pentaho, and for everyone else?

We believe you should use the best tool for each job. For example you should use Hadoop or a NoSQL database where those technologies suit your purposes, and use a high performance columnar database for the use cases it is suited to. Your organization probably has applications that use traditional databases, and likely has a hosted application or two as well. Like it or not, if you have a single employee who has a spreadsheet on their laptop, you have a data architecture that includes flat files. So every data architecture is a hybrid environment to some extent. To solve the requirements of your business, your IT group probably has to move/merge/transform data between these data stores. You may have an application or two that has no external inputs or outputs, and no integration points with other applications. There is a word for these applications – silos. Silos are bad. Big data is no different. A big data store that is not integrated with your data architecture is a Big Silo. Big Silos are just as bad as regular silos, only bigger.

So when you add a big data technology to your organization, you don’t want it to be a silo. The big data capabilities of Pentaho Data Integration enable you to integrate your big data store into the rest of your data architecture. If you are using any of the big data technologies we support you can move data into, and out of, these data stores using a graphical environment. Our data integration capabilities also extend to traditional databases, columnar databases, flat files, web services, hosted applications and more. So you can easily integrate your big data application into the rest of your data architecture. This means your big data store is not a silo.

For Pentaho, the big data arena is a strategic one. These are new technologies and architectures so all the players in this space are starting from the same place. It is a great space for us because people using these technologies need tools and capabilities that are easy for us to deliver. Hadoop is especially cool because all of our tools and technologies are pure Java and are embeddable, so we can execute our engines within the data nodes and scale linearly as your data grows.

For everyone else our tools continue to provide great bang for the buck for ETL, reporting, OLAP, predictive analytics etc. Now we also lower the cost, time, and skill sets required to investigate big data solutions. For any one application you can divide the data architecture into two main segments: client data and server data. Client data includes things like flat files, mobile app data, cookie data etc. Server data includes transactional/traditional databases and big data stores. I don’t see the server-side as all or nothing. It could be all RDBMS, all big data store, 50/50, or any mix of the two. It’s like milk and coffee. You can have a glass of milk, a cup of coffee, or variations in between with different amounts of milk or coffee. So you can consider an application that only uses a traditional database today to be an application that currently utilizes 0% of its potential big data component. Every data architecture exists on this continuum, and we have great tools to help you if you want to step into the big data world.

If you want to find out more:

 

 

Written by James

February 2, 2012 at 9:56 pm

olap4j V1.0 has been released.

Back in the ’90s and early 2000s I was involved in the attempts by the proprietary BI vendors to create common standards. Anyone remember JOLAP? The vendors were doing this only because of increasing demand and frustration from their customers – none of them actually wanted these standards. Why? Because, in the short term, standards would only help the customers and the implementers, not the vendors. These efforts were hugely political, with many of the vendors taking the opportunity to score points against each other. The resulting ‘standards’ were useless, and none of the large vendors were willing, or able, to support them.

How refreshing, then, to have olap4j reach the 1.0 milestone – http://www.olap4j.org. Created by consumers and producers of open source BI software, olap4j shows the advantage of open collaboration by motivated parties. Already olap4j has a Mondrian driver, and an XMLA driver for Microsoft SQL Server Analysis Services, SAP BW, and Jedox Palo. There are also several clients that use olap4j servers: some from Pentaho, plus Saiku, Wabit, and ADANS.

olap4j is very cool stuff. You can read more on Julian Hyde’s blog. Congratulations to everyone who has worked on olap4j.

Written by James

April 12, 2011 at 12:24 pm

More Hadoop in New York City

Yesterday was fun. First I met with a potential customer looking to try Hadoop for a big data project.

Then I had a lengthy and interesting chat with Dan Woods. Amongst other things, Dan runs http://www.citoresearch.com/ and also blogs for Forbes. We talked about Pentaho’s history and our experiences so far with the commercial open source model. We also talked about Hadoop and big data and about the vision and roadmap of our Agile BI offering.

Next I met with Steve Lohr who is a technology reporter for the New York Times. We talked about many topics including the enterprise software markets and how open source is affecting them. We also talked about Hadoop, of course.

Next was a co-meet-up of the New York Predictive Analytics and No-SQL groups where I presented decks about Weka and Hadoop, separately and together. There were lots of interesting questions and side discussions earlier. By the time we finished all these topics a blizzard was going on outside. Cabs were nowhere to be seen, so Matt Gershoff of Conductrics was kind enough to lead me via the subway to the vicinity of my hotel.

Written by James

January 27, 2011 at 4:01 pm

Big Data in New York City

I’m having an interesting time in NYC this week. I had to retrieve my snowboarding jacket out of the attic for this trip. It’s snowing right now, which is better than the sleet forecast for later. So far I’ve met with a few Big Data customers and prospects and presented at the New York Hadoop User Group. Our hybrid database/Hadoop data lake architecture always gets a good reception and our ability to run our data integration engine within the Hadoop data nodes impresses people.

Being the first Business Intelligence vendor to bring reporting and ETL to the Hadoop space sets us apart from all the other vendors. We have so much recognition in this space that I’ve spoken to a few people in the last month who thought we were ‘THE’ visualization and data transformation provider for Hadoop and didn’t connect to other data sources.

This afternoon I’m meeting with reporters and columnists from a couple of different publications to chat about Big Data / Hadoop stuff. Tonight I’m presenting at the New York Predictive Analytics Meetup to talk about Hadoop from an analysis perspective.

 

 

Written by James

January 26, 2011 at 5:20 pm

Meetups and Pentaho Summit(s) coming up in January

It’s going to be a busy month.

January 19th and 20th is our Global 2011 Summit in San Francisco. I have three sessions: Pentaho for Hadoop, Extending Pentaho’s Capabilities, and an Architecture Overview. So I’m creating and digging up some new sample plug-ins and extensions. I’m also going to take part in a Q&A session with the Pentaho architects, since Julian Hyde (Mondrian), Matt Casters (PDI/Kettle), and Thomas Morgner (Pentaho Reporting) will all be there. Who should attend?

CTOs, architects, product managers, business executives and partner-facing staff from System Integrators and Resellers, as well as Software Providers with a need to embed business intelligence or data integration software into your products.

We usually have customers and prospects attending our summits as well.

We are also having an architect’s summit that same week to work on our 2011 technology road-map. That should be a lot of fun.

The week after that I’ll be in New York presenting at the NYC Hadoop User Group on Tuesday, January 25th and the NYC Predictive Analytics Meetup on Wednesday, January 26th.

Written by James

January 5, 2011 at 9:20 pm
