James Dixon’s Blog

James Dixon’s thoughts on commercial open source and open source business intelligence

Open Source: In praise of the profiteering enterprise, the greedy freeloader, and the selfish developer

leave a comment »

Recently, Matt Asay talked about a number of different issues causing conflict in the free and open source world in a piece titled “The new struggles facing open source” and comes to the conclusion that currently the biggest problem is the role of enterprises controlling projects (he says “controlling the community”, which is impossible, if you’ve ever tried it). He makes a lot of sense, but I don’t entirely agree with his points about the role of businesses in open source being detrimental. Hadoop, Spark, Storm, Kafka, Hive, HBase etc all came from enterprises that still employ the majority of the core contributors in most cases. Why did these companies create these technologies? Not for philanthropy. Not for the greater good. For better profit via better infrastructure. Having created those technologies they decided to open source them. For the greater good? No. For lower maintenance, and better profits, with side benefits of better mindshare and easier recruiting. Did these companies open source their domain-specific intellectual property that is the basis of their business? No, and they never will. They only open sourced internally developed infrastructure that is tangential to their business. Do these companies believe that all ideas inherently belong to the people of the world? No. They put into open source what was in their best interest to do so. Self-interest all round. Score 1 point for greed.

In another piece titled “Enterprises still miss the real point of open source” Matt argues that enterprises, while they are using a lot of open source, still don’t get it. He finishes with:

Again, merely using open source isn’t enough. Contributions are required.

But let’s look at “The rise and rise of open source” by Simon Phipps. This is a review of Black Duck’s most recent “The Future Of Open Source” survey. The net result is that, across all the important metrics, usage of open source for running businesses and creating products is now over 50% for the first time, and some of the metrics are still rising rapidly. 78% of respondents report they are running their business with open source software, indicating that an approach based on, and using, open source is now the mainstream, and that purely proprietary approaches are now the minority. As a result InfoWorld is stopping their open source special interest channel, because open source is now the mainstream. Yay for open source.

But who are these companies that make up these statistics and represent the majority of businesses? Are they all contributing to open source? The survey indicates that while 78% of businesses are running on open source, only 64% of those say they are contributing to open source. What do we call the greedy who use open source but do not contribute to it? They are the Freeloaders. Matt Asay says they need to contribute. I say they already have. If the freeloaders weren’t using open source, only 49.92% (78% * 64%) of companies would be running their business on open source. In other words, the only reason we can claim today that open source is the mainstream is the actions of the (apparently) non-contributing freeloaders. But isn’t tipping the balance of the overall market from proprietary to open source a contribution in itself? Of course it is. The act of merely using open source software displaces a proprietary alternative, and is a contribution in its own right. No matter how little you contribute, even the greedy who contribute nothing still make a contribution. Score another point for greed.
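The survey arithmetic can be sanity-checked in a couple of lines. This assumes, as the paragraph above does, that the 64% is measured against the companies already running on open source:

```python
# Figures as quoted from the Black Duck survey above.
running_on_oss = 0.78        # businesses running on open source
contributing_share = 0.64    # of those, the share that contribute back

contributors = running_on_oss * contributing_share
freeloaders = running_on_oss - contributors

print(f"Contributing users: {contributors:.2%}")  # 49.92%
print(f"Freeloaders:        {freeloaders:.2%}")   # 28.08%
```

So without the freeloaders, open source would still be a minority approach.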

So now let’s look at the people who do contribute. The vast majority of these are paid contributors employed by enterprises and IT/software developers trying to get their job done and to roll out a product or feature. These activities include creating features, fixing bugs, translating, testing etc (the list is long). Enterprises fund these activities for several reasons including:

  • Getting a product to market
  • Lowering development costs
  • Lowering license fees
  • Improving time to market
  • Employee retention
  • Increasing mindshare and thought leadership

Philanthropy? Nope.

Developers fix bugs and contribute the fixes to a project because they don’t want to re-apply the fix in every future release. This is self-serving behavior. Do I care who controls or directs the project? Nope. The core contributors of the project accept the fix to increase quality, which helps adoption, which grows the project. This is self-serving behavior. One of the greatest and most powerful things about open source is that everyone can act out of self-interest, and everyone gains from everyone else acting selfishly. This makes the model very strong. Score another point for greed.

Final score: Philanthropy 0, Greed 3

If open source is ultimately driven by greed and self-interest, how is it any better than proprietary software development? Because it is an inherently better way to develop software, and in so many ways that the fight isn’t even close. Is it philosophically better? Yes, I believe the fundamental principles of open source are better than proprietary development. But is it morally better? No. The underlying power of open source over proprietary development is that greed is naturally converted into useful contribution, whereas with proprietary development greed translates into channel conflict, price fixing, monopolies, class action suits, vendor lock-in, and inefficient, low-quality, bloated software.

Open source rules the day. But philanthropy and the belief that all ideas belong to the world did not get it there.

Written by James

May 8, 2015 at 8:14 pm

Posted in Uncategorized

Response to “Will There Ever Be Another Red Hat”

leave a comment »

Response to Dan Woods’ Forbes article “Will There Ever Be Another Red Hat”

This is a very nice piece. Although I wouldn’t say “never”. When IBM was riding high it was difficult to foresee the rise of Microsoft, and when Microsoft was riding high Apple was nowhere.

In the piece Dan refers to two of my blog posts about open source business models. Dan infers that I was saying that open source companies do not need sales and marketing budgets. That is not quite what I was saying: open source companies do need sales and marketing budgets. The difference is that they can run a primarily or exclusively inbound sales model, which is much cheaper than an outbound model. You definitely need a marketing budget to create the inbound leads. Having an active community helps generate leads and lowers sales and marketing costs even more.

My main point was to refute the idea that the subscription model is inferior because the lack of an initial license fee hurts – in the proprietary model that license fee pays for the cost of acquiring customers and nothing more. As an example of the proprietary model Qlik is losing $25m-$30m per quarter on $125m revenue because their S&M budget is $75m. They are losing money to gain customers, with the hope that they can make enough services/support/up-sell dollars to make a profit eventually.
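A back-of-envelope reading of those Qlik numbers shows how much of the cost structure is customer acquisition. The “other costs” figure below is implied from the quoted figures, not reported, and I take the midpoint of the $25m-$30m loss range:

```python
# All figures per quarter, in $m, as quoted above.
revenue = 125.0
sales_marketing = 75.0
loss = 27.5  # midpoint of the $25m-$30m range

total_costs = revenue + loss                 # 152.5: costs implied by the loss
other_costs = total_costs - sales_marketing  # 77.5: everything besides S&M

# Without the S&M spend the business would be comfortably profitable,
# which is the point: they are buying customers with the license revenue.
profit_without_sm = revenue - other_costs
print(profit_without_sm)  # 47.5
```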

There are fundamental differences between the proprietary license fee model and the open source subscription model, and unless you understand both of them well, you are not in a good position to compare them or criticize either of them (as an analyst was doing at the time).

Open source subscription models can be very successful, but their economics seem to be poorly understood by some. Amazon made $5bn last year renting out servers in the cloud. Google makes billions selling ads for 6c at a time. In-app purchases generate billions, 99c at a time. These are new and kinda weird ways to make money. Mobile, Cloud, Big Data, and IoT are changing things quickly, and everyone (including analysts) will have to pay attention if they want to keep up.

Written by James

April 28, 2015 at 7:19 pm

Posted in Uncategorized

Open-Plan Offices: Silicon Valley is right, your boss is wrong.

leave a comment »

My response to this article: http://www.theage.com.au/comment/silicon-valley-got-it-wrong-the-openplan-office-trend-is-destroying-the-workplace-20150420-1molwh.html

To summarize this person’s critique of open-plan offices:

  • My boss took away cubicles (with no open line of sight) and lined us up against a wall (with no open line of sight).
  • My boss took away cubicles (with little interactivity) and lined us up against a wall (with little interactivity).
  • My boss took away cubicles (with understood rules of interaction) and lined us up against a wall (with no understood rules of interaction, and no guidance).
  • My boss took away cubicles (which encourage personal productivity) and tried to create an open-plan environment (which encourages team productivity), but failed badly.
  • All of this is the fault of open-plan offices (and not my boss).

In my job I could work from home every day, but I don’t, because team productivity is more important than any one person’s productivity. If you interview people about their personal productivity, they rarely think about the big picture, only their personal stuff. I could also have an office if I wanted, but I don’t. Again, open-plan is about team productivity (see everything ever written about Agile). I do, however, work from home occasionally to give the others a break from my glorious wit.

A productive team creates self-governing rules. In our bullpen, if you leave your phone at your desk and it rings, when you return to your desk your phone will be in a sound-proof box. Sometimes the box will be hidden. Headphones are fine, but audible music is a no-no (there are more than sufficient Nerf guns to stop that obtrusive behavior quickly). If a person in your environment is being inconsiderate, it’s not the fault of the environment, it’s the fault of the person. Blaming the environment will not solve the problem.

In a team environment, frictional conversations (the ones that just happen when people are close to each other) are very valuable. In the early 2000s we tried a purely remote environment, and the results were not great, so when we started Pentaho we went with a “co-locate and open-plan” approach when possible, to the extent that for the last ten years none of Pentaho’s founders (including the CEO, CTO, and Chief Engineer) has had a closed office.

Open-plan environments are not right for all teams. Groups that regularly need phone conversations, such as Sales and Support, are not good candidates for open-plan environments. But again, it is not the fault of the environment. If an environment is implemented wrongly or inappropriately, it is not the fault of the environment, it’s the fault of the implementor.

I’m sorry that the author of this article had a negative experience with an open-plan office. But it’s not true that Silicon Valley got it wrong. Your boss got it wrong.

Written by James

April 23, 2015 at 1:18 am

Posted in Uncategorized

Are individuals born with characteristics that dispose them to entrepreneurship?

leave a comment »

This is my answer to this question on Quora: http://www.quora.com/Are-individuals-born-with-characteristics-that-dispose-them-to-enterpreneurship

I think this is true to a certain extent. I also think that, in many cases, those characteristics align with the traits that come with ADD/ADHD.

Google “entrepreneurs ADHD” and take a look at the results or read this Forbes article for more details: ADHD: The Entrepreneur’s Superpower

Some famous entrepreneurs who have acknowledged their ADHD (source: Famous People with ADHD – Adult ADD Center of Maryland)

* Richard Branson (founder of Virgin Records, Virgin Atlantic, Virgin Galactic etc)
* David Neeleman (founder of JetBlue Airways)
* Alan Meckler (founder of several magazines and companies)
* Paul Orfalea (founder of Kinko’s)
* Charles Schwab (founder of Charles Schwab)
* Walt Disney (founder of Walt Disney)

Historical Figures who showed characteristics of ADHD (source: Famous People With ADHD Traits)

* Abraham Lincoln
* Robert F. Kennedy
* John F. Kennedy, who (allegedly) smoked pot to help him focus.
* Benjamin Franklin
* Henry Ford
* Thomas Edison
* Leonardo da Vinci
* Alexander Graham Bell
* Orville and Wilbur Wright
* Sir Isaac Newton
* Albert Einstein

As my friend Marten Mickos says, follow your passion and believe in willpower.

But if you have (manageable) ADHD, it might help.

Written by James

April 16, 2015 at 7:19 pm

Posted in Uncategorized

My thoughts on “Why Women Shouldn’t Code”

leave a comment »

Here is the original article: https://medium.com/@hardaway/why-women-shouldnt-code-82205165e64a

Firstly, the title is attention-grabbing nonsense. The article is about why women (girls) should not be forced to learn to code at school.

There are some good points in the article. In general, today, women are less interested in software careers and software degrees than men are. That’s a fact.

Companies like to promote from within, but many (male) software engineers make lousy managers and directors. Myself included. So finding good software engineering managers is hard. That’s a fact. Imagine Sheldon Cooper managing a team of 10 Sheldon Coopers. What a nightmare. The best managers I ever had as a software engineer were all women. The others were all men. So I would like to see college and vocational courses just for Engineering Management. That might be a role that attracts more women than men. If so, great.

I don’t agree that women shouldn’t code. I don’t agree that girls should not be required to learn coding at school – and here is why. At school we learn Art, and Music, and History, and Sports. How many of us become professional artists, musicians, historians, or athletes? Almost none of us, but that’s not why we learn them. Learning about those subjects enriches us and gives us context. So much of our world today is driven by and dependent on computers that a basic understanding of how they are controlled seems, to me, to be more important than knowing who won the Battle of Antietam and when.

If teaching girls to code results in more female software engineers, that’s great. If it doesn’t, that’s OK too, because they will know more about the world than they did before.

Written by James

April 15, 2015 at 11:13 pm

Posted in Uncategorized

Microsoft’s Open Sourcing of ASP.NET

leave a comment »

Under an Apache License on GitHub.

http://www.asp.net/open-source

Seems that RedHat’s Truth Happens video http://www.redhat.com/v/mov/TruthHappens.mov needs another line:

First they ignore you…

Then they laugh at you…

Then they fight you…

Then you win.

And then they realize you had the right idea all along and adopt your practices for themselves.

Written by James

April 12, 2015 at 7:12 pm

Posted in Uncategorized

Pile-On: Dan Woods “Lessons From The First Wave Of Hadoop Adoption”

leave a comment »

Dan Woods put out a nice piece yesterday on his Forbes blog titled “Lessons From The First Wave Of Hadoop Adoption”.

I agree with him that the insights and advantages of Big Data solutions need to be described in ways other than technology. I’m going to add on to his insights.

1. It’s about more than big data. It’s a new platform.

Yes, it is a new platform. That means it’s different from the old ones. The fact that you can do some things cheaper than you could before is not the main idea. A bigger story is that some things that were not economically possible before now are. But the main idea is that this is a new platform, with new capabilities, that needs to fit into your existing data architecture.

2. Don’t get rid of your data warehouse

I completely agree. Big Data technology is a new tool with new characteristics. Using it to replace a Data Warehouse technology that is finely tuned for that use case is not a great idea. Don’t listen to the “Hadoop will replace every database within x years” crowd. No database has managed to replace every database. No database ever will because the variety of the use cases is too large.

3. Think about your data supply chain

Since a Big Data system needs to fit in with everything you currently have and operate, integration is a significant priority. Understand that with Big Data you can build a Big Silo, but a Big Silo is as bad as a small silo (just a lot bigger). You should not be required to pump all your data from every system into Hadoop to get value from it. Design your data architecture carefully; the implications and fallout of getting it right or wrong are significant.

4. It’s complicated

Yes it is. It’s also not cheap to do it well. Sure you can download a lot of open source software and prototype or prove your ideas without a lot of upfront outlay. But putting it into production is a production. Expect that.

Written by James

January 27, 2015 at 5:01 pm

Union of the State – A Data Lake Use Case

with 6 comments

Many business applications are essentially workflow applications or state machines. This includes CRM systems, ERP systems, asset tracking, case tracking, call center, and some financial systems. The real-world entities (employees, customers, devices, accounts, orders etc.) represented in these systems are stored as a collection of attributes that define their current state. Examples of these attributes include someone’s current address or number of dependents, an account’s current balance, who is in possession of laptop X, which documents for a loan approval have been provided, and the date of Fluffy’s last Feline Distemper vaccination.
State machines are very good at answering questions about the state of things. They are, after all, machines that handle state. But what about reporting on trends and changes over the short and long term? How do we do this? The answer is to track changes to the attributes in change logs. These change logs are database tables or text files that list the changes made over time. That way you can (although the data transformation is ugly) rewind the change log of a specific field across all objects in the system and then aggregate those changes to get a view over time. This is not easy to do and assumes that you have a change log. Typically, change logs only exist for the main fields in an application. There might only be change logs on 10-20% of the fields. So if you suddenly have an impulse to see how a lesser attribute has changed over time, you are out of luck. It is impossible because that information is lost.
This situation is similar to the way that old school business intelligence and analytic applications were built. End users listed out the questions they wanted to ask of the data, the attributes necessary to answer those questions were skimmed from the data stream and bulk loaded into a data mart. This method works fine until you have a new question to ask. The Data Lake approach solves this problem. You store all of the data in a Data Lake, populate data marts and your data warehouse to satisfy traditional needs, and enable ad-hoc query and reporting on the raw data in the Data Lake for new questions.
A Data Lake can also be used to solve the problems of history and trending for workflow applications and state machines. What if these applications write their initial state into the Data Lake and then also write the change of every attribute in there as well? While we are at it, let’s log all the application events coming from the user interface tier as well. From the application’s perspective this is a low-latency fire and forget scenario.
Now we have the initial state of the application’s data and the changes to all of the attributes, not just the main/traditional fields. We can apply this approach to more than one application, each with its own Data Lake of state logs, storing every incremental change and event. So now we have the state of every field of (potentially) every business application in an enterprise across time. We have the “Union of the State”.
With this data we have the ability to rewind the Union of the State to any point in time. What are the potential use cases for the Union of the State?
Enterprise Time Machine
Suppose something happened a few weeks ago. Decisions were made. Things changed. But exactly what, when, and why? With an Enterprise Time Machine you can rewind the complete state of every major application to any point in time and then step forward event by event, click by click, change by change, at the millisecond level if things happened that quickly. For an e-commerce vendor this means being able to know for any specified millisecond in the past how many shopping carts were open, what was in them, which transactions were pending, which items were being boxed or in transit, what was being returned, who was working, how many customer support calls were queued and how many were in progress. In different domains such as financial services or healthcare, the applications and attributes are different but the ability is the same.
In order to reconstruct the state at any point in time we need to load the initial snapshot into a repository and then update the attributes of each object as we process the logs, event by event, until we get to the point in time that we are interested in. A NoSQL store such as MongoDB, HBase, or Cassandra should work well as the repository. This process could be optimized by adding regular snapshots of the whole state into the Data Lake so that we don’t have to process from the very beginning every time. For a detailed analysis you could rebuild the state to a particular point in time and then process forwards in increments of any size. This way the situation of a device failure that led to a catastrophic cascade of events can be re-created and examined millisecond by millisecond.
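The replay step can be sketched in a few lines of Python. This is a toy illustration, not Pentaho code; the log format, object names, and attributes are invented for the example:

```python
# Start from a snapshot and replay a time-ordered change log up to
# the point in time we are interested in.
snapshot = {"cart-1": {"status": "open", "items": 0}}

# Each log entry: (timestamp, object_id, attribute, new_value)
change_log = [
    (100, "cart-1", "items", 1),
    (250, "cart-1", "items", 2),
    (400, "cart-1", "status", "checked-out"),
]

def state_at(snapshot, change_log, t):
    """Rebuild the state of every object as it was at time t."""
    state = {obj: dict(attrs) for obj, attrs in snapshot.items()}
    for ts, obj, attr, value in change_log:
        if ts > t:
            break  # the log is time-ordered, so we can stop here
        state.setdefault(obj, {})[attr] = value
    return state

print(state_at(snapshot, change_log, 300))
# {'cart-1': {'status': 'open', 'items': 2}}
```

Starting the replay from a more recent snapshot simply means fewer log entries to process, which is the optimization described above.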
Trending
Since we can re-create the state at any point in time we can do trending and historical analysis of any and every attribute over any time period, at any time granularity we want.
Compliance
When user interface events are logged as well as the attribute changes you have the ability to know not only who changed what information, but also who looked at it. Who was aware of the situation? Why did Bob open a particular record every few hours and cancel out without making changes? This requires the Enterprise Time Machine described above.
Predictive
One of the main tasks in a predictive exercise is to work out which attributes are predictive of your target variable and which ones are not. This can be impossible to do when you only have 10% of your attributes logged. Maybe the minor attributes are the predictive ones. Now you have all of them. This requires the trending facility described above.
Doug Moran, a co-founder of Pentaho and product manager for its Big Data products, sees many predictive applications for this kind of data. This includes the ability to derive a model from replays of previous events and use it to prescribe ways to influence the current situation to increase the likelihood of a desired outcome. For example, this could include replaying all previous shopping cart events for a user currently on an e-commerce site to derive a predictive model that prescribes a way to influence their current purchase in a positive way.
“Dixon’s Union of the State idea gives the Data Lake idea a positive mission besides storing more data for less money,”
said Dan Woods, an IT Consultant to buyers and vendors and CEO of Evolved Media, who has written about the Data Lake for several years.
“Providing the equivalent of a rewind, pause, forward remote control on the state of your business makes it affordable to answer many questions that are currently too expensive to tackle. Remember, you don’t have to implement this vision for all data for it to provide a new platform to answer difficult questions with minimal effort.”
Architecture
How could this be done?
  • Let the application store its current state in a relational or NoSQL repository. Don’t affect the operation of the operational system.
  • Log all events and state changes that occur within the application. This is the tricky part unless it is an in-house application. It would be best if these events and state changes were logged in real time, but this is sometimes not possible. Maybe Salesforce or SugarCRM will offer this level of logging as a feature. Dump this data into a Data Lake using a suitable storage and processing technology such as Hadoop.
  • Provide the ability to rewind the state of any and all attributes by parallel processing of the logs.
  • Provide the facilities listed above using technologies appropriate to each use case (using the rewind capability).
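The logging step in the list above could be sketched as a fire-and-forget change logger. This is a hypothetical illustration: in a real system the sink would more likely be a message queue feeding the Data Lake, and the entry fields are invented here. A local append-only file stands in for the sink:

```python
import json
import time

class ChangeLogger:
    """Append-only, fire-and-forget logging of attribute changes."""

    def __init__(self, path):
        self.path = path

    def log_change(self, obj_id, attribute, new_value):
        entry = {
            "ts": time.time(),      # when the change happened
            "object": obj_id,
            "attribute": attribute,
            "value": new_value,
        }
        # The application never reads this back, and a failure here
        # must not break the operational system.
        try:
            with open(self.path, "a") as f:
                f.write(json.dumps(entry) + "\n")
        except OSError:
            pass  # fire and forget: swallow sink errors

logger = ChangeLogger("state_changes.jsonl")
logger.log_change("cart-1", "items", 2)
```

One entry per attribute change keeps the write path cheap for the application; all the expensive work (replay, trending, prediction) happens later, on the Data Lake side.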

The plumbing and architecture for this is not simple and Dan Woods points out that there are databases like Datomic that provide capabilities for storing and querying state over time. But a solution based on a Data Lake has the same price, scalability, and architectural attributes as other big data systems.

Written by James

January 22, 2015 at 4:43 am

AWS and Your Crown Jewels

leave a comment »

These are my thoughts on a recent Dan Woods (Forbes) post titled “Will Companies Ever Move Their Crown Jewels to Amazon Web Services?“.

My short answer is yes, because otherwise Jeff Bezos (the founder of Amazon) has failed. As of today Bezos is worth $28.8bn and is #16 on Forbes’ list of powerful people. I’m guessing he’s the kind of guy who doesn’t like to fail.

Jeff Bezos explains his vision in this 10-year-old TED talk: https://www.ted.com/talks/jeff_bezos_on_the_next_web_innovation

He spends the first 7 minutes comparing the Internet bubble to the California gold rush and then moves on to an analogy comparing the internet today with the electricity industry 100 years ago.

I admire the time and energy he spends on his analogies. He looked into different ones and compared them to find the best one. Good analogies are hard to find: the best ones sound obvious when you hear them, but they are hard to come by. The Beekeeper analogy for open source software took me months of iterations, based on years of experience, to come up with. It sounds fairly obvious when you hear it, but there was no analogy before it to help understand the model.

If an analogy is good enough, it will allow you to infer additional knowledge. If you follow Bezos’ electricity analogy and look at history, you can draw additional insights. Looking at the history of electricity adoption, we can draw inferences about the adoption of cloud computing (with some generalizations):

  • Before the introduction of electricity supply as a commodity service, any large company needing electricity had its own electricity generators.
  • Before the introduction of cloud-based computing with utility pricing, any large company needing computing had its own data center.
  • Who were the first people to join the electricity grid? Small companies and residences without prior electrical supply.
  • Who were the first people to use cloud computing? Small companies and individuals without data centers.
  • Who were the last people to join the electricity grid? Large companies with their own power sources.
  • Who will be the last people to migrate to cloud computing? Large companies with their own data centers.

Looking in the press you can see that the majority of the anti-cloud talk comes from larger enterprises.

However, most companies that have started in the last five years are evolving cloud-based infrastructures. As a start-up you typically have desktop-based applications for accounting, HR, CRM etc. As you grow it makes sense to move to hosted solutions like NetSuite, Salesforce, SugarCRM etc. As you add more and more hosted solutions, the cost and headache of installing and maintaining on-premise solutions looks less and less attractive.

So today’s generation of small companies, which will become the large companies of the future, have four classes of applications:

  • Domain-specific desktop applications
  • Generic applications with small scale usage (e.g. project planning)
  • Generic applications that will grow and become cloud based (payroll, CRM, or accounting in a small company)
  • Cloud-based applications

If companies, as Dan suggests, are using services other than Amazon for critical applications, then Amazon is failing in its mission due to operational issues. Jeff Bezos is not likely to let that continue for long.

Written by James

November 12, 2014 at 6:00 pm

Posted in Uncategorized

Extending Pentaho Analyzer and CDF

leave a comment »

Here is a sneak peek at some of the things I’ll be showing at Pentaho World sessions later this week (11am Oct 10th).

[Screenshots: floor plan, GPS trails and stationary-indicator heatmap, and several other preview images]

Written by James

October 6, 2014 at 8:32 pm

Posted in Uncategorized