James Dixon’s Blog

James Dixon’s thoughts on commercial open source and open source business intelligence

Are individuals born with characteristics that dispose them to entrepreneurship?


This is my answer to this question on Quora: http://www.quora.com/Are-individuals-born-with-characteristics-that-dispose-them-to-enterpreneurship

I think this is true to a certain extent. I also think that, in many cases, those characteristics align with the traits that come with ADD/ADHD.

Google “entrepreneurs ADHD” and take a look at the results or read this Forbes article for more details: ADHD: The Entrepreneur’s Superpower

Some famous entrepreneurs who have acknowledged their ADHD (source: Famous People with ADHD – Adult ADD Center of Maryland):

* Richard Branson (founder of Virgin Records, Virgin Atlantic, Virgin Galactic, etc.)
* David Neeleman (founder of JetBlue Airways)
* Alan Meckler (founder of several magazines and companies)
* Paul Orfalea (founder of Kinko's)
* Charles Schwab (founder of Charles Schwab)
* Walt Disney (founder of The Walt Disney Company)

Historical Figures who showed characteristics of ADHD (source: Famous People With ADHD Traits):

* Abraham Lincoln
* Robert F. Kennedy
* John F. Kennedy, who (allegedly) smoked pot to help him focus.
* Benjamin Franklin
* Henry Ford
* Thomas Edison
* Leonardo da Vinci
* Alexander Graham Bell
* Orville and Wilbur Wright
* Sir Isaac Newton
* Albert Einstein

As my friend Marten Mickos says, follow your passion and believe in willpower.

But if you have (manageable) ADHD, it might help.

Written by James

April 16, 2015 at 7:19 pm

Posted in Uncategorized

My thoughts on “Why Women Shouldn’t Code”


Here is the original article: https://medium.com/@hardaway/why-women-shouldnt-code-82205165e64a

Firstly, the title is attention-grabbing nonsense. The article is about why women (girls) should not be forced to learn to code at school.

There are some good points in the article. In general, today, women are less interested in software careers and software degrees than men are. That’s a fact.

Companies like to promote from within, but many (male) software engineers make lousy managers and directors. Myself included. So finding good software engineering managers is hard. That’s a fact. Imagine Sheldon Cooper managing a team of 10 Sheldon Coopers. What a nightmare. The best managers I ever had as a software engineer were all women. The others were all men. So I would like to see college and vocational courses just for Engineering Management. That might be a role that attracts more women than men. If so, great.

I don’t agree that women shouldn’t code. I don’t agree that girls should not be required to learn coding at school – and here is why. At school we learn Art, and Music, and History, and Sports. How many of us become professional artists, musicians, historians, or athletes? Almost none of us, but that’s not why we learn them. Learning about those subjects enriches us and gives us context. So much of our world today is driven by and dependent on computers that a basic understanding of how they are controlled seems, to me, to be more important than knowing who won the Battle of Antietam and when.

If teaching girls to code results in more female software engineers, that’s great. If it doesn’t, that’s OK too, because they will know more about the world than they did before.

Written by James

April 15, 2015 at 11:13 pm

Posted in Uncategorized

Microsoft’s Open Sourcing of ASP.NET


Under an Apache License on GitHub.


It seems that Red Hat’s “Truth Happens” video (http://www.redhat.com/v/mov/TruthHappens.mov) needs another line:

First they ignore you…

Then they laugh at you…

Then they fight you…

Then you win.

And then they realize you had the right idea all along and adopt your practices for themselves.

Written by James

April 12, 2015 at 7:12 pm

Posted in Uncategorized

Pile-On: Dan Woods “Lessons From The First Wave Of Hadoop Adoption”


Dan Woods put out a nice piece yesterday on his Forbes blog titled “Lessons From The First Wave Of Hadoop Adoption”.

I agree with him that the insights and advantages of Big Data solutions need to be described in ways other than technology. I’m going to add on to his insights.

1. It’s about more than big data. It’s a new platform.

Yes, it is a new platform. That means it’s different from the old ones. The fact that you can do some things cheaper than you could before is not the main idea. A bigger story is that some things that were not economically possible before now are. But the main idea is that this is a new platform, with new capabilities, that needs to fit into your existing data architecture.

2. Don’t get rid of your data warehouse

I completely agree. Big Data technology is a new tool with new characteristics. Using it to replace a Data Warehouse technology that is finely tuned for that use case is not a great idea. Don’t listen to the “Hadoop will replace every database within x years” crowd. No database has managed to replace every database. No database ever will because the variety of the use cases is too large.

3. Think about your data supply chain

Since a Big Data system needs to fit in with everything you currently have and operate, integration is a significant priority. Understand that with Big Data you can build a Big Silo, but a Big Silo is as bad as a small silo (just a lot bigger). You should not be required to pump all your data from every system into Hadoop to get value from it. Design your data architecture carefully; the implications and fallout of getting it right or wrong are significant.

4. It’s complicated

Yes it is. It’s also not cheap to do it well. Sure, you can download a lot of open source software and prototype or prove your ideas without a lot of upfront outlay. But putting it into production is a production. Expect that.

Written by James

January 27, 2015 at 5:01 pm

Union of the State – A Data Lake Use Case


Many business applications are essentially workflow applications or state machines. This includes CRM systems, ERP systems, asset tracking, case tracking, call center, and some financial systems. The real-world entities (employees, customers, devices, accounts, orders etc.) represented in these systems are stored as a collection of attributes that define their current state. Examples of these attributes include someone’s current address or number of dependents, an account’s current balance, who is in possession of laptop X, which documents for a loan approval have been provided, and the date of Fluffy’s last Feline Distemper vaccination.
State machines are very good at answering questions about the state of things. They are, after all, machines that handle state. But what about reporting on trends and changes over the short and long term? How do we do that? The answer is to track changes to the attributes in change logs. These change logs are database tables or text files that list the changes made over time. That way you can (although the data transformation is ugly) rewind the change log of a specific field across all objects in the system and then aggregate those changes to get a view over time. This is not easy to do and assumes that you have a change log. Typically, change logs only exist for the main fields in an application. There might only be change logs on 10-20% of the fields. So if you suddenly have an impulse to see how a lesser attribute has changed over time, you are out of luck. It is impossible because that information is lost.
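As a sketch of that rewind-and-aggregate step: assuming a change log of (timestamp, entity, field, new value) rows (an illustrative schema, not from any particular product), replaying the log for one field and keeping a running aggregate per month might look like this:

```python
from collections import defaultdict
from datetime import datetime

# Illustrative change-log rows: (timestamp, entity, field, new_value)
change_log = [
    (datetime(2014, 1, 5), "acct-1", "balance", 100.0),
    (datetime(2014, 1, 20), "acct-2", "balance", 250.0),
    (datetime(2014, 2, 3), "acct-1", "balance", 80.0),
]

def monthly_aggregate(log, field):
    """Replay the change log in time order, tracking the latest value of
    `field` for every entity, and record the running total per month."""
    latest = {}                    # entity -> latest known value
    by_month = defaultdict(float)  # (year, month) -> aggregate
    for ts, entity, fld, value in sorted(log):
        if fld != field:
            continue
        latest[entity] = value
        by_month[(ts.year, ts.month)] = sum(latest.values())
    return dict(by_month)

print(monthly_aggregate(change_log, "balance"))
# {(2014, 1): 350.0, (2014, 2): 330.0}
```

Note that this only works for fields that have a change log in the first place, which is exactly the limitation the Data Lake approach below removes.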
This situation is similar to the way that old-school business intelligence and analytic applications were built. End users listed out the questions they wanted to ask of the data, the attributes necessary to answer those questions were skimmed from the data stream, and bulk loaded into a data mart. This method works fine until you have a new question to ask. The Data Lake approach solves this problem: you store all of the data in a Data Lake, populate data marts and your data warehouse to satisfy traditional needs, and enable ad-hoc query and reporting on the raw data in the Data Lake for new questions.
A Data Lake can also be used to solve the problems of history and trending for workflow applications and state machines. What if these applications write their initial state into the Data Lake and then also write the change of every attribute in there as well? While we are at it, let’s log all the application events coming from the user interface tier as well. From the application’s perspective this is a low-latency, fire-and-forget scenario.
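A minimal sketch of what that write path might look like, assuming the application appends self-describing JSON events to a local log that is later drained into the Data Lake (the file name, event types, and field names here are all hypothetical):

```python
import json
import time

def log_event(out, entity, event_type, payload):
    """Append one immutable event to the state log. The application never
    reads this back; it is write-only, low-latency, fire-and-forget."""
    out.write(json.dumps({
        "ts": time.time(),   # event time, epoch seconds
        "entity": entity,
        "type": event_type,  # "initial_state", "field_change", "ui_event", ...
        "payload": payload,
    }) + "\n")

with open("state_log.jsonl", "a") as out:
    # Initial state once, then every attribute change and UI event.
    log_event(out, "acct-1", "initial_state", {"balance": 100.0, "owner": "Bob"})
    log_event(out, "acct-1", "field_change", {"field": "balance", "new": 80.0})
    log_event(out, "acct-1", "ui_event", {"action": "record_opened", "user": "bob"})
```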
Now we have the initial state of the application’s data and the changes to all of the attributes, not just the main/traditional fields. We can apply this approach to more than one application, each with its own Data Lake of state logs, storing every incremental change and event. So now we have the state of every field of (potentially) every business application in an enterprise across time. We have the “Union of the State”.
With this data we have the ability to rewind the Union of the State to any point in time. What are the potential use cases for the Union of the State?
Enterprise Time Machine
Suppose something happened a few weeks ago. Decisions were made. Things changed. But exactly what, when, and why? With an Enterprise Time Machine you can rewind the complete state of every major application to any point in time and then step forward event by event, click by click, change by change, at the millisecond level if things happened that quickly. For an e-commerce vendor this means being able to know, for any specified millisecond in the past, how many shopping carts were open, what was in them, which transactions were pending, which items were being boxed or in transit, what was being returned, who was working, how many customer support calls were queued, and how many were in progress. In different domains such as financial services or healthcare, the applications and attributes are different but the ability is the same.
In order to reconstruct the state at any point in time we need to load the initial snapshot into a repository and then update the attributes of each object as we process the logs, event by event, until we get to the point in time that we are interested in. A NoSQL store such as MongoDB, HBase, or Cassandra should work well as the repository. This process could be optimized by adding regular snapshots of the whole state into the Data Lake so that we don’t have to process from the very beginning every time. For a detailed analysis you could rebuild the state to a particular point in time and then process forwards in increments of any size. This way the situation of a device failure that led to a catastrophic cascade of events can be re-created and examined millisecond by millisecond.
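A sketch of that snapshot-plus-replay reconstruction, with plain in-memory structures standing in for the Data Lake and the NoSQL repository (all names and shapes here are illustrative):

```python
from datetime import datetime

def state_at(snapshots, events, t):
    """Rebuild the full state at time t: start from the latest snapshot
    taken at or before t, then replay the later events up to t."""
    # snapshots: list of (timestamp, {entity: {field: value}})
    base_ts, base = max(
        (s for s in snapshots if s[0] <= t), key=lambda s: s[0]
    )
    state = {entity: dict(fields) for entity, fields in base.items()}
    # events: (timestamp, entity, field, new_value), replayed in time order
    for ts, entity, field, value in sorted(events):
        if base_ts < ts <= t:
            state.setdefault(entity, {})[field] = value
    return state

snapshots = [(datetime(2014, 1, 1), {"acct-1": {"balance": 100.0}})]
events = [
    (datetime(2014, 1, 10), "acct-1", "balance", 80.0),
    (datetime(2014, 2, 1), "acct-2", "balance", 50.0),
]
print(state_at(snapshots, events, datetime(2014, 1, 15)))
# {'acct-1': {'balance': 80.0}}
```

The more frequent the snapshots, the shorter the replay; that trade-off between storage and reconstruction latency is the knob to tune.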
Since we can re-create the state at any point in time, we can do trending and historical analysis of any and every attribute over any time period, at any time granularity we want.
When user interface events are logged as well as the attribute changes, you have the ability to know not only who changed what information, but also who looked at it. Who was aware of the situation? Why did Bob open a particular record every few hours and cancel out without making changes? This requires the Enterprise Time Machine described above.
One of the main tasks in a predictive exercise is to work out which attributes are predictive of your target variable and which ones are not. This can be impossible to do when you only have 10% of your attributes logged. Maybe the minor attributes are the predictive ones. Now you have all of them. This requires the trending facility described above.
Doug Moran, a co-founder of Pentaho and product manager for its Big Data products, sees many predictive applications for this kind of data. This includes the ability to derive a model from replays of previous events and use it to prescribe ways to influence the current situation to increase the likelihood of a desired outcome. For example, this could include replaying all previous shopping cart events for a user currently on an e-commerce site to derive a predictive model that prescribes a way to influence their current purchase in a positive way.
“Dixon’s Union of the State idea gives the Data Lake idea a positive mission besides storing more data for less money,”
said Dan Woods, an IT Consultant to buyers and vendors and CEO of Evolved Media, who has written about the Data Lake for several years.
“Providing the equivalent of a rewind, pause, forward remote control on the state of your business makes it affordable to answer many questions that are currently too expensive to tackle. Remember, you don’t have to implement this vision for all data for it to provide a new platform to answer difficult questions with minimal effort.”
How could this be done?
  • Let the application store its current state in a relational or NoSQL repository. Don’t affect the operation of the operational system.
  • Log all events and state changes that occur within the application. This is the tricky part unless it is an in-house application. It would be best if these events and state changes were logged in real time, but this is sometimes not feasible. Maybe Salesforce or SugarCRM will offer this level of logging as a feature. Dump this data into a Data Lake using a suitable storage and processing technology such as Hadoop.
  • Provide the ability to rewind the state of any and all attributes by parallel processing of the logs (see the sketch after this list).
  • Provide the facilities listed above using technologies appropriate to each use case (using the rewind capability).
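To illustrate the parallel rewind in the third bullet: if the state logs are partitioned by entity, every partition can be replayed independently. Python’s multiprocessing stands in here for whatever cluster framework (MapReduce, Spark, and so on) would actually do this at Data Lake scale, and the data is illustrative:

```python
from multiprocessing import Pool

def replay_partition(partition):
    """Replay one entity's events in time order and return its final state.
    Partitions share nothing, so they can be processed in parallel."""
    entity, events = partition
    state = {}
    for _ts, field, value in sorted(events):
        state[field] = value
    return entity, state

if __name__ == "__main__":
    # entity -> [(timestamp, field, new_value), ...]
    partitions = {
        "acct-1": [(1, "balance", 100.0), (2, "balance", 80.0)],
        "acct-2": [(1, "balance", 250.0)],
    }
    with Pool() as pool:
        final_state = dict(pool.map(replay_partition, partitions.items()))
    print(final_state)
    # {'acct-1': {'balance': 80.0}, 'acct-2': {'balance': 250.0}}
```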

The plumbing and architecture for this is not simple, and Dan Woods points out that there are databases, like Datomic, that provide capabilities for storing and querying state over time. But a solution based on a Data Lake has the same price, scalability, and architectural attributes as other big data systems.

Written by James

January 22, 2015 at 4:43 am

AWS and Your Crown Jewels


These are my thoughts on a recent Dan Woods (Forbes) post titled “Will Companies Ever Move Their Crown Jewels to Amazon Web Services?”.

My short answer is yes, because otherwise Jeff Bezos (the founder of Amazon) has failed. As of today, Bezos is worth $28.8bn and is #16 on Forbes’ list of powerful people. I’m guessing he’s the kind of guy who doesn’t like to fail.

Jeff Bezos explains his vision in this 10-year-old TED talk: https://www.ted.com/talks/jeff_bezos_on_the_next_web_innovation

He spends the first 7 minutes comparing the Internet bubble to the California gold rush and then moves on to an analogy comparing the internet today with the electricity industry 100 years ago.

I admire the time and energy he spends on his analogies. He looked into different ones and compared them to find the best one. Good analogies are hard to find: the best ones sound obvious once you hear them, but coming up with them is hard. The Beekeeper analogy for open source software took me months of iterations, based on years of experience, to come up with. It sounds fairly obvious when you hear it, but there was no analogy before it to help people understand the model.

If an analogy is good enough, it will allow you to infer additional knowledge. If you follow Bezos’ electricity analogy and look at the history of electricity adoption, you can draw inferences about the adoption of cloud computing (with some generalizations):

  • Before the introduction of electricity supply as a commodity service, any large company needing electricity had its own electricity generators.
  • Before the introduction of cloud-based computing with utility pricing, any large company needing computing had its own data center.
  • Who were the first people to join the electricity grid? Small companies and residences without prior electrical supply.
  • Who were the first people to use cloud computing? Small companies and individuals without data centers.
  • Who were the last people to join the electricity grid? Large companies with their own power sources.
  • Who will be the last people to migrate to cloud computing? Large companies with their own data centers.

Looking at the press, you can see that the majority of the anti-cloud talk comes from larger enterprises.

However, most companies that have started in the last five years are evolving cloud-based infrastructures. As a start-up you typically have desktop-based applications for accounting, HR, CRM, etc. As you grow, it makes sense to move to hosted solutions like NetSuite, Salesforce, and SugarCRM. As you add more and more hosted solutions, the cost and headache of installing and maintaining on-premise solutions look less and less attractive.

So today’s generation of small companies, which will become the large companies of the future, have four classes of applications:

  • Domain-specific desktop applications
  • Generic applications with small scale usage (e.g. project planning)
  • Generic applications that will grow and become cloud based (payroll, CRM, or accounting in a small company)
  • Cloud-based applications

If companies, as Dan suggests, are using services other than Amazon for critical applications, then Amazon is failing in its mission due to operational issues. Jeff Bezos is not likely to let that continue for long.

Written by James

November 12, 2014 at 6:00 pm

Posted in Uncategorized

Extending Pentaho Analyzer and CDF


Here is a sneak peek at some of the things I’ll be showing at Pentaho World sessions later this week (11am Oct 10th).

[Images: floorplan, GPS trails and stationary indicators, heatmap, plus several screenshots]

Written by James

October 6, 2014 at 8:32 pm

Posted in Uncategorized

