The Big Data Blind Alley

My friend Laura Hollis, a Notre Dame law professor, turned me on to a new blog. I was scrolling through it when the author's post on gambling caught my eye. In it, he talked about a system he discovered for betting on the second half of NFL games. Because of changes to NFL rules, his system no longer works. But the salient point of the post was this:

For every nugget of gold I found, probably 30 or so theories turned out to be false leads pointing to fool’s gold. That’s the excruciating toil of data mining, the labor that no one sees. It’s spending half a day or longer than that on something that looks very promising dating back a few seasons, and then when you continue to run the numbers with all the crosschecking, eventually the advantages fizzle out and end up at the same random percentages as coin flipping. That’s why it’s called mining. You have to go deep underground, dig through an incalculable amount [of] worthless rock, and if you’re extraordinarily persistent and then lucky, you might just find a few tiny diamonds amidst the coal. Data mining is an exercise in constant frustration and disappointment, not to be attempted by anyone but the most determined and stubborn.
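The "fizzle out to coin flipping" effect can be made concrete with a small simulation. This is a minimal sketch, not the blogger's actual method: it assumes every game is a fair coin flip and every "theory" is a random pick with no real edge, and the function names `win_rate` and `mine_for_systems` are hypothetical. Testing 30 theories against a short run of past games and keeping the best one mimics the search described above; scoring that winner on games it has never seen shows whether its edge survives.

```python
import random


def win_rate(picks, outcomes):
    """Fraction of games where the pick matched the actual result."""
    hits = sum(1 for p, o in zip(picks, outcomes) if p == o)
    return hits / len(outcomes)


def mine_for_systems(n_theories=30, n_past=100, n_future=2000, seed=1):
    """Simulate data mining over betting 'systems' that have no real edge.

    Hypothetical setup: each game's result is a fair coin flip, and each
    theory picks a side at random, so every theory's true win rate is 50%.
    We keep the theory that looks best on past games, then score it on
    future games it was never tuned on.
    """
    rng = random.Random(seed)
    n_games = n_past + n_future
    outcomes = [rng.random() < 0.5 for _ in range(n_games)]
    theories = [[rng.random() < 0.5 for _ in range(n_games)]
                for _ in range(n_theories)]

    past, future = outcomes[:n_past], outcomes[n_past:]
    # "Mining": score every theory on past games and keep the best one.
    past_rates = [win_rate(t[:n_past], past) for t in theories]
    best = max(range(n_theories), key=lambda i: past_rates[i])
    # "Crosschecking": score only that winner on unseen games.
    future_rate = win_rate(theories[best][n_past:], future)
    return past_rates, best, future_rate


past_rates, best, future_rate = mine_for_systems()
print(f"best system on past games: {past_rates[best]:.1%}")
print(f"same system on new games:  {future_rate:.1%}")
```

Because every theory's true win rate is 50%, any edge the winner shows on past games comes only from picking the luckiest of 30 tries; on the unseen games its rate should land near 50%, which is the fool's gold the quote describes.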

There are a lot of startup companies doing data mining now. Data mining itself is a commodity. What matters is how you transform the data into usable bits of information that reveal insights you couldn’t see before: the context, and sometimes how you dig. Very soon, things like data mining and artificial intelligence are going to be table stakes for any company that wants to succeed.

4 thoughts on “The Big Data Blind Alley”

  1. Thanks for the mention, Mr. Carter. I plan on writing considerably more about the process of “data mining” in my next chapter. This work consumed most of my time last fall, producing mixed results. Will elaborate in more detail ASAP. Best Regards. — Nolan Dalla

    1. Thanks, love your blog. I think a lot of people would like to learn more about the mechanics of data mining. A lot of it comes down to the assumptions you make before you mine. Different assumptions create different outcomes from the same data.
