I listened to this podcast on my drive to Michigan last Friday. It’s really, really great. If you like to think about the future, and where the puck is going, put your headphones on and listen critically.
The episode shows how machine learning can produce biased conclusions just as easily as unbiased ones. It’s always been known that if you start with the wrong hypothesis when using statistical analysis, you will reach a bad conclusion. There is a humorous blog, Spurious Correlations, that turns statistics on its head by charting unrelated data series that happen to correlate.
The real danger, to me, is that because we have powerful machine learning and can harness it to run hundreds and hundreds of calculations over the same data, researchers will feel more confident in their predictive powers. Run enough tests on one data set, though, and some will look significant by chance alone. If researchers don’t take real care to set up their experiments and samples in an unbiased manner, those predictions will be wrong. Really terrible policy, or decisions that harm innocent people, could be the result.
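To make the "hundreds of calculations over the same data" worry concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the sample size, and the data, which are pure random noise. It uses the rough large-sample rule that a correlation bigger than about 1.96 divided by the square root of the sample size passes a 5% significance test.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(n_samples=100, n_variables=500, seed=0):
    """Count noise variables that 'significantly' correlate with a noise outcome.

    Hypothetical setup: the outcome and every candidate variable are
    independent random draws, so any 'significant' result is a false alarm.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    # Approximate two-sided 5% cutoff for |r| (large-sample rule of thumb).
    threshold = 1.96 / math.sqrt(n_samples)
    hits = 0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson_r(candidate, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 500 pure-noise variables look 'significant'")
```

With a 5% cutoff you would expect somewhere around 25 of the 500 noise variables to clear the bar. None of them mean anything, which is the trap when a machine runs the tests for you.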
It seems machine learning is Janus-like. There are two sides to the coin. It can be really great and make human existence better. Or it could be used in nefarious ways to influence all kinds of outcomes. There is an old saying, “lies, damned lies, and statistics,” and it fits the way some people try to use statistics.
Additionally, if you have had exposure to statistics via Stats 101 or 102, this is an excellent primer to get you thinking about it again. It will refresh your memory. When I think about statistics, I always get a bit of a tremor in my gut. It’s because nothing is really certain. It’s all extrapolation. P-values, random samples, how you sample, and all of that can throw wrenches into analysis. So can concentrating on the wrong variables, or even the right ones. It’s all very tricky.
When I see polls today, I hardly believe them. I remember Nate Silver of 538 tweeting that when a pollster uses “likely voters”, there is no standardization from poll to poll of what that even means. This means that all the results are probably biased in some way.
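Here is a small Python sketch of why the “likely voter” definition matters. The respondents are completely made up, and both screens are hypothetical. I am not claiming any real pollster uses these exact rules. The point is only that the same raw sample can give different answers depending on who gets counted.

```python
# Each made-up respondent: (preferred party, self-rated likelihood of
# voting on a 0-10 scale, whether they voted in the last election).
RESPONDENTS = [
    ("D", 9, True),
    ("D", 10, False),
    ("D", 8, False),
    ("D", 6, True),
    ("R", 9, True),
    ("R", 7, True),
    ("R", 5, True),
    ("R", 10, True),
    ("D", 4, False),
    ("R", 3, False),
]


def margin(respondents, is_likely):
    """Return the D-minus-R share among respondents passing the screen."""
    likely = [r for r in respondents if is_likely(r)]
    d_count = sum(1 for r in likely if r[0] == "D")
    r_count = sum(1 for r in likely if r[0] == "R")
    return (d_count - r_count) / len(likely)


def says_likely(respondent):
    """Hypothetical screen A: rates own likelihood of voting at 8 or higher."""
    return respondent[1] >= 8


def voted_before(respondent):
    """Hypothetical screen B: voted in the last election."""
    return respondent[2]


if __name__ == "__main__":
    print(f"Screen A margin (D minus R): {margin(RESPONDENTS, says_likely):+.2f}")
    print(f"Screen B margin (D minus R): {margin(RESPONDENTS, voted_before):+.2f}")
```

In this toy sample, one screen puts D ahead and the other puts R ahead, with nobody changing their answer.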
Our Presidential polls are all conditional probability at this point. It’s awfully hard to tease out what is really going on, and predict how people are going to act. If pollsters were honest, they’d tell you they have no idea. The Brexit vote ought to be Exhibit A for that. To be clear, I don’t think the Brexit vote has any correlation to the Presidential race. However, the underlying anti-establishment sentiment in both is similar.
For the Presidential race, what I’d like to see is a current state-by-state poll taken with generic D vs generic R, with the most recent polls overlaid on it. Anyone I know who has come out strong for either candidate wasn’t voting for the other side anyway. We like to think we are unbiased, but we aren’t.