Mitt Romney with Kid Rock
Maybe you should just plug your iPod into the booth or connect the Diebold machine to Facebook this November. It started as an office joke, but after running the numbers, I can’t escape the data. It turns out music preference is pretty well correlated with political affiliation.
A few highlights:
- Republicans seem to have less diverse music taste than Democrats
- Kenny Chesney fans are most likely to swing right, Rihanna fans left.
- Metal fans could save us all from a two-party system.
Over the past few years, we here at The Echo Nest have put a lot of engineering energy into “Taste Profiles”, our server-side representation of everything a listener does with music. We first released them in 2010, then we linked them to our playlist API with catalog-radio, and recently released a large number of them for research. Anyone can store their musical activity across services in our “cloud” from metadata, audio files or fingerprints, and update stats like play counts, skips, ratings, loves and bans. The Taste Profile ID can then do all sorts of stuff: everything from recommending you new music, to syncing your local collection to a cloud service, to suggesting shows to see, to some of the crazy stuff I’ll walk through below.
Quite a lot of our API developers and customers are already using Taste Profiles to manage the musical identity of their customers, and we use them internally as the proving ground for a lot of our analytics work. For example, we spent the last few months making sure people listen to our automated radio service (used in iHeartRadio, Spotify, VEVO, MOG and many others) as long as possible by predicting how they’ll respond to our suggestions. And with today’s release of new Taste Profile key-values, you can now annotate a Taste Profile with any information you want (such as your location, the device you’re using, your IDs on social networks or anything else) and we’ll use it to give you better results. Part of the push behind arbitrary data in a musical identity is to track how “non-musical behaviors” can make our results better. If you live in Sweden, maybe you don’t want to hear ABBA anymore. Or if you’ve just seen a Wes Anderson film, we might want to send you down a Kinks path.
We’ve been collecting this data (completely anonymized, of course) for a while now and started looking into what correlations exist between music, psychographics, demographics and other media preferences. As the time is upon us in the US to start thinking about who to elect as president, we thought we’d first look to see what political affiliation data we had and if it had any correlation to music. Can we tell if someone is a Republican just from his or her iTunes collection? And if so, which artists are the key “tells” for both sides? What we found was fascinating.
Predicting politics from Taste Profiles
Barack Obama with BB King
Some couching notes before we get started: although we have far more data, we’ll only look at listeners that self-report as either “Democrat-aligned” or “Republican-aligned” and are living in the US, to make it a quicker read. The political alignment was automatically derived from annotations of political figures or parties: we counted prominent political figures such as Bill or Hillary Clinton, Barack Obama, John Kerry and so on as Democrat alignment, just as we would a specific party affiliation. Likewise for Republican: George Bush, Mitt Romney, John McCain, Sarah Palin, and so on. If someone listed both or had conflicting affiliations, we did not include them in this experiment, nor did we consider any other party or country. Finally, throughout this I’ll call the two classes “Democrats” and “Republicans,” even though that’s an incredibly generalized version of the available data.
So let’s get to it: let’s take a bunch of Taste Profiles, see which ones have political affiliation listed and then try to learn the relationship between the musical data in the Taste Profile and the affiliation.
What kind of musical data are we talking about here? An Echo Nest Taste Profile can be as simple as a list of artists the listener likes or can include very detailed information about his or her listening activity. The Echo Nest then of course has tens of thousands of points of data to associate with each artist or song. These data points range from the mundane, like the name of the artist (“Carly Rae Jepsen”), to the complex, like the number of milliseconds in between each downbeat (4508), the predicted key (E major), or the probabilities of words people use to describe the artist on the internet (“angular,” “stupid,” “witch house”). We use all of this data to recommend you music on MTV.com or play you a great station on iHeartRadio, and here we’re going to use it to see if you like big government.
Cultural vectors used in Echo Nest musical analysis, here for “ABBA”
For every person’s Taste Profile, we have many thousands of terms that describe the kind of music the person is into. For this experiment, half of the Taste Profiles’ worth of musical term data is thrown into The Echo Nest’s statistical machine learning classifiers (we have our own custom stuff that mostly acts as a very large-scale multi-class support vector machine) and associated with the “ground truth” of affiliation to try to learn a model of each class. In layman’s terms, this basically means we show the system a bunch of examples of Democrat Taste Profiles and a bunch of Republican ones and see if it can predict the class on a new, unknown Taste Profile. Our machine learning tech is good at handling messy data like this: we’ve added a lot of math and magic on top to deal with our specific kind of musical data.
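To make the half-and-half setup concrete, here is a minimal Python sketch. It is not our production classifier: a tiny perceptron stands in for the SVM-like machinery, and the function names (`split_half`, `train_perceptron`, `predict`), the toy profiles and the +1/-1 label coding are all assumptions made up for illustration.

```python
import random


def split_half(profiles, seed=0):
    """Shuffle labelled profiles and split them into train/test halves."""
    shuffled = profiles[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def train_perceptron(train, epochs=20):
    """Learn one weight per musical term.

    Each training example is (set_of_terms, label), where the label is
    +1 for Republican and -1 for Democrat (a hypothetical coding).
    """
    weights = {}
    for _ in range(epochs):
        for terms, label in train:
            score = sum(weights.get(t, 0.0) for t in terms)
            if score * label <= 0:  # wrong or undecided: nudge the weights
                for t in terms:
                    weights[t] = weights.get(t, 0.0) + label
    return weights


def predict(weights, terms):
    """Return +1 if the summed term weights are positive, otherwise -1."""
    return 1 if sum(weights.get(t, 0.0) for t in terms) > 0 else -1


# Toy, invented Taste Profiles: (terms, self-reported alignment).
profiles = [
    ({"country", "twang"}, 1),
    ({"country", "ballad"}, 1),
    ({"country", "honky tonk"}, 1),
    ({"hip hop", "pop"}, -1),
    ({"pop", "dance"}, -1),
    ({"hip hop", "r&b"}, -1),
]

train, test = split_half(profiles)
weights = train_perceptron(train)
for terms, label in test:
    print(sorted(terms), "actual:", label, "predicted:", predict(weights, terms))
```

The held-out half is never seen during training, which is what lets the predictions on it say something about how the model would do on a genuinely new Taste Profile.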
After we’ve learned the model, we can test it by giving it the other half of the data, asking our classifiers to identify each previously unknown Taste Profile as either Democrat or Republican, to see how well it does at prediction. We use a few measures to evaluate the experiment:
- raw accuracy (out of all the test examples, how many did we get right)
- precision (out of the ones we predicted in the class, how many were in that class)
- recall (out of all the examples of that class, how many did the classifier find) and
- F1, a blended measure of precision and recall that people in this field like to use as a general performance metric.
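For readers who want the four measures spelled out, here is a small Python sketch. F1 is computed as the harmonic mean of precision and recall, which is the standard definition; the function name `evaluate` and the toy labels are illustrative assumptions, not our actual test data.

```python
def evaluate(y_true, y_pred, positive):
    """Compute accuracy, precision, recall and F1 for one class."""
    tp = fp = fn = correct = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            correct += 1
        if guess == positive and truth == positive:
            tp += 1  # predicted in the class, and really in it
        elif guess == positive and truth != positive:
            fp += 1  # predicted in the class, but not in it
        elif guess != positive and truth == positive:
            fn += 1  # in the class, but the classifier missed it

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented labels: "R" for Republican, "D" for Democrat.
y_true = ["R", "R", "R", "D", "D", "D"]
y_pred = ["R", "R", "D", "R", "D", "D"]
print(evaluate(y_true, y_pred, positive="R"))
```

Running it once per class (once with `positive="R"`, once with `positive="D"`) gives a separate precision, recall and F1 for each side, which is why the two classes can end up with very different scores.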
Looks like we’ve got something here! We hit an F1 of over 0.8 for Republican prediction and just under 0.4 for Democrat prediction. These are both good numbers, in line with or above those of many other prediction algorithms. But why does our Democrat prediction score less than half as well as our Republican prediction? Shouldn’t they be the same?
Political affiliation is not binary, and it’s not like we can assume that just because someone didn’t explicitly list an affiliation with the Democratic party that they are voting for Romney. And it turns out the correlation between musical preference and Democrat affiliation is slightly harder to tease out. When I was going over the data I had a theory: Republicans might listen to fewer kinds of music. If most of the class you are trying to predict stays within a narrow range of music types, they’re easier to spot. Conversely, if the class you’re predicting is all over the musical map, it becomes harder to make accurate predictions. And the data shows it. If we add up the occurrence of each musical term associated with each person (for example: Joe listens to “rock” at “110 bpm” that sounds like “Aerosmith” or “the 70s” and is voting for Romney, we mark a +1 under each of those terms for the Republican bucket) and then plot the histogram counts in descending order, we see a clear difference in both the magnitude and distribution of musical types for the two political affiliations.
Histogram counts of the top occurring musical terms for each class.
Not only are there fewer musical types overall, but the clear, almost right-angle “elbow” of the Republican histogram distribution shows that after a small set of top-ranking terms, their listenership tends to have far less musical diversity. The Democrat curve is smoother, indicating that those people listen to more types of music. Overall, for every 10 unique musical types Democrats listen to, Republicans listen to just 7.
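The term counting behind that histogram can be sketched in a few lines of Python. This is only an illustration: the function name `term_histogram`, the toy listeners, and the choice to normalize by dividing each count by the bucket total are assumptions, not a description of our exact pipeline.

```python
from collections import Counter


def term_histogram(profiles):
    """Mark +1 per listener for each term, then rank terms by share.

    `profiles` is a list of term lists, one per listener in a bucket.
    Returns (term, share_of_bucket_total) pairs in descending order.
    """
    counts = Counter()
    for terms in profiles:
        # dict.fromkeys drops repeats so a listener counts once per term
        for term in dict.fromkeys(terms):
            counts[term] += 1
    total = sum(counts.values())
    return [(term, n / total) for term, n in counts.most_common()]


# Invented buckets of listeners and their musical terms.
republican = [
    ["rock", "110 bpm", "Aerosmith", "the 70s"],
    ["rock", "country"],
    ["country"],
]
democrat = [
    ["pop", "hip hop", "dance"],
    ["rock", "reggae", "r&b"],
    ["pop", "jazz", "the 70s"],
]

rep_hist = term_histogram(republican)
dem_hist = term_histogram(democrat)
print("Republican terms:", rep_hist)
print("Democrat terms:", dem_hist)
print("Unique terms, Republican vs. Democrat:", len(rep_hist), "vs.", len(dem_hist))
```

Plotting the shares in the returned order gives the kind of descending curve shown above, and comparing the number of unique terms per bucket is one simple way to read off a diversity gap.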
As silly as it might seem to predict political affiliation with music, this is a great example of why we do this kind of analysis. Now that we’ve found room for improvement in predicting a class by looking at diversity as a feature, we can apply this in our work to make our playlists and recommendations better. We’re currently adding a “diversity index” to our Taste Profile back end to model this exact effect.
Who are Kenny Chesney fans voting for?
Another fun thing to look at with these political classifiers is the set of inputs that the machine thought were the most predictive of political affiliation. For each class, you can do some quick math on the model by inspecting the margin. This tells us which musical terms and properties best separate each political class:
In this diagram of the support vector machine in action, the circles on the dotted lines are the support vectors where training examples are used to define the classification boundaries.
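As a rough illustration of that “quick math,” assume the model is linear, so each musical term gets a single weight: strongly positive weights push toward one class and strongly negative weights toward the other. The weights below are invented, and `most_predictive_terms` is a hypothetical helper, not part of our API.

```python
def most_predictive_terms(weights, k=3):
    """Return the top-k terms pulling toward each side of a linear model.

    `weights` maps term -> weight, where positive means Republican and
    negative means Democrat (an assumed sign convention).
    """
    ranked = sorted(weights.items(), key=lambda item: item[1])
    democrat = [term for term, w in ranked[:k] if w < 0]
    republican = [term for term, w in reversed(ranked[-k:]) if w > 0]
    return republican, democrat


# Invented weights for a handful of musical terms.
weights = {
    "country": 2.5,
    "twang": 1.0,
    "rock": 0.1,
    "dance pop": -1.5,
    "hip hop": -2.0,
}

republican_terms, democrat_terms = most_predictive_terms(weights, k=2)
print("Most Republican-leaning terms:", republican_terms)
print("Most Democrat-leaning terms:", democrat_terms)
```

Terms whose weights sit close to zero, like “rock” here, do little to separate the two classes.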
Since a list of probabilities of music vectors isn’t exactly good blog material, I found the closest matching artists to each set of terms to show here. We get a great list from doing this:
Artists whose fans are most correlated to Republican
- 1. Kenny Chesney
- 2. George Strait
- 3. Reba McEntire
- 4. Tim McGraw
- 5. Jason Aldean
- 6. Blake Shelton
- 7. Shania Twain
- 8. Kelly Clarkson
- 9. Pink Floyd
- 10. Elvis Presley
Artists whose fans are most correlated to Democrat
- 1. Rihanna
- 2. Jay-Z
- 3. Madonna
- 4. Lady Gaga
- 5. Katy Perry
- 6. Snoop Dogg
- 7. Chris Brown
- 8. Usher
- 9. Eminem
- 10. Bob Marley
The highly predictive Democrat list reminds me of another experiment I ran: taking the Billboard Top 10 and feeding it through the predictor. As of this writing the top artists are Carly Rae Jepsen, Maroon 5, Gotye, Katy Perry, Rihanna, Ellie Goulding, fun., Nicki Minaj, David Guetta and Usher, almost all artists that skew very strongly Democratic. If only people who buy singles in music stores vote this November, it will be a complete Obama landslide.
Classifier confusion prediction
Lastly, I thought the “highest confusion” artist list was interesting. These artists are not good predictors of either Republican or Democrat, so if you like them, you’re relatively safe from this Minority Report shit.
Artists whose fans are hardest to predict for either Democrat or Republican
- 1. The Beatles
- 2. Marilyn Manson
- 3. The Rolling Stones
- 4. Johnny Cash
- 5. Pantera
- 6. Alice in Chains
- 7. Paradise Lost
- 8. Moonspell
- 9. Fleetwood Mac
- 10. Tiamat
I found it neat that these non-predictive artists were mainly metal. Perhaps it’s the genre that can finally bring this divided country together or break the lock on the two-party system.
OK, now what?
Obviously, The Echo Nest is not going to quit our day jobs to become pollsters. But it’s fascinating how much information about you is sitting inside your musical tastes, and we also appreciate how we can use these experiments to tune our models to make your listening experience better on everything we power (Clear Channel’s iHeartRadio, eMusic, MOG, Spotify, Nokia, the BBC, VEVO, and many more.) We have a lot more ideas around this angle. Stay tuned to this blog for more experiments, and please let me know if you’ve got any great ideas.
Huge thanks to proofreaders / reviewers Dan Ellis, Eliot van Buskirk, Jim Lucchese
All data used in this experiment (and all Echo Nest Taste Profile data used in analytics) was carefully anonymized before we received it. ↩
Please get in touch if you’ve got a great idea for another experiment on data like this that I don’t cover! ↩
Perhaps fit for another blog post, but if you’re reading down here: it turns out that the musical data the Echo Nest has is crucial for the experiment to work. Just training a model against artist or song names alone did not work nearly as well. F1 scores dropped by an average of 20–40% depending on the task. ↩
In practice we use a somewhat “reduced” form of the SVM known as RLSC so that we can easily scale across many machines, but the intent is the same. You can read about RLSC on music data in an old paper of mine if you’re interested. (PDF) ↩
We trained two binary classifiers: Rep vs. Dem and Dem vs. Rep. The training data was a large random sample drawn from the universe of data, which was 66% Dem and 34% Rep. ↩
To make this histogram chart, we normalized both bucket counts first by the universe of terms to ensure this bias in the priors would not affect the distribution. ↩
One day, Echo Nest will start delineating between “good” Pink Floyd (Syd) and the other kind ↩