Are your movie and music preferences related?

Heart of Glass

I’m a music person: I’m a musician, I pack up all my life experiences through the lens of records and bands, and I’ve spent 15 years of my life building the world’s best automated music recommender. I think there’s something terribly personal about music that other forms of “media” (books, movies, television, articles and – recent entry alert – applications) can’t touch. A truly great song only takes a minute and forty-four seconds to experience, and then you can hit the repeat button. I could hear “Outdoor Miner” 31.7 times on my walk to work every morning if I wanted to. But I can’t watch one of my favorite movies, Werner Herzog’s “Heart of Glass,” even once on my walk to work, and to be honest, more than once a year is a bit much. I’d have to be staring at my phone or through some scary glasses. And it’s a distracting world, far too much to fit into the diorama of the brain: dozens of actors, scenes, sounds, props and story. I don’t know if I attach memories or causal emotion to movies: they try to explicitly tell me how to feel, not suggest it obliquely or provide a soundtrack to a reality. And worst of all, it’s a mood killer to give a fledgling romantic partner a mix “DVD-box-set.”

But certainly, my preference in film (or that I even call them films – like some grad student) has to tell me something about myself, or even my other tastes. If we knew someone’s movie preference, could we build a better music playlist for them? Or can we help you choose a movie by knowing more about your music taste? I recently poked out of my own bubble of music recommendation and music retrieval to see if there were any correlations we could make use of.

Recommending in general


The way the Echo Nest does music recommendation is actually quite novel and deserves a quick refresher: we don’t do what most other companies or technologies do. Amazon, Last.fm, iTunes Genius and many others use statistics of your activity to determine what you like: if you listen to Can, and so does a stranger, and that stranger also loves Cluster (and the system presumes you don’t know about them), you might get recommended Cluster. But that approach doesn’t know anything about music, and it constantly fails in its own naïve way:

Colin Powell recommendation from Britney Spears

Instead of relying on that brittle world of correlated activity, we’ve first built a music understanding system that gets what music is: what people say about it and what it sounds like, and that platform also happens to recommend things and give you playlists. We use all of that data to power top-rate discovery for tons of services you use every day: Rdio, Sirius XM, Nokia’s MixRadio, iHeartRadio, MTV, the Infinite Jukebox. We don’t just know that you like a song; we know what the key of that song is, how many times people called it “sexy” in the past week on blogs, and what instruments are in it. We also know, through the anonymized Taste Profile, how often you (and the world) listened, at what time of day, what songs you like to listen to before and after, and how diverse your taste is.

The reason this is useful is we don’t want to just build a thing that knows that “people that like The Shins also like Garden State,” we want to go deeper. We want our models to understand the underlying music, not just the existence of it. We also want to show correlations between styles and other musical descriptors and types of films, not just artists. Facebook could (and it probably tries to) build a music “recommender” by just checking out the commonalities of what people like, but we want to look deeply at the problem, not the surface area of it.

Experimental setup

The Echo Nest is currently pulling in hundreds of musical activity data points a second, through our partners and our large scale crawls of the web and social media. A recent push on our underlying Taste Profile infrastructure nets us new data on the listeners themselves – specifically, with anonymously collected and stored demographic and non-music media preferences. Through all of this we know the favorite artists and movies for a large set of Taste Profiles (if you’re a developer, you can store non-musical data using our Taste Profile Key-Value API and manipulate and predict new features using our alpha Taste Profile predict API.) For the purposes of this experiment, we limited our world to 50,000 randomly chosen Taste Profiles that had movie and music preference data.

Musical attributes for ABBA

Each artist was modeled using Echo Nest cultural attributes: a sparse vector of up to 100,000 “terms” that describe the music in the Taste Profile, weighted by their occurrence. If someone constantly listens to the new James Holden record, and I mean, over and over again, kind of annoyingly, we weight terms like “bedroom techno” and “melodic” – along with the acoustically derived terms, its energy, danceability and so on – higher than those of songs they’ve just heard once or twice. The output vector is a human-targeted cultural description of their favorite music, with helpful floating point probabilities P(X|L) for each term, denoting “how likely would it be for this listener to describe their taste as ‘X’?”
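To make that concrete, here’s a toy version of the weighting in Python – the artists, terms and numbers are all invented for illustration, and the real pipeline obviously does far more:

    from collections import defaultdict

    # Invented per-artist cultural terms (term -> weight); stand-ins for what the
    # real system derives from text crawls and acoustic analysis.
    ARTIST_TERMS = {
        "James Holden": {"bedroom techno": 0.9, "melodic": 0.7, "electronic": 0.6},
        "Wire": {"post-punk": 0.9, "angular": 0.5, "energetic": 0.4},
    }

    def listener_vector(play_counts):
        """Aggregate artist terms weighted by play count, then normalize so each
        weight reads roughly as P(term | listener)."""
        raw = defaultdict(float)
        for artist, plays in play_counts.items():
            for term, weight in ARTIST_TERMS.get(artist, {}).items():
                raw[term] += plays * weight
        total = sum(raw.values()) or 1.0
        return {term: value / total for term, value in raw.items()}

    # Someone who plays Holden over and over, and Wire occasionally.
    print(listener_vector({"James Holden": 40, "Wire": 3}))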

The movie data was a bit harder, noting for the record that we are a music data provider run by some musicians who happened to be good with computers. I deployed a small crack team (the CTO and his imaginary friends) to build a mini “Echo Nest for movies,” cataloging (for now) 5,000 of the most popular films along with their descriptors culled from descriptions and reviews in a similar way as we’ve done for music. I determined their genres, lead actors, key attributes and cultural vectors to train models against.

Top movie attributes for “The Godfather”

Predictions

By training thousands of correlative models between the sparse music vectors and the various ground-truth movie attributes (which were in reality far less diverse and dense), we can quickly surface high-affinity pairings between types of music and types of movies.

KL divergence doing its thing, from Wikipedia

I used a multi-class form of the support vector machine – regularized least-squares classification (RLSC), which you can read about in an old paper of mine – to train the thousands of models. RLSC is fine with sparse vectors and an unbounded number of output classes, and we also ended up with a linear kernel, which made the training step very light – likely due to the low rank of the movie features.
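For the curious, a minimal sketch of what one-vs-all RLSC with a linear kernel looks like – random stand-in data, arbitrary regularization, nothing like the production feature pipeline:

    import numpy as np

    rng = np.random.default_rng(0)
    n_listeners, n_terms, n_movies = 500, 200, 10

    X = rng.random((n_listeners, n_terms))           # cultural term weights per listener
    Y = (rng.random((n_listeners, n_movies)) > 0.9)  # one "likes this movie" column per class
    Y = Y.astype(float) * 2 - 1                      # RLSC targets in {-1, +1}

    lam = 1.0
    # Linear-kernel RLSC: a single regularized least-squares solve shared by all
    # one-vs-all targets, W = (X'X + lam*I)^-1 X'Y.
    W = np.linalg.solve(X.T @ X + lam * np.eye(n_terms), X.T @ Y)

    scores = X[:5] @ W             # rows: listeners, columns: movie classes
    print(scores.argmax(axis=1))   # highest-affinity movie class per listener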

I evaluated the models in two ways: the first I’ll call a “discriminant classifier” – it lists the most useful sources of information (by KL divergence) for a given music source – and the second a “ranked classifier” – given popularity features, which results would give the classifier the least surprise. There are good reasons for the two methods: the former is more statistically correct, but ignores that most people have never heard of most things, while the latter gives us safe bets that carry less explicit information.[1] As we see every day with music, a computer’s idea of “information” rarely has much to do with things like the success of “Fast & Furious 6.”
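Roughly, the two orderings differ only in how the raw affinities get reweighted; here’s a toy version, with made-up numbers and a crude stand-in for the information measure:

    import math

    # Toy affinities, popularity counts and background like-rates for three movies.
    affinity   = {"Toy Story": 0.30, "Paid in Full": 0.28, "Juice": 0.27}
    doc_freq   = {"Toy Story": 50000, "Paid in Full": 400, "Juice": 350}
    background = {"Toy Story": 0.60, "Paid in Full": 0.01, "Juice": 0.01}

    # "Ranked": fold in smoothed popularity, which favors the safe bets.
    ranked = sorted(affinity, key=lambda m: affinity[m] * math.log1p(doc_freq[m]), reverse=True)

    # "Discriminant": favor movies whose affinity most exceeds their background rate,
    # a crude stand-in for the KL-divergence-style information measure.
    discriminant = sorted(affinity, key=lambda m: affinity[m] * math.log(affinity[m] / background[m]), reverse=True)

    print(ranked)        # Toy Story first
    print(discriminant)  # Paid in Full and Juice first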

For example, I am able to ask it both “If an average person likes Jay-Z, what are their favorite movies?” (ranked) and “Which movie can I assume predicts a liking of Jay-Z?” (discriminant). They are:

Ranked                   Discriminant
Toy Story                Get Rich or Die Tryin’
Step Brothers            Paid in Full
Buddy The Elf            Scary Movie 4
Harry Potter (series)    Shottas
Jackass                  Juice
Superbad                 New Jack City
Fight Club               Friday After Next

Movie predictions for fans of Jay-Z

You can see the difference: the left side is the safe bets (everyone likes Toy Story! everyone likes Jay-Z!) and the right side is the less known but more useful results. So you don’t think I’m pulling a Shawshankr[2] on you, here’s the list for a different artist:

Ranked                 Discriminant
Dirty Dancing          Pure Country
Toy Story              8 Seconds
The Blind Side         Country Strong
Twilight (series)      Valentine’s Day
The Notebook           Sweet Home Alabama
Finding Nemo           Letters to Juliet
Dear John              The Vow

Movie predictions for fans of Tim McGraw

We can also bulk this up by features of the movie; here are the top musical artists correlated with movies that have a heavy crime element:

Ranked                 Discriminant
Jimi Hendrix           Ghostpoet
The Beatles            Amazing Blondel
The Rolling Stones     Ian Anderson
Jay-Z                  Doseone
The Who                Young Gunz
Bob Dylan              Mandrill
Pink Floyd             Pato Banton

Artist predictions for fans of crime movies

Seeing the Amazing Blondel there just amazes me: we track two and a half million artists and it’s those guys that like crime movies? The data can’t lie.

The Amazing Blondel

We also looked at the artists correlated with movies our term computations considered “pornographic” or “adult” (they know it when they see it):

Ranked                      Discriminant
Linkin Park                 The Receiving End of Sirens
The Beatles                 Haste the Day
The Rolling Stones          The Dillinger Escape Plan
Deftones                    The Mars Volta
Limp Bizkit                 Far * East Movement
Korn                        Rediscover
Rage Against the Machine    Imogen Heap

Artist predictions for fans of adult movies

Fans of “Christian metalcore”-ers Haste the Day and of Imogen Heap, we’re onto you. We don’t judge.

Overall

We did a lot more analysis, more of which you can see over on The Echo Nest’s new Musical Identity site, including breakdowns of different genres of films:

Sci-fi vs. Fantasy

The goal of all of this is to understand more about music and the identity we express through our affinities. We’re getting closer with a lot of these large-scale analyses of different forms of media, and with demographic and psychographic predictions made solely from preference. But it’s also going to help us elsewhere: being able to recommend you that one song or artist with not much information is what we do, and the more we can predict from what we used to think of as orthogonal sources, the better.


  1. For the scientists getting mad: the ranked classifier applies a smoothed weight of terms by their document frequency – the number of times we saw a movie being mentioned. 

  2. The more precise movie recommender with the worst recall 

How music recommendation works — and doesn’t work

When you see an automated music recommendation, do you assume that some stupid computer program is trying to trick you into something? It’s often what it feels like – with what little context you get with a suggestion, on top of the postmodern insanity of a computer understanding how you should feel about music – and of course sometimes you actually are being tricked.

Amazon’s recommendations for Abbey Road

No one is just learning that if they like a Beatles album, they may also like five others. Amazon is not optimizing for the noble work of raising independent artists’ profiles to the public, and they’re definitely not optimizing for a good musical experience. They’re statistically optimizing to make more money, to sell you more things. Luckily this is the fruit fly of music recommendation, the late night infomercial quality of a music discovery experience that also might dry your lettuce if you spin it fast enough. And I doubt Amazon would ever claim otherwise.

The rest of the field has gotten pretty far since then and we’ve now got tons of ways to discover music using actual qualities of the music or social cues of what your friends are listening to. But I still hate seeing examples like the above. I hate thinking there are forces at work in music discovery that don’t have listeners’ best interests at heart and I want to make them better. I want to walk through all the ways music recommenders work or don’t, and concentrate of course on the one I know best – The Echo Nest’s – which you’re probably using even if you don’t already know it. And most importantly, I want to talk about what we can do next.

Before I get into it, a brief history of who I am: I’ve been working on music recommenders and music retrieval since 1999, academically and in industry. In 2005 I started The Echo Nest with my co-founder Tristan. We power most online music services’ discovery using a very interesting series of algorithms that is sort of the Voltron-figure of our two dissertations and the hard work of our 50 employees in Boston, SF, NYC and London. And we’ve been on a bit of a tear – just in the past year alone we’ve announced that we’re powering music discovery features for eMusic, Twitter, EMI, iHeartRadio, Rdio, Spotify, VEVO and Nokia – with some new heavy hitters not yet announced – to add to our existing customer base that includes MTV, MOG, and the BBC. And through our API we have tens of thousands of developers making independent apps like Discovr, KCRW, Muzine, Raditaz, Swarm, SpotON and hundreds more.

We’ve been a quiet company for a while and with all this great news comes a lot of new confusion about what we do and how it compares to other technologies. Journalists like to pin us as the “machine” approach to understanding music next to the “human” of our nearest corollary (not competitor) in the space – Pandora. This is somewhat unfair and belies the complexity of the problem. Yes, we use computer programs to help manage the mountains of music data, but so does everyone, and the way we get and use that data is just as human as anything else out there.

I’ll go into technologies like collaborative filtering, automatic content based recommendation, and manual approaches used by Pandora or All Music Guide (Rovi). I’ll show that no matter what the computational approach ends up being, the source data – how it knows about music – is the most important asset in creating a reliable useful music discovery service.

What is recommendation? What is it good for?

Musicians are competing for an audience among millions of others trying just as hard. And it’s not the listener’s fault if they miss out on something that will change their lives – these days, anyone can gain access to a library of over 15 million songs on demand for free. To a musician turned computer scientist (as I and so many of my colleagues are), this is the ultimate hidden variable problem. If there were something “intelligent” that could predict a song or artist for a person, both sides (musician and listener) would win; music is amazing, there’s a ton of data, and it’s very far from solved.

But anyone in the entire field of music technology has to treat music discovery with respect: it’s not about the revenue of the content owner, it’s not about the technology, it’s not about click through rates, listening hours or conversion. The past few years have shown us over and over that filters and guides are invaluable for music itself to coexist with the new ways of getting at it. We track over 2 million artists now – I estimate there are truly 50 million, most of them currently active. Every single one of them deserves a chance to get their art heard. And while we can laugh when Amazon suggests you put a Norah Jones CD in your cart after you buy a leaf blower, the millions of people that idly put on Pandora at work and get excited about a new band they’ve never heard deserve a careful look. Recommendation technology is powering the new radio and we have a chance to make it valuable for more than just the top 5 percent of musicians.

When people talk about “music recommendation” or “music discovery” they usually mean one of a few things:

  • Artist or song similarity: an anonymous list of similar items to your query. You can see this on almost any music service. Without any context, this is just a suggestion of what other artists or songs are similar to the one you are looking at. Formally, this is not truly a recommendation as there is no user model involved (although since a query took the user to the list, I still call these a recommendation. It’s a recommendation in the sense that a web search result is.)
  • Personalized recommendation: Given a “user model” (your activity on a service – plays, skips, ratings, purchases) a list of songs or artists that the service does not think you know about yet that fits your profile.
  • Playlist generation: Most consumers of music discovery are using some form of playlist generation. This is different from the above two in that they receive a list of items in some order (usually meant to be listened to at the time.) The playlist can be personalized (from a user model) or not, and it can be within catalog (your own music, ala iTunes Genius’s or Google’s Instant Mix playlists) or not (Pandora, Spotify’s or Rdio’s radio, iHeartRadio.) The playlist should vary artists and types of songs as it progresses, and many rely on some form of steering or feedback (thumbs up, skips, etc.)
Where popular services sit in discovery

             Personalized     Anonymous
Playlist     Pandora          Rdio radio
Suggestions  Amazon           All Music Guide

These are three very different ways of doing music discovery, but for every technology and approach I know of, they are simply applications on top of the core data presented in different ways. For example, at the Echo Nest we do quite a bit to make our playlists “radio-like” using our observed statistics, acoustic features and a lot of QA but that sort of work is outside the scope of this article – all three of our similarity API, taste profiles (personalized recommendation) API and playlist API start with the same knowledge base culled from acoustic and text analysis of music.

However, the application means a lot to the listener. People seem to love playlists and radio-style experiences, even if the data driving both that and the boring list of songs to check out are the same. One of the great things about working at the Echo Nest is seeing the amazing user interfaces and experiences people put on top of our data. Listeners want to hear music, and they want to trust the service and have fun doing it. And conversely, a Pandora completely powered by Echo Nest data would feel the same to users but would have far better scale and results and thus add to the experience. Because of this very welcome sharding of discovery applications, it’s less helpful to talk about these applications directly and more helpful to talk about “what the services know” about music – how they got to the result that Kreayshawn and Uffie are similar, no matter where it appeared in the radio station or suggestion or what user model led them there. We can leave the application and experience layer to another lengthy blog post.

My (highly educated, but please know I have no direct inside information except for Echo Nest of course) guesses on the data sources are:

How popular services know about music

Service          Source of data
Pandora          Musicologists take surveys
Songza           Editors or music fans make playlists
Last.fm          Activity data, tags on artists and songs, acoustic analysis[1]
All Music Guide  Music editors & writers
Amazon           Purchase & browsing history
iTunes Genius    Purchase data, activity data from iTunes[2]
Echo Nest        Acoustic analysis, text analysis

There are many other discovery platforms, but this list covers the widest swath of approaches. Many services you interact with either use these platforms directly (Last.fm, Echo Nest and AMG all license data or give it away through APIs) or use similar enough approaches that they aren’t worth going into in detail.

From this list we’re left with a few major music knowledge approaches: (1) activity data, (2) critical or editorial review, (3) acoustic analysis, and (4) text analysis.

The former two are self-explanatory: you can learn about music by the activity around it (listens, plays, purchases) – Kreayshawn and Uffie are considered similar if the same people buy their singles or rate them highly – or you can learn about music by critical review, something humans have been pretty good at for some time. Of course, encoding activity (via collaborative filtering or taste mining) and critical review (via surveys or direct entry) into a database is a relatively recent art.
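If you’ve never seen the activity-data approach written down, a bare-bones item-item version is only a few lines (toy data; real systems add normalization, popularity damping and much more):

    from collections import Counter
    from itertools import combinations

    # Toy listening histories: user -> set of artists they play.
    histories = {
        "a": {"Kreayshawn", "Uffie", "M.I.A."},
        "b": {"Kreayshawn", "Uffie"},
        "c": {"Uffie", "M.I.A."},
    }

    # Count how often two artists appear in the same person's history.
    co_counts = Counter()
    for artists in histories.values():
        for pair in combinations(sorted(artists), 2):
            co_counts[pair] += 1

    def similar_to(artist):
        scores = Counter()
        for (x, y), n in co_counts.items():
            if artist == x:
                scores[y] += n
            elif artist == y:
                scores[x] += n
        return scores.most_common()

    print(similar_to("Kreayshawn"))  # Uffie co-occurs most, so she gets "recommended"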

The latter two, acoustic and textual analysis, were developed by the field as a reaction to the failures of the first two. I’ll go into much greater detail on those as it’s how Echo Nest does its magic.

Care and Scale

The dominating principle of the Echo Nest discovery approach from day one has been “care and scale.” When Tristan and I started the company in 2005 we were two guys with fresh PhDs on music analysis and some pretty good technological solutions: Tristan’s in the acoustic analysis realm (a computer taking a signal and making sense of it) and mine in the data mining and language analysis space (understanding what people are saying and doing with music). We surveyed the landscape at the time for discovery and found that almost every one suffered from either a lack of care or a lack of scale, sometimes (and often) both. The entire impetus for doing a startup (not an easy choice for two scientists, and anyone who has met us knows we are not the “startup type”) was that we thought we had something between the two of us that could fix those two problems.

Care & Scale

Scale is easy to explain: you have to know about as much music as possible to make good recommendations. If you don’t know about an up-and-coming artist, you can’t recommend them. If you only analyze or rate or understand the popular stuff, you fail at discovery by default. Manual discovery approaches by their nature do not scale. We track over two million artists and over 30 million songs and there is no way a manually curated database can reach that level of knowledge. Even websites that can be volunteer- or community-edited run against the limits of the community that takes part – we count only a little over 130,000 artist pages on Wikipedia. Pandora recently crossed the 1 million song barrier, and it took them 10 years to get there. Try any hot new artist in Pandora and you’ll get the dreaded:

Pandora not knowing about YUS

This is Pandora showing its lack of scale. They won’t have any information for YUS for some time and may never unless the artist sells well. This is bad news and should make you angry: why would you let a third party act as a filter on top of your very personal experiences with music? Why would you ever use something that “hid” things from you?

Activity data approaches (such as Last.fm, Amazon and iTunes Genius) also suffer from a subtler scale problem that manifests itself in a different way. It’s trivial to load a database of music into an activity-data-based discovery engine (such as collaborative filtering or social tags), and I’ve often gone after such naive approaches to music discovery publicly. If a website or store has a list of user data (user A bought / listened to song Y at time Z), any bright engineer will immediately go into optimization mode. There’s an almost duplicitous ease to recommending music to people poorly. I was recently shopping for a specific type of transistor for a project on a parts and components website and found they, too, had turned on the SQL join that allowed “recommendations” on their site based on activity data:

Pathological filtering

Other than activity not making much sense in a discovery context, these systems suffer by default from a “popularity bias,” where a lot of music simply doesn’t yet have enough collected activity data to be considered a recommendation match. Activity-based systems can only know what people have told them explicitly, and this often makes it hard for less popular artists to be recommended.

Care is a trickier concept and one we’ve tried very hard to define and encode into our engineering and product. I translate it as “is this useful for the musician or listener?” A great litmus test for care in music discovery is to check the similar artists or songs for The Beatles. Is it just the members of the Beatles and their side projects? For almost all services that use musical activity data, it will be:

Top artist similars are all members of the Beatles

Certainly a statistically correct result[3], but not a musically informative one. There is so much that user data can tell you about listening habits, but blindly using it to inform discovery belies a lack of care about the final result. Care is neatly handled by using social, manual or editorial approaches, as humans are pretty good at treating music properly. But when using more statistical or signal processing approaches that know about more music at scale, care has to be factored in somehow. Most purely signal processing approaches (such as Mufin here) fall down as badly on care as activity data approaches do:

Mufin expressing so little care about Stairway to Heaven

Care is a layer of quality assurance, editing and sanity checks, real-world usage and analysis and, well, care, on top of any systematic results. You have to be able to stand by your results and fix them if they aren’t useful to either musicians or listeners. Your WTF count has to be as low as possible. We’ve spent a lot of time embedding care into our process and while we’re always still working, we’re generally pleased with how our results look.

Without both care and scale you’ve got a system that a listener can’t trust and that a musician can’t use to find new fans. You’ve failed both of your intended audiences and you might as well not try at all.

Care & Scale of common approaches

Text Analysis

Echo Nest Cultural vectors

I started doing music analysis work in 1999 at the NEC Research Institute in Princeton, NJ (I had scammed them into an internship and then a full time job by being very persistent.) NEC was then full of the top tier of data mining, text retrieval, machine learning and natural language processing (NLP[4]) scientists; I had the great fortune to work with people like Steve Lawrence, Gary Flake and David Waltz, and Vladimir Vapnik even moved into my tiny office after I left for MIT.

I was there while figuring out what to do with myself after abruptly quitting my PhD program in NLP at Columbia. I was a musician at the time, playing a lot of shows at various warehouse spaces or the late lamented “Brownies,” places where 20 people might show up and 10 would know who you were. There was a lot of excitement about “the future of music” – far more than there is today, as somehow we felt that the right forces would win, and quickly. I logged onto Napster for the first time from a DSL connection and practically squealed in delight as a song could be downloaded faster than the time it would take to listen to it. It was a turning point for music access, but probably a step back for music discovery. We were still stuck with this:

Napster in 2001

The search was abysmal: a substring match on ID3v1 tags (32 characters each for artist, title, release and a single byte for genre) or filename (usually “C:\MUSIC\MYAWES~1\RAPSONG.MP3”), and there was no discovery beyond clicking on other users’ names and seeing what they had on their hard drives. I would make my music available, but of course no one would ever download it because there was no way for them to find it. A fellow musician friend quickly took to falsely renaming his songs as “remixes” of himself by better-known artists – “ARTIST – TITLE – APHEX TWIN MIX” – and reported immediate success.

At the time I was a member of various music mailing lists, USENET groups and frequent visitor of a new thing called “weblogs” and music news and review sites. I would read these voraciously and try to find stuff based on what people were talking about. To me, while listening to music is intensely private (almost always with headphones alone somewhere), the discovery of it is necessarily social. I figured there must be a way to take advantage of all of this conversation, all the excited people talking about music in the hopes that others can share in their discovery – and automate the process. Could a computer ‘read’ all that was going on across the internet? If just one person wrote about my music on some obscure corner of the web, then the system could know that too.

This is scale with care: real people feeding information into a large automated system from all different sources, without having to fill out a survey or edit a wiki page or join a social network.

After almost ten years of data mining, language and music research (first at NECI, then a PhD at the MIT Media Lab), The Echo Nest is currently the only music understanding service that takes this approach. And it works. We crawl the web constantly, scanning over 10 million music-related pages a day. We throw away spam and non-music-related content through filtering, we quickly find artist names in large amounts of text and parse the language around the names. Every word anyone utters on the internet about music goes through our systems that look for descriptive terms, noun phrases and other text, and those terms bucket up into what we call “cultural vectors” or “top terms.” Each artist and song has thousands of top terms that change daily. Each term has an associated weight, which tells us how important the description is (roughly, the probability that someone will describe the music with that term). We don’t use a fixed dictionary for this: we can understand new music terms as quickly as they are uttered, and our system works in many Latin-derived languages across many cultures.
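Stripped of all the crawling, filtering and scale, the core of a cultural vector is just counting descriptions and turning the counts into probabilities – a toy version with invented sentences and a tiny descriptor list:

    from collections import Counter

    # Toy crawled sentences mentioning one artist; the real pipeline filters spam
    # and finds artist names and descriptive language across millions of pages.
    docs = [
        "the new James Holden record is gorgeous bedroom techno",
        "James Holden's melodic, hypnotic bedroom techno set",
        "a hypnotic live set from James Holden",
    ]
    DESCRIPTORS = {"bedroom techno", "melodic", "hypnotic", "gorgeous"}

    counts = Counter()
    for doc in docs:
        for term in DESCRIPTORS:
            if term in doc.lower():
                counts[term] += 1

    total = sum(counts.values())
    cultural_vector = {term: n / total for term, n in counts.items()}
    print(cultural_vector)  # roughly: how likely is someone to describe this artist with each term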

On top of this statistical NLP, we also pull in structured data from a number of partners and community access sites like Wikipedia or Musicbrainz. We apply the same frequency and vector approach to this knowledge-base style data: if Wikipedia lists the location of an artist as NYC, a label partner says New York, NY, and their Facebook page has “EVERYWHERE ON TOUR 2012”, we have to figure out which is the right answer to index. Often the cultural vectors on structured data become a synthesis of all the different data sources.

When a query for a similar artist or a playlist comes into our system, we take the source artist or song, grab its cultural vectors, and use those in real time to find the closest match. This is not easy to do at scale, and over the years we’ve done quite a lot of “big data” work to make this tractable. We don’t cache this data because it changes so often – the global conversation around music is very finicky and artists make overnight changes to their sound.
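One plausible way to picture that matching step is cosine similarity over the sparse term vectors – the catalog and weights below are invented, and the real system does this for millions of artists with far more engineering:

    import math

    def cosine(a, b):
        """Cosine similarity between two sparse term -> weight dicts."""
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    catalog = {
        "Artist A": {"post-punk": 0.8, "angular": 0.5},
        "Artist B": {"post-punk": 0.7, "melodic": 0.4},
        "Artist C": {"country": 0.9, "ballad": 0.6},
    }
    query = {"post-punk": 0.9, "angular": 0.3}

    ranked = sorted(catalog, key=lambda name: cosine(query, catalog[name]), reverse=True)
    print(ranked)  # culturally closest artists first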

A lot of useful data naturally falls out of cultural analysis of music: the quantity of conversation is used to inform our “hotttnesss” and familiarity data points, representing how popular the artist is now on the internet and overall how well known they might be. We can use the crawled text anonymously as sort of a proxy for listener data without having to get it from a playback service. And the index of documents that relate to artists or songs is of course valuable to a lot of our customers in a feed or search context – showing news or reviews about artists their users are interested in.

Acoustic Analysis

Echo Nest acoustic analysis view

The internet is not the Library of Babel we envision it to be, and quite often many “lower-rank” (less popular) musicians are left out of the “cultural universe” we crawl. Also, the description of music necessarily leaves out things that actually describe the music – a Google blog search for Rihanna illustrates the problem well: many popular artists’ descriptions are skewed towards the celebrity angle, and while this is certainly a valid thing to know about a musician, it’s not all we need to know. Lastly, internet discussion of music tends to concentrate on artists, not songs (although there is sometimes talk of individual songs on music blogs.) These three issues (and common sense) require us to figure out whether we can understand how a song sounds as well as how the artist and song are represented by listeners. And if we are going to follow care and scale, we’ve got to do this automatically, with a computer doing the job of careful listening.

Can a computer really listen to music? A lot of people have promised it can over the years, but I’ve (personally) never heard a fully automated recommendation based purely on acoustic analysis that made any sense – and I’ve heard them all, from academic papers to startups to our own technology to big-company efforts. And that has a lot to do with the expectations of the listener. There are certain things computers are very good and fast at doing with music, like determining the tempo or key, or how loud it is. Then there are harder things that will get better as the science evolves, like time signature detection, beat tracking over time, transcription of a dominant melody, and instrument recognition. But even if a computer were to predict all of these features accurately, does that information really translate into a good recommendation? Usually not – and we’ve shown over the years that people’s expectation of “similar” – either in a playlist or a list of artists or songs – trends heavily towards the cultural side, something that no computer can get at simply by analyzing a signal.

But it does turn out that acoustic analysis has a huge part to play in our algorithms. People expect playlists to be smooth and not jump around too much. Quiet songs should not be followed with loud metal benders (unless the listener asked for that.) For jogging, the tempo should steadily increase. Most coherent mixes should keep the instrumentation generally stable. Songs should flow into one another like a DJ would program them, keeping tempo or key consistent. And there’s a ton we haven’t figured out yet on the interface side. Could a “super dorky query interface” work for music recommendation, where a listener can filter by dominant key or loudness dynamics? Maybe with the right user experience. An early product out of the Echo Nest[5] was an “intelligent pause button” that Tristan whipped up that would compose a repeating segment out of the part of the song you were in or just play the song roughly forever (check out an automated 10 minute MP3 re-edit of a Phoenix song) – which a few years later became Paul’s amazing Infinite Jukebox – these experiments are fascinating precursors to a new listening experience that might become more important than discovery itself.

The Echo Nest audio analysis engine (PDF) contains a suite of machine listening processes that can take any audio file and output both low-level (such as the time when every beat starts) and high-level (such as the overall “danceability”) information for any song in the world. We analyze all the music we work with, and developers can upload their own audio to see everything we compute on a track via our API. Our analysis starts by pretending to be an ear: it models the frequencies and loudness of a musical signal much the same way perceptual codecs like MP3 or AAC do. It then segments the audio into small pieces – roughly 200ms to 4s, depending on how fast things are happening in the song. For each segment we can tell you the pitch (in a 12-dimensional vector called chroma), the loudness (in an ADSR-style envelope) and the timbre, which is another 12-dimensional vector that represents the sound of the sound – what instruments there are, how noisy it is, etc. It also tracks beats across the signal, in subdivisions of the musical meter called tatums, and then per beat and bar, alongside larger song-level structure we call sections that denote choruses, intros, bridges and verses.
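If you pull a track’s analysis from the API and save it as JSON, poking at it looks something like this – the field names (“track”, “segments”, “pitches”, “timbre”, “loudness_max” and so on) match the analysis format as I understand it, but treat them as illustrative rather than a spec:

    import json

    # Assumes you've saved one track's analysis to analysis.json via the API.
    with open("analysis.json") as f:
        analysis = json.load(f)

    print("tempo:", analysis["track"]["tempo"], "key:", analysis["track"]["key"])
    print("beats:", len(analysis["beats"]), "sections:", len(analysis["sections"]))

    for seg in analysis["segments"][:3]:
        # Each segment carries a 12-dim chroma ("pitches") and a 12-dim timbre vector.
        chroma_peak = max(range(12), key=lambda i: seg["pitches"][i])
        print(f'{seg["start"]:7.2f}s  loudness {seg["loudness_max"]:6.1f} dB  '
              f'strongest pitch class {chroma_peak}')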

That low-level information can be combined, through some useful applications of machine learning that Tristan and his team have built over the years, to “understand” the song at a higher level. We emit song attributes such as danceability, energy, key, liveness, and speechiness, which aim to represent the aboutness of the song in single floating-point scalars. These attributes are either heuristically or statistically observed from large testbeds: we work with musicians to label large swaths of ground truth audio against which to test and evaluate our models. Our audio analysis can be seen as an automated lead sheet or a computationally understandable overview of the song: how fast it is, how loud it gets, what instruments are in it. The data within the analysis is so fine-grained that you can use it as a remix tool – it can chop up songs by individual segments or beats and rearrange them without anyone noticing.

We don’t use either type of data alone to do recommendations. We always filter the world of music through the cultural approaches I showed above and then use the acoustic information to order or sort the results by song. A great test of a music recommender is to see how it deals with heavy metal ballads – you normally would expect other ballads by heavy metal bands. This requires a combination of the acoustic and cultural analysis working in concert. The acoustic information is obviously also useful for playlist generation and ordering, or keeping the mood of a recommendation list coherent.
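In toy form, that two-stage idea – cultural filter first, acoustic ordering second – looks like this (all the songs, terms and attribute values are invented):

    # Each candidate carries cultural terms plus a couple of acoustic attributes.
    candidates = [
        {"title": "Ballad A", "terms": {"heavy metal", "ballad"}, "energy": 0.35, "tempo": 78},
        {"title": "Thrasher", "terms": {"heavy metal"},           "energy": 0.95, "tempo": 180},
        {"title": "Ballad B", "terms": {"heavy metal", "ballad"}, "energy": 0.40, "tempo": 82},
    ]
    seed = {"terms": {"heavy metal", "ballad"}, "energy": 0.30, "tempo": 75}

    # 1) Cultural filter: keep songs sharing most of the seed's terms.
    pool = [c for c in candidates if len(c["terms"] & seed["terms"]) >= 2]
    # 2) Acoustic ordering: sort survivors by closeness in energy and tempo.
    pool.sort(key=lambda c: abs(c["energy"] - seed["energy"]) + abs(c["tempo"] - seed["tempo"]) / 200)
    print([c["title"] for c in pool])  # the other metal ballads come first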

What’s next

I’ve used every single automated music recommendation platform, technology or service. It’s obviously part of my job and it’s been astounding to watch the field (both academically and commercially) mature and test new approaches. We’ve come a long way from RINGO, and while the Echo Nest-style system is undoubtedly the top of the pack these days as far as raw quality of automated results goes, there’s still quite a lot of room to grow. I’ve been noticing two trends in the space that will certainly heat up in the years to come:

Social – filtering collaborative filtering

This is my jam

Social music discovery, as embodied by some of my favorite music services such as This is my jam, Swarm.fm, Facebook’s music activity ticker and real-time broadcasting services like the old Listening Room and its modern descendant Turntable.fm, often has little or no automated music discovery. Friend-to-friend music recommendations enabled by social networks are extremely valuable in music discovery (and I personally rely on them quite often), but they are not recommendation engines, as they cannot automatically predict anything when bereft of explicit social signals (and so they fail on scale). The amazing and useful feature of recommendation systems is that they can find things for you that you wouldn’t otherwise have come across.

Even though that puts this kind of service outside the scope of this article, that doesn’t mean we should ignore the power of social recommendations. There’s something very obvious in the social fabric of these services that makes personal recommendations more valuable: people don’t like computers telling them what to do.

There are some interesting new services that mix both the social aspects of recommendation with automated measures. This is my jam’s “related jams” is a first useful crack at this, as is Spotify’s recent “Discover” feature. You can almost see these as extensions of your social graph: if your friends haven’t yet caught onto Frank Ocean there might be signals that show they will get there soon, and using cultural filtering can get us there. And a lot of the power of social recommendations – that it comes from your friends – can tell the story better than just a raw list of “artists we think you would like.”

Listener intelligence

When do you listen to music? Is it in the morning on your way to work? Is it on the weekends, relaxing at home? When you do it, how often do you listen to albums versus individual songs in a playlist? Do you idly turn on an automated radio station, or have your own playlists? Which services do you use and why? If it’s raining, do you find yourself putting on different music? The scariest thing about all the music recommendation systems I’ve gone over (including ours as of right now) is none of them look at this necessary listener context.

There is a lot going on in this space, both internal stuff we can’t announce yet at Echo Nest, and new products I’m seeing coming out of Spotify and Facebook. We’re throwing our weight behind the taste profile – the API that represents musical activity on our servers. It’s sort of the scrobble 2.0: representing both playback activity and all the necessary context around it – your behaviors and patterns, your collection, your usage across services and maybe even domains. We’re even publishing APIs to do bulk analysis of the activity to surface attributes like “mainstreamness” or “taste-freeze” (the average active year of your favorite artists). This is more than activity mining as collaborative filtering sees it: it’s understanding everything about the listener that we can, well beyond just making a prediction of taste based on purchase or streaming activity. All of these attributes and analyses might be part of the final frontier of music recommendation: understanding enough to really understand the music and the listener it’s directed to.

– Brian (brian@echonest.com)

Thanks to EVB for the edits


  1. I know they do a ton of acoustic analysis there but I don’t know if it’s used in radio or similar artists/songs. Probably.  ↩

  2. They use Gracenote’s CDDB data as well but I don’t know to what extent it appears in Genius.  ↩

  3. You can even claim that since you are listening to the Beatles, you are also listening to the individual members at the same time but let’s not get too ahead of the curve here  ↩

  4. Whenever I say NLP I mean the real NLP, the natural language processing NLP, not the creepy pseudoscience one.  ↩

  5. Released eventually as the Earworm example in Echo Nest Remix.  ↩

Music data talk at Velocity EU

I gave a talk at Velocity EU in London a couple of weeks ago on how some pieces of the Echo Nest work, operationally:


(Direct link on speakerdeck)

We started Echo Nest seven years ago and I couldn’t imagine a stranger time to begin a technology-focused venture. We started with 2 self-built 2U rack machines in a closet at MIT and within two years started moving everything to first dozens and then almost thousands of virtualized hosts running on a few different cloud providers. And then a year or two ago we pulled it all back to physical again. We’re in a weird spot between offline data processor and real time API provider and none of the oft-repeated hype platforms ever worked for us.

We’ve been around before and during much of Hadoop, Solr, “NoSQL”, EC2, API-as-PR, the rise of mobile, Apple on Intel, Hacker News and sharded mySQL as a key-value store. And we’re still at it, making money in an industry that would rather keep it to itself. If it’s not obvious, I’m really proud of what we built and the small team that works their ass off to keep it working while making some really cool new stuff.

There are a lot of stories to tell and this is just the first one, on how we’ve abused text indexing to do quite a lot of our work. If you ever wondered how some of the musical data soufflé avoids burning, here it is. Thanks to John & Kellan for giving me a chance to present this.

1980s pen plotters of the future

Early 1980s pen plotters are amazing tools that are still very useful in today’s world. There’s something completely transfixing about a mechanical device moving an actual pen on paper versus the smelly black box of the laser printer. And if you’re trying to draw lines or curves or like the effect of actual ink touching paper (not sprayed on in microdots) there’s no other way. Luckily, there’s some great tools out there for making plotters work on modern hardware and using modern file formats (PDFs) and the hardware itself, while finicky and aging, is cheap.

Hardware

You’re going to have to start with the right hardware. The Chiplotle! plotter FAQ is great for this; I’ve added my notes below:

  • USB – serial interface or a serial port on your computer. I have trouble with Chiplotle! with the very common Prolific serial adapters (it might be the drivers) but if you have something that works it’ll probably be fine. I use an ex-Keyspan one.
  • Any HPGL-compatible pen plotter. eBay is almost always your best bet, unless you live near the MIT Flea Market. Make sure the plotter supports HPGL via serial connection. If it says HPIB or GPIB, do not get it. Very common eBay finds are the Roland DXYs or the HP 7475a. If in doubt, check the Chiplotle! list of supported plotters (although keep in mind similar model #s will also work; for example the DXY-1150 and 1150a act as a DXY1300 in Chiplotle!.) I normally pay $50 or so for a single sheet plotter like the DXY-1150. Make sure you get a power supply with the plotter; that’s often the hardest thing to find and none of them use anything standard.
  • A plotter serial cable. The only place you’ll need to pull out a soldering iron. You can buy plotter serial cables on eBay but it’s easier to just make your own from a DB25 male to DB9 female cable. You have to re-route a few wires but it’s easy to do.
  • Pens & paper. Your eBay plotter likely will come with a plastic bag full of dried out pens or, if you’re lucky, boxed ones. There’s a huge variety of pen types (for different paper or thicknesses, felt vs. fine point) or you can fashion your own if you’re wily. For paper, the average desktop plotter can take up to 11 x 17" paper or just normal printer paper (make sure to set the DIP switches on the back if you don’t use the full size.) I have a nice stack of artist vellum paper with nice vellum pens.

Have all that? Now, find a place to put the plotter, run the plotter cable from the serial port on the back of the plotter to your USB adapter, and load up some paper. Depending on the plotter, the paper might be held in electrostatically, or it may be magnetized and expect a metal tab to hold in the paper (which you obviously did not get in the eBay shipment; you can use a metal ruler or something similar instead). Or maybe you will have to tape the paper on. If it’s a wide-format plotter, the paper will have to be in a roll and the vertical axis is driven by the roller mechanism, like a receipt printer (these are notoriously fiddly; I would avoid them unless you really need 3-foot-wide paper). You then load up a pen (in the pen holder off to the side, not the plotter head – the beginning of every HPGL file tells the plotter to pick up the pen from the holder). Now, let’s plot.

Software

In 2008 I was bitten by the plotter bug all of a sudden. I was trying to draw a smooth bezier curve robotically and was looking at various servo or motor solutions when I stumbled on the community of folks that have adapted Roland pen plotters into vinyl cutting CNC machines. I found myself intensely bidding on my first plotter against a familiar eBay username. After I lost, I confirmed my suspicions: I was in competition with my dear friend Douglas Repetto of CMC & dorkbot fame. And not only was he also independently plotter crazed, he was working on a Python module for HPGL control called, well, Chiplotle!. Maybe there was something in the water that week.

Chiplotle!

Chiplotle! is obviously the best and only way to reliably control a plotter from a modern computer. It does quite a lot of work for you: it manages the command set of your plotter, buffers output so it doesn’t overflow and start drawing random straight lines, provides an interactive terminal where you can “live draw” and a bunch of other necessary stuff. Although you can import chiplotle into your program and programmatically control your plotter, I tend to use just one piece of Chiplotle! – the plot_hpgl_file script that it installs.

HPGL files are like the wizened ancestor of PDF: an HPGL file is simply an ASCII file of text commands to draw lines, curves, choose pens and so on, and it is the plotter’s native language. If you want, you can ignore Chiplotle! altogether and just cat an HPGL file to your serial port at 9600 baud, 8N1. This will work fine for the first few commands, but eventually the plotter’s internal buffer (mine is 512 bytes) will overflow. plot_hpgl_file takes care of all of this. The first time you run it, it will attempt to detect which plotter you have on which serial port. Then it will slowly spit out the HPGL commands and make sure the plotter is acknowledging them.
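If you do want to try the raw approach for a quick test, a naive chunked send with pyserial looks roughly like this – no real handshaking, so it can still overrun a small plotter buffer, and the port name is just a guess at your setup:

    import time
    import serial  # pyserial

    PORT = "/dev/ttyUSB0"   # whatever your USB-serial adapter shows up as
    CHUNK = 128             # stay well under a ~512 byte plotter buffer

    with serial.Serial(PORT, 9600, timeout=1) as ser, open("output.hpgl", "rb") as f:
        data = f.read()
        for i in range(0, len(data), CHUNK):
            ser.write(data[i:i + CHUNK])
            time.sleep(0.5)  # crude pacing; Chiplotle! actually asks the plotter for buffer space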

My workflow for the project I’m on now is to generate PDF files programmatically using the amazing ReportLab Python PDF toolkit, convert them to HPGL using pstoedit, and then plot them. It is as simple as:

python your_pdf_generator.py file.pdf
pstoedit -f hpgl file.pdf > output.hpgl
plot_hpgl_file.py output.hpgl

Obviously I could have my Python program control the plotter directly, but as ink and paper add up you will want a step in between to make sure your art is OK, and the PDF step is natively viewable on any platform. Since PDF and HPGL share a lot of common ancestry, the curveTos, lineTos and moveTos are kept consistent with no loss of quality. There’s no rasterizing step: if you generate curves programmatically with ReportLab, they will be the same curves on the paper in the plotter.
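A minimal ReportLab generator for the first step might look like this (page size, coordinates and the output filename are arbitrary):

    import sys
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    def make_pdf(path):
        c = canvas.Canvas(path, pagesize=letter)
        c.setLineWidth(0.5)
        # A single cubic Bezier; pstoedit will translate it into HPGL commands.
        c.bezier(72, 100, 200, 500, 400, 50, 540, 400)
        c.line(72, 72, 540, 72)
        c.showPage()
        c.save()

    if __name__ == "__main__":
        make_pdf(sys.argv[1] if len(sys.argv) > 1 else "file.pdf")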

Owls

I spent a good six hours building this Apple //c Information Kiosk. It’s hilarious! You can type in any band name (that existed in 1984 or earlier OF COURSE) and it shows you a picture and their bio. Or in automatic mode it just scrolls through a set of pop artists from around that time (via this amazing Echo Nest API call).

You can run it yourself if you have the right hardware, here’s my HOWTO.

The Echo Nest “puddle” and artist entity extraction

(Cross posted from my post at the Echo Nest blog, with additions)

It was 2006, and co-founder Brian and early Nest developer Ryan were trying to figure out how to associate the world of free text on the internet with musical artists. We were already crawling tens of thousands of documents a day (now millions!), but a Google-style index of unstructured text about music was not our goal. We needed to somehow quickly associate a new incoming page with an artist ID so that we could quickly retrieve all the documents about an artist, as well as run our statistics on the text to find out what people were saying. Brian sketched this classic diagram, soon to be placed in the Echo Nest Museum (next to SoundCloud’s award):

That crudely drawn “blob of intelligence” that could take unstructured free text and quickly identify artist names became known as “the Puddle,” a term that entered Echo Nest lore alongside “grankle” and “flat.” We use a form of the Puddle to this day. Every piece of text that our crawlers generate goes through a custom entity extraction process – it’s how we know which blogs are writing about which artists, and it’s what powers our artist similarity engine, as we need to figure out what people are saying about which artists as soon as it’s said. It’s a powerful and fast-changing piece of our infrastructure attacking a hard problem.

Entity extraction is even more useful today. If you wanted to build a Twitter app that figured out the bands a user was talking about, how could you do it? You’d need a huge database of artists (check, we have over 1.6 million), a lot of fast computers (check), and tons of rules learned from our customers over the years about artist resolution – aliases, stopwords, tokenization, merged artists and so on. Given a simple tweet:

Can we figure out what band Brian’s talking about, automatically? Well, now you can. We’ve decided to open up a beta version of our entity extraction toolkit called artist/extract to developers. Pass in any text and you’ll get back a list of artist names (in order of appearance by default, but you can sort by any Echo Nest feature) that were mentioned in the text. Think of it as a form of artist search that can take anything – Facebook comments, tweets, blog posts, reviews, SMSes.

We support all sorts of fancy things to help you. We know that “Led Zep” is an alias for Led Zeppelin. We try to deal with common word band names via capitalization rules. You can of course detect multiple artists in the same block of text. And you can use personal catalogs and Rosetta Stone to limit results to music your user owns or is playable by our partners Rdio or 7digital. And you can add the standard buckets – hotttnesss, familiarity and so on – to get information about the artists all in one call.
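A rough example of calling it over HTTP – the endpoint path, parameter names and response shape here are my recollection of the beta docs, so double-check them against the API reference, and you’ll need your own API key:

    import requests

    # Endpoint and parameters are assumptions based on the beta docs; verify before use.
    resp = requests.get(
        "http://developer.echonest.com/api/v4/artist/extract",
        params={
            "api_key": "YOUR_API_KEY",
            "format": "json",
            "results": 10,
            "text": "Just heard some Led Zep on the way to the show",
        },
    )
    for artist in resp.json()["response"].get("artists", []):
        print(artist["name"])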

This is beta and still has some issues. It lags a little behind our internal entity resolving for performance reasons, and things like this can never be perfect. But it’s very helpful. Some ideas we’ve been batting around:

  • Suggest bands to Facebook users using our new Facebook Rosetta service and by parsing their comments for band names
  • Recommend Twitter followers based on the music they talk about
  • Play a radio stream for any blog using our playlist APIs by parsing their posts for artist names
  • See how “indie” your friends are by computing average hotttnesss of all the bands they mention in email

Paul made a great demo if you want to see it in action:

Enjoy. Let us know if you have any issues!

The Echo Nest Musical Fingerprint (ENMFP)

Tomorrow begins MHD Amsterdam, and at it The Echo Nest is releasing a few new things. Some of our engineering team (who deserve a severe callout for all their work; let me stick with their codenames) have been working tirelessly to make “songs” a first-class member of our API, and as of today, they are – we now track many millions of songs and you can query for them by name and receive all sorts of useful metadata, get similar songs (with amazing results even very deep in the catalog), and even get free (legal) playable audio for a huge collection of major label content (more on this later). As part of this push to provide data about songs, we have been working on a music fingerprint – a way to resolve an unknown audio file (what we call a “track”) against a large database to identify it in our world (as a “song”). And we’re ready to release this to the community to see how it performs in the wild.

The design goals of our FP were to base it on Echo Nest audio features, to make it simple to implement and to make it as open as possible. Lock-in of content resolution data is a terrible thing, and a large part of The Echo Nest’s focus is to make it easy for people to figure out what their music is about without getting stuck in ID space hell. If you have an iTunes collection and want to automatically make Spotify playlists, we should be able to help you. If you write an app that scans your hard drive for tracks to make great recommendations against MOG or the Limewire store, we should be able to help you. If you want the tempo of every song in someone’s terribly labeled iPod library, we should be able to help you. A fingerprint to us is a utility call – like our search_artists – a way to resolve a music identifier to our set of ID spaces. Echo Nest song IDs, if you choose to use them, give you all of our stuff “for free” – from a single EN SO ID you can get recommendations, artist pictures and bios, blog posts, record reviews, and of course all the audio analysis: the tempo, key, events in the song. But over this year we are rolling in support for any other ID space via Rosetta Stone, so you will be able to return Spotify IDs or get last.fm URLs for the song from the fingerprint. Our goal as always is to be the bridge between music and amazing applications – a platform for music intelligence that lets anyone use any service on any audio to discover and interact with music.

How it works

Our fingerprint is called the Echo Nest Musical Fingerprint (ENMFP) and is based directly on parts of our audio analysis engine that already powers tons of interactive music and music search apps across the globe. We get a detailed understanding of what is happening in a song (note: a song, not just an audio file) for “free” simply by having Tristan be our co-founder, so our work on the ENMFP started there. We worked with audio scientists on ways to scalably hash parts of the analysis and query for “codes” – a sequence of numbers that can match the same song to the ear. We identified an efficient series of transformations of our low level segment description data to make a very accurate code, and our engineering team built a suite of tests, backend servers, and a query API. The ENMFP comes in two parts. The code generator is a binary library that you can compile into your own app. It takes in a buffer of PCM samples (in practice, give it around 20 seconds of 22050Hz mono float PCM), runs a series of signal processing algorithms on the samples, and returns a list of codes. It is as simple as

    // Generate codes from a buffer of mono PCM samples (~20 seconds is plenty).
    Codegen * pCodegen = new Codegen(_pSamples, _NumberSamples, offset);
    // Each code is just a number; the sequence is what the server matches against.
    for (uint i = 0; i < pCodegen->getNumCodes(); i++)
        printf("%ld ", pCodegen->getCodes()[i]);

The server maintains a canonical list of songs with corresponding codes and performs fast lookup. We’ve based the server on some popular open source indexing and storage platforms, and we’ll be releasing our modifications to them as a reference implementation shortly.

Use and open nature

Almost all of this implementation is open. The data behind the server is open by design: anyone can request full data dumps, and anyone who wants to run their own server can, provided that they mirror with the other servers. The only non-open license is on the code generator, which for now is binary-only, available for most platforms (Windows, Linux 32- and 64-bit, Mac OS X, mobile forthcoming) and free to use in any sort of application – commercial, open source, free, webapp, etc. The only pertinent restriction is that codes are sent only to “authorized servers.” The design of this license ensures that one party does not attempt to usurp the ID-resolving space out from under anyone. If The Echo Nest dissolves or gets bought by a large fish cannery by accident, we want to make sure the data and query service live on without us. As a corollary, we don’t want anyone “hiding” new resolved tracks from the ID space. Anyone who collects new songs via this fingerprint has to share their data, plain and simple. This hopefully ensures that over the years the combined knowledge from all uses of the ENMFP will catalog every single piece of music available on the internet, and the data will be available to all. We want the ENMFP to grow into a public internet utility.

Features

  • The ENMFP looks at the underlying music, not just the raw audio signal. This gives it some unique advantages:
    • Unlike many FPs, it is robust to time scaling
    • Can identify sample use in mixed audio
    • Can identify remixes, live versions and sometimes cover versions
  • Can identify a song in <20s of audio
  • Can also match on track metadata (artist name, title, length) using Echo Nest name matching in the same call
  • Server and some of the code generator are completely open source
  • Data is completely open; dumps provided, mirroring required to host your own server (we want people to boot their own copies of the data)

Anti-features

  • In heavy alpha, not heavily QA’d yet, help wanted
  • Not completely OSS: the code generator relies on proprietary EN algorithms. Binaries provided, free to use, but not open source.
  • No ingestion API yet (you are querying against a large but not complete catalog; there is currently no way to add new songs. This is changing soon. If you maintain a large catalog and want it in our reference database, please get in touch.)

How to use

First, you need an Echo Nest developer API key if you don’t already have one. Next, familiarize yourself with the alpha_identify_song API. (As of right now, before we release the server source, the Echo Nest is hosting the only query server via this API.) There are instructions there on how to receive the libcodegen binaries. The libcodegen package also ships with an example code generator that you can call from the command line, so no worries if you aren’t ready to do some compiling.

How to help

We see the ENMFP as a community project just getting started. If you are interested in booting your own mirror server, or if you have experience with FP tasks, want to help with QA, automated testing, have a large catalog to ingest or test against, please get in touch.



We are especially grateful for the work of Unrepentant Nagios Installer (UNI), Guy Who Fights With Me About the Word “Track” Every Fucking Day (GWFWMAWTEFD), Drinks Turret Coolant (DTC), Mr. HTML5 Canvas 2010 (HC2), So-Glad-I-Kept-You-Out-Of-The-Media-Lab (SGIKYOOTML), Skinny Tie (ST), Main Ontology Offender (MOO), Future Performable Employee (FPE), and of course Ben Lacker (BL).