Brian Whitman @ variogr.am

Talk about A Singular Christmas at the Automatic Music Hackathon

I gave a talk about my album A Singular Christmas at the Automatic Music Hackathon last week. Here’s what it looked like and what I said.

Pretend that you’re new here, and you want to know what a bird is. You’re lucky: lots of people know what a bird is. They can show you a bird. This is Hilary Putnam’s linguistic division of labor, semantic externalism. If you see enough things labeled “bird,” you start to get a handle on what makes a bird a bird. They’ve got a beak or a certain color, they land on a branch, they spread their wings and fly.

The only way I’ve ever understood anything is by endlessly imagining all its forms and presentations. Watch what’s similar and what surprises you. See enough of the same thing, and you can make a little machine to describe it. Snowflakes maybe are circles, except when they’re not. Sometimes fractal edges, sometimes straight, sometimes a number describing the fractalness. Sparkles on the edges, a water droplet from the microscope? So any new snowflake is a set of machines you can add up. Circle plus fractal edge plus sparkles equals your own snowflake.

We’ve all done this. We treat pictures like this, movies, touch. And of course every sound you hear these days is a series of multipliers of a basis function, spit out of a speaker so fast you can’t hear the buzz. Add a bunch of component bits together to get your creativity or expression. Rehydrate the vectors into a speaker or screen again, and you probably don’t even notice.

It’s in Pentland’s eigenfaces, so many years ago. You probably walked in the path of a dozen cameras trying this trick on your own face on the way over here tonight. Your phone has it built in; it tries to tell if you’re smiling, or maybe if you’re someone the government should know about.

But it fails more often than you can imagine. Vision guys call this registration. For a computer to get what something is, it’s got to line up. Keep the eyes in the same pixel. What if someone is bigger than someone else? Or an outlier, like Facebook deciding your fishbowl is your grandmother. This is where we’re still better. We don’t normally confuse people with objects, and you only need to make that mistake once. It turns out computers like skipping over repetitive things, and we appreciate those. It turns out computers get confused by loud noises.

I try to make this work better. I like when it fails, often, better than when it works really well. The algorithm annealing into a steady state has to be our culture’s greatest art. That we even had the hubris to encode our senses into a square floating point matrix of numbers. And that we even think that representation is good enough to understand the underlying thing.

I mostly do it with music. People know pretty much everything about every song ever, and there are databases where you can get the pitch of the tenth guitar note, and what people said about it. Imagine the entire universe describing a song. And then you have the audio, too, and some computer-understandable description of all the events in the song.

I’ve been doing it for a while; this was 2003, a thing called “Eigenradio.” It took every radio station I could get in a live stream at the time, all at once, and figured out how to do basis computation and resynthesis in a sort of live stream back. The idea was to be “computer music.” Not music made on a computer, because everything is. But music for computers. What they think music actually is. It mostly sounded like this:

It took a lot of effort to do something like this. I taught myself how cluster computing worked, and scammed MIT into spending far too much money on something that would be a free tier on a cloud provider these days. The power kept going out. But the project was my favorite kind of irony, the one where the joke is nowhere near as funny as the reality it pokes at.

I have this whole other life that I’m not going to get into, but it involves knowing about music. Consider Christmas song detection. Thought experiment: imagine someone who doesn’t know Christmas. Play them a bunch of Christmas songs: will they hear a connection? Is there something innately Christmas about the music? Bells? Wide open melodies like a rabbit hopping on a piano? My theory was that if I could synthesize Christmas music from an analysis of all the Christmas music I could find, and people thought it sounded Christmas-y, we’d have cracked the code: we could have a Singular Christmas.
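
For the curious, the recipe is small enough to sketch. Here is a rough, hypothetical version in Python (librosa and numpy standing in for the original Matlab, with a made-up list of audio files), not the actual code behind the record: stack up spectrogram frames from a pile of Christmas songs, take the singular value decomposition, keep only the strongest basis spectra, and resynthesize.

```python
import numpy as np
import librosa

# Hypothetical file list standing in for a corpus of Christmas songs.
paths = ["carols/song_01.mp3", "carols/song_02.mp3"]

frames = []
for path in paths:
    y, sr = librosa.load(path, sr=22050, mono=True)
    # Magnitude spectrogram: one column per short-time frame.
    frames.append(np.abs(librosa.stft(y, n_fft=2048, hop_length=512)))

X = np.hstack(frames)  # every spectral frame from every song, side by side

# The "singular" part: the left singular vectors are the basis spectra
# that explain the most variance across the whole corpus.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 8                                    # keep only the strongest components
X_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k "Christmas" approximation
X_hat = np.maximum(X_hat, 0.0)           # magnitudes must stay non-negative

# Invert back to a waveform with a Griffin-Lim phase estimate.
y_hat = librosa.griffinlim(X_hat, hop_length=512)
```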

Do you want to know the magic trick? Doing this taught me one important lesson: synthesis is just fast composition. Computer people love to hate themselves because everything is so easy. But we all make things, often beautiful things, even if we didn’t mean to. Even if “the data did it,” or you just threw a bunch of Matlab functions together, or it only started sounding good when you started panning the sine waves into different channels. You’re composing.

This thing got everywhere. By far the most successful creative thing I’ve done. I was on the BBC on Christmas Eve, exasperatedly spelling out “eigenanalysis.” Pitchfork reviewed it; I got 4 stars. The MIT sysadmins and I had a big fight over its bandwidth. An excited Canadian man talked about it on the radio.

My favorite things are the emails. Every December, right around now, they start slowly rolling in. How this album is the only thing they listen to during the holidays. How it means Christmas to them. I’m still working on this stuff, as a sort of hedge against my more mundane realities. I want to show the world there’s beauty in the act of understanding.

Very large scale music understanding talk @ NAE Frontiers

A few years ago I gave this talk at the very impressive NAE "Frontiers of Engineering" conference at the invitation of my more successful academic friends, and noticed they had published the transcript. A rare look at one of the reasons The Echo Nest exists, from my perspective:

Presented at NAE Frontiers of Engineering, 2010

Scientists and engineers around the world have been attempting something undeniably impossible— and yet, no one could ever question their motives. Laid bare, the act of “understanding music” by a computational process feels offensive. How can something so personal, so rooted in context, culture and emotion, ever be discretized or labeled by any autonomous process? Even the ethnographical approach — surveys, interviews, manual annotation — undermines the raw effort by the artists, people who will never understand or even perhaps take advantage of what is being learned and created with this research. Music by its nature resists analysis. I’ve led two lives in the past ten years— first as a “very long-tail” musician and artist, and second as a scientist turned entrepreneur who currently sells “music intelligence” data and software to almost every major music streaming service, social network and record label. How we got there is less interesting than what it might mean for the future of expression and what we believe machine perception can actually accomplish.

In 1999 I moved to New York City to begin graduate studies at Columbia working on a large “digital government” grant, parsing decades of military documents to extract the meaning of the acronyms and domain-specific words. At night I would swap the laptops in my bag and head downtown to perform electronic music at various bars and clubs. As much as I tried to keep them separate, the walls came down between them quickly when I began to ask my fellow performers and audience members how they were learning about music. “We read websites,” “I’m on this discussion board,” “A friend emailed me some songs.” Alongside the concurrent media frenzy on peer-to-peer networks (Napster was just ramping up) was a real movement in music discovery— technology had obviously been helping us acquire and make music, but all of a sudden it was being used to communicate and learn about it as well. With the power of the communicating millions and the seemingly limitless potential of bandwidth and attention, even someone like me could get noticed. Suitably armed with an information retrieval background alongside an almost criminal naiveté regarding machine learning and signal processing, I quit my degree program and began to concentrate full time on the practice of what is now known as “music information retrieval.”

The fundamentals of music retrieval descend from text retrieval. You are faced with a corpus of unstructured data: time-domain samples from audio files or score data from the composition. The tasks normally involve extracting readable features from the input and then learning a model from the features. In fact, the data is so unstructured that most music retrieval tasks began as blind roulette wheels of prediction: “is this audio file rock or classical” [Tzanetakis 2002] or “does this song sound like this one” [Foote 1997]. The seductive notion that a black box of some complex nature (most with hopeful success stories baked into their names— “neural networks,” “Bayesian belief networks,” “support vector machines”) could untangle a mess of audio stimuli to approach our nervous and perceptual systems’ response is intimidating enough. But that problem is so complex and so hard to evaluate that it distracts the research from the much more serious elephantine presence of the emotional connection underlying the data. A thought experiment: the science of music retrieval is rocked by a massive advance in signal processing or machine learning. Our previous challenges in label prediction are solved— we can now predict the genre of a song with 100% accuracy. What does that do for the musician, and what does that do for the listener? If I knew a song I hadn’t heard yet was predicted “jazz” by a computer, it would perhaps save me the effort of looking up information about the artist, who spent years of their life defining their expression in terms of, or despite, these categories. But it doesn’t tell me anything about the music, about what I’ll feel when I hear it, about how I’ll respond or how it will resonate with me individually and within the global community. We’ve built a black box that can neatly delineate other black boxes, at no benefit to the very human world of music.
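
To make that pipeline concrete, here is a minimal, generic sketch of the “rock or classical” style task in Python (hypothetical file paths and labels; librosa and scikit-learn are stand-ins, not what the cited papers used): pool frame-level features into one vector per track, then hand the vectors to an off-the-shelf classifier.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def track_features(path):
    """Pool frame-level MFCCs into one fixed-length vector per track."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)            # (13, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # crude summary

# Hypothetical labeled corpus: audio paths and their genre labels.
paths = ["audio/rock_01.wav", "audio/rock_02.wav",
         "audio/classical_01.wav", "audio/classical_02.wav"]
genres = ["rock", "rock", "classical", "classical"]

X = np.vstack([track_features(p) for p in paths])
clf = SVC(kernel="rbf")   # one of the hopeful black boxes
clf.fit(X, genres)

# "Is this audio file rock or classical?"
print(clf.predict([track_features("audio/mystery.wav")]))
```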

The way out of this feedback loop is to somehow automatically understand reaction and context the same way we could with perception. The ultimate contextual understanding system would be able to gauge my personal reaction and mindset to music. It would know my history, my influences and also understand the larger culture hovering around the content. We are all familiar with the earliest approaches to contextual understanding of music — collaborative filtering, a.k.a. “people who buy this also buy this” [Shardanand 1995] — and we are also just as familiar with its pitfalls. Sales- or activity-based recommenders only know about you in relationship to others— their meaning of your music is not what you like but what you’ve shared with an anonymous hive. The weakness of the filtering approaches becomes vivid when you talk to engaged listeners: “I always see the same bands,” “there’s never any new stuff” or “this thing doesn’t know me.” As a core reaction to the senselessness of the filtering approaches I ended up back at school and began applying my language processing background to music— we started reading about music, not just trying to listen to it. The idea was that if we could somehow approximate even one percent of the data that communities generate about music on the internet— they review it, they argue about it on forums, they post about shows on their blog, they trade songs on peer-to-peer networks— we could start to model cultural reaction at a large scale. [Whitman 2005] The new band that collaborative filtering would never touch (because they don’t have enough sales data yet) and acoustic filtering would never get (because what makes them special is their background, or their fanbase, or something else impossible to calculate from the signal) could be found in the world of music activity, autonomously and anonymously.
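
A toy version of the text side of that idea, under obvious assumptions (a handful of made-up crawled snippets per artist; the real system indexed orders of magnitude more): turn what the internet says about each artist into term vectors and measure similarity there, rather than in sales data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for crawled reviews, blog posts and forum chatter.
artist_docs = {
    "New Band X":  "shoegaze reverb bedroom recording loud guitars dreamy",
    "Big Band Y":  "arena rock anthems stadium tour radio hit",
    "Noise Act Z": "harsh noise tape loops basement show drone",
}

names = list(artist_docs)
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform([artist_docs[n] for n in names])  # sparse term vectors

# Cultural similarity: how alike two artists *read*, not how alike they sell.
sims = cosine_similarity(vectors)
for i, name in enumerate(names):
    nearest = max((j for j in range(len(names)) if j != i), key=lambda j: sims[i, j])
    print(f"{name} -> {names[nearest]} ({sims[i, nearest]:.2f})")
```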

Alongside my co-founder, whose expertise is in musical approaches to signal analysis [Jehan 2005], I left the academic world to start a private enterprise, “The Echo Nest.” We are now thirty people, a few hundred computers, one and a half million artists, over ten million songs. The scale of this data has been our biggest challenge: each artist has an internet footprint of, on average, thousands of blog posts, reviews and forum discussions, all in different languages. Each song comprises thousands of indexable events, and the song itself could be duplicated thousands of times in different encodings. Most of our engineering work is in dealing with this magnitude of data— although we are not an infrastructure company, we have built many unique data storage and indexing technologies as a byproduct of our work. The set of data we collect is necessarily unique: instead of storing the relationships between musicians and listeners, or only knowing about popular music, we compute and aggregate a sort of internet-scale cache of all possible points of information about a song, artist, release, listener or event. We began the company with the stated goal of indexing everything there is about music. And over these past five years we have built a series of products and technologies that take the best and most practical parts from our music retrieval dissertations and package them cleanly for our customers. We sell a music similarity system that compares two songs based on their acoustic and their cultural properties. We provide tempo, key and timbre data (automatically generated) to mobile applications and streaming services. We track artists’ “buzz” on the internet and sell reports to labels and managers.

The core of the Echo Nest remains true to our dogma: we strongly believe in the power of data to enable new music experiences. Since we crawl and index everything, we’re able to level the playing field for all types of musicians by taking advantage of the information given to us by any community on the internet. Work in music retrieval and understanding requires a sort of wide-eyed passion combined with a large dose of reality. The computer is never going to fully understand what music is about, but we can sample from the right sources and do it often enough and at a large enough scale that the only thing in our way is a leap of faith from the listener.

References

Foote, J. 1997. Content-based retrieval of music and audio. In Multimedia Storage and Archiving Systems II, Proc. SPIE.
Jehan, T. 2005. Creating Music by Listening. PhD thesis, MIT Media Lab.
Shardanand, U., and P. Maes. 1995. Social information filtering: algorithms for automating “word of mouth.” In Proc. ACM CHI.
Tzanetakis, G., and P. Cook. 2002. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing 10(5).
Whitman, B. 2005. Learning the Meaning of Music. PhD thesis, MIT Media Lab.

Is your movie and music preference related?

Heart of Glass

I’m a music person: I’m a musician, I see all my life experiences through the lens of records and bands, and I’ve spent 15 years of my life building the world’s best automated music recommender. I think there’s something terribly personal about music that other forms of “media” (books, movies, television, articles and — recent entry alert — applications) can’t touch. A truly great song only takes a minute and forty-four seconds to experience, and then you can hit the repeat button. I can hear “Outdoor Miner” 31.7 times on my walk to work every morning if I want to. But I can’t watch one of my favorite movies, Werner Herzog’s “Heart of Glass,” even once on my walk to work, and to be honest, more than once a year is a bit much. I’d have to be staring at my phone or through some scary glasses. And it’s a distracting world, far too much to fit into the diorama of the brain: dozens of actors, scenes, sounds, props and story. I don’t know if I attach memories or causal emotion to movies: they try to explicitly tell me how to feel, not suggest it obliquely or provide a soundtrack to a reality. And worst of all, it’s a mood killer to give a fledgling romantic partner a mix “DVD-box-set.”

But certainly, my preference in film (or that I even call them films — like some grad student) has to tell me something about myself, or even my other tastes. If we knew someone’s movie preference, could we build a better music playlist for them? Or can we help you choose a movie by knowing more about your music taste? I recently poked out of my own bubble of music recommendation and music retrieval to see if there were any correlations we could make use of.

Recommending in general

The way the Echo Nest has done music recommendation is actually quite novel and deserves a quick refresher: we don’t do what most other companies or technologies do. Amazon, Last.fm, iTunes Genius and many others use statistics of your activity to determine what you like: if you listen to Can and so does a stranger, and that stranger also loves Cluster and the system presumes you don’t know about them, you might get recommended Cluster. But that approach doesn’t know anything about music, and it constantly fails in its own naïve way:

A Colin Powell recommendation from Britney Spears
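
For contrast, the activity-only approach boils down to counting co-occurrences in people’s histories and recommending the biggest counts. A deliberately naïve sketch with made-up histories; note that it knows nothing about what the items actually are, which is how a Colin Powell book ends up next to Britney Spears.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase/listening histories, one set of items per person.
histories = [
    {"Britney Spears", "Colin Powell's memoir", "NSYNC"},
    {"Britney Spears", "Colin Powell's memoir"},
    {"Can", "Cluster", "Neu!"},
]

co_counts = Counter()
for items in histories:
    for a, b in combinations(sorted(items), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(item, n=3):
    """People who liked `item` also liked... (pure co-occurrence, no content)."""
    scored = [(other, c) for (a, other), c in co_counts.items() if a == item]
    return sorted(scored, key=lambda t: -t[1])[:n]

print(recommend("Britney Spears"))
```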

Instead of relying on that brittle world of correlated activity, we’ve first built a music understanding system that gets what music is: what people say about it and what it sounds like, and that platform also happens to recommend things and give you playlists. We use all of that data to power top-rate discovery for tons of services you use every day: Rdio, Sirius XM, Nokia’s MixRadio, iHeartRadio, MTV, the Infinite Jukebox. We don’t just know that you like a song; we know what the key of that song is, how many times people called it “sexy” in the past week on blogs, and what instruments are in it. We also know, through the anonymized Taste Profile, how often you, and the world, listened, what time of day, what songs you like to listen to before and after, and how diverse your taste is.

The reason this is useful is we don’t want to just build a thing that knows that “people that like The Shins also like Garden State,” we want to go deeper. We want our models to understand the underlying music, not just the existence of it. We also want to show correlations between styles and other musical descriptors and types of films, not just artists. Facebook could (and it probably tries to) build a music “recommender” by just checking out the commonalities of what people like, but we want to look deeply at the problem, not the surface area of it.

Experimental setup

The Echo Nest is currently pulling in hundreds of musical activity data points a second, through our partners and our large-scale crawls of the web and social media. A recent push on our underlying Taste Profile infrastructure nets us new data on the listeners themselves — specifically, anonymously collected and stored demographic data and non-music media preferences. Through all of this we know the favorite artists and movies for a large set of Taste Profiles (if you’re a developer, you can store non-musical data using our Taste Profile Key-Value API and manipulate and predict new features using our alpha Taste Profile predict API). For the purposes of this experiment, we limited our world to 50,000 randomly chosen Taste Profiles that had movie and music preference data.

Musical attributes for ABBA

Each artist was modeled using Echo Nest cultural attributes: a sparse vector of up to 100,000 “terms” that describe the music in the Taste Profile, weighted by their occurrence. If someone constantly listens to the new James Holden record, and I mean over and over again, kind of annoyingly, we weight terms like “bedroom techno” and “melodic” along with the acoustically derived terms — its energy, danceability and so on — higher than the terms for songs they’ve just heard once or twice. The output vector is a human-targeted cultural description of their favorite music, with helpful floating point probabilities P(X|L) for each term denoting “how likely would it be for this listener to describe their taste as ‘X’?”
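
A stripped-down illustration of how a listener vector like that might be assembled (made-up terms, weights and play counts; the real attribute space runs to ~100,000 terms): weight each artist’s terms by how heavily the listener plays them, then normalize so the entries behave like the P(X|L) probabilities above.

```python
from collections import defaultdict

# Hypothetical term weights per artist (cultural + acoustically derived terms).
artist_terms = {
    "James Holden": {"bedroom techno": 0.9, "melodic": 0.7, "energy": 0.6},
    "Dolly Parton": {"country": 0.9, "classic": 0.5, "acoustic": 0.6},
}

# Hypothetical play counts from one listener's Taste Profile.
play_counts = {"James Holden": 48, "Dolly Parton": 2}

listener = defaultdict(float)
for artist, plays in play_counts.items():
    for term, weight in artist_terms[artist].items():
        listener[term] += weight * plays   # heavy rotation pushes its terms up

total = sum(listener.values())
p_x_given_l = {term: v / total for term, v in listener.items()}

# Roughly: "how likely would this listener describe their taste as X?"
print(sorted(p_x_given_l.items(), key=lambda t: -t[1]))
```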

The movie data was a bit harder, noting for the record that we are a music data provider run by some musicians who happen to be good with computers. I deployed a small crack team (the CTO and his imaginary friends) to build a mini “Echo Nest for movies,” cataloging (for now) 5,000 of the most popular films along with their descriptors, culled from descriptions and reviews in a similar way to what we’ve done for music. I determined their genres, lead actors, key attributes and cultural vectors to train models against.

Top movie attributes for "The Godfather"
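
To make the movie side usable as ground truth later on, each film’s descriptors can be flattened into binary attribute columns. A small hypothetical sketch (invented films and attributes; scikit-learn’s MultiLabelBinarizer just does the bookkeeping):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical movie catalog entries with descriptors pulled from reviews.
movie_attrs = {
    "The Godfather": ["crime", "drama", "family", "classic"],
    "Toy Story":     ["animation", "family", "comedy"],
    "Step Brothers": ["comedy"],
}

titles = list(movie_attrs)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform([movie_attrs[t] for t in titles])  # films x attributes, 0/1

print(mlb.classes_)  # column order: one binary target per attribute
print(Y)             # each row is one film's ground-truth vector
```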

Predictions

By training thousands of correlative models between the sparse music vectors and the various ground-truth targets of the movie attributes (which were in reality far less diverse and dense), we are able to quickly see high affinities between various types of music and types of movies.

KL divergence doing its thing (from Wikipedia)

To train the thousands of models I used a multi-class form of the support vector machine, regularized least-squares classification (RLSC), which you can read about in an old paper of mine. RLSC is fine with sparse vectors and an unbounded number of output classes, and we also ended up with a linear kernel, which made the training step very light — likely due to the low rank of the movie features.
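
For the curious: with a linear kernel the whole trainer is only a few lines, since RLSC solves a single regularized linear system against ±1 targets, one column per movie attribute. A generic sketch with fake shapes and data, not the production code:

```python
import numpy as np

def train_rlsc(X, Y, lam=1e-2):
    """One-shot RLSC: linear kernel K = X X^T, closed-form coefficients.

    X: (n_listeners, n_music_terms) feature matrix
    Y: (n_listeners, n_movie_attrs) targets in {-1, +1}, one column per model
    """
    n = X.shape[0]
    K = X @ X.T                                      # linear kernel
    C = np.linalg.solve(K + lam * n * np.eye(n), Y)  # all models solved at once
    return C

def predict(C, X_train, X_new):
    # f(x) = sum_i c_i k(x_i, x); with a linear kernel this is (X_new X_train^T) C
    return (X_new @ X_train.T) @ C

# Tiny fake data: 6 listeners, 4 music terms, 3 movie attributes.
rng = np.random.default_rng(0)
X = rng.random((6, 4))
Y = np.where(rng.random((6, 3)) > 0.5, 1.0, -1.0)

C = train_rlsc(X, Y)
print(predict(C, X, X[:2]))  # affinity scores for the first two listeners
```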

I evaluated the models in two ways: the first I’ll call a “discriminant classifier” — it lists the most useful sources of information (by KL divergence) for a given music source — and the second a “ranked classifier” — given popularity features, what would give the least surprise for the classifier. There are good reasons for the two methods: the former is more statistically correct but ignores that most people have never heard of most things, while the latter gives us safe bets that carry less explicit information.[1] As we see every day with music, a computer’s idea of “information” rarely has anything to do with things like the success of “Fast & Furious 6.”
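
In code the distinction looks roughly like this (toy numbers, and one plausible reading of the “smoothed by document frequency” footnote, not the actual evaluation): the discriminant score asks how much extra information liking a movie carries about the music target relative to its base rate, while the ranked score lets raw popularity pull everything back toward the safe bets.

```python
import numpy as np

movies = ["Toy Story", "Paid in Full", "New Jack City"]

# Hypothetical model outputs and corpus statistics.
p_music_given_movie = np.array([0.30, 0.85, 0.80])  # P(likes Jay-Z | likes movie)
p_music = 0.25                                       # base rate of liking Jay-Z
movie_popularity = np.array([0.60, 0.02, 0.03])      # document frequency of each film

def kl_term(p, q):
    """Binary KL contribution: how surprising p is relative to the base rate q."""
    p, q = np.clip(p, 1e-9, 1 - 1e-9), np.clip(q, 1e-9, 1 - 1e-9)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

discriminant = kl_term(p_music_given_movie, p_music)
ranked = p_music_given_movie * np.log1p(1000 * movie_popularity)  # popularity-smoothed

for name, score in zip(movies, discriminant):
    print("discriminant", name, round(float(score), 3))
for name, score in zip(movies, ranked):
    print("ranked      ", name, round(float(score), 3))
```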

For example, I am able to ask it both “If an average person likes Jay-Z, what are their favorite movies?” (ranked) and “Which movies best predict a liking of Jay-Z?” (discriminant). They are:

Ranked: Toy Story, Step Brothers, Buddy The Elf, Harry Potter (series), Jackass, Superbad, Fight Club
Discriminant: Get Rich or Die Tryin’, Paid in Full, Scary Movie 4, Shottas, Juice, New Jack City, Friday After Next

Movie predictions for fans of Jay-Z

You can see the difference: the Ranked list is the safe bets (everyone likes Toy Story! everyone likes Jay-Z!) and the Discriminant list is the less-known but more useful results. So you don’t think I’m pulling a Shawshankr[2] on you, here’s the list for a different artist:

Ranked: Dirty Dancing, Toy Story, The Blind Side, Twilight (series), The Notebook, Finding Nemo, Dear John
Discriminant: Pure Country, 8 Seconds, Country Strong, Valentine’s Day, Sweet Home Alabama, Letters to Juliet, The Vow

Movie predictions for fans of Tim McGraw

We can also bulk this up by features of the movie. Here are the top musical artists correlated with movies that have a heavy crime element:

Ranked: Jimi Hendrix, The Beatles, The Rolling Stones, Jay-Z, The Who, Bob Dylan, Pink Floyd
Discriminant: Ghostpoet, Amazing Blondel, Ian Anderson, Doseone, Young Gunz, Mandrill, Pato Banton

Artist predictions for fans of crime movies

Seeing the Amazing Blondel there just amazes me: we track two and a half million artists and it’s those guys who like crime movies? The data can’t lie.

The Amazing Blondel

We also looked up which artists correlate with the movies our term computations considered “pornographic” or “adult” (they know it when they see it):

Ranked: Linkin Park, The Beatles, The Rolling Stones, Deftones, Limp Bizkit, Korn, Rage Against the Machine
Discriminant: The Receiving End of Sirens, Haste the Day, The Dillinger Escape Plan, The Mars Volta, Far East Movement, Rediscover, Imogen Heap

Artist predictions for fans of adult movies

Fans of Christian metalcore band Haste the Day and of Imogen Heap, we’re onto you. We don’t judge.

Overall

We did a lot more analysis, more of which you can see over on The Echo Nest’s new Musical Identity site, including breakdowns of different genres of films:

Sci-fi vs. Fantasy

The goal of all of this is to understand more about music and the identity we express through our affinity. We’re getting closer with a lot of these large-scale analyses of different forms of media, and of demographic and psychographic predictions from preference alone. But it’s also going to help us elsewhere: being able to recommend you that one song or artist with not much information is what we do, and the more we can predict from what we used to think of as orthogonal sources, the better.


  1. For the scientists getting mad: the ranked classifier applies a smoothed weight of terms by their document frequency — the number of times we saw a movie being mentioned. 

  2. The more precise movie recommender with the worst recall