<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>My name is Brian Whitman. I am a lapsed scientist and sound artist currently co-founder &amp; CTO of The Echo Nest, a music data company in Somerville, MA that powers most music experiences you have on the internet today. I mostly post about large text &amp; data, recommendation technologies and music tech. I’d always like to hear from you if you are working on similar things. 
A more formal bio for events &amp; press. </description><title>Brian Whitman @ variogr.am</title><generator>Tumblr (3.0; @grackle)</generator><link>http://notes.variogr.am/</link><item><title>‘Hungry, So Angry’ by Medium MediumThe TV thing’s blue,...</title><description>&lt;img src="http://25.media.tumblr.com/4b1758d096a9952bb7a1af34fc8a38c8/tumblr_mhe9gz5PcZ1qz4g66o1_400.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;b&gt;&lt;a href="http://www.thisismyjam.com/bwhitman/_4m1bncx?utm_source=tumblr&amp;utm_medium=sharing&amp;utm_campaign=user"&gt;‘Hungry, So Angry’ by Medium Medium&lt;/a&gt;&lt;/b&gt;&lt;br/&gt;The TV thing’s blue, it’s been finished for hours&lt;/p&gt;</description><link>http://notes.variogr.am/post/41787934818</link><guid>http://notes.variogr.am/post/41787934818</guid><pubDate>Tue, 29 Jan 2013 10:53:23 -0500</pubDate><category>thisismyjam</category></item><item><title>‘I Was Made To Love Her’ by Stevie WonderLike a sweet magnolia...</title><description>&lt;img src="http://24.media.tumblr.com/9c746894d64ec72c69c8db83074d7695/tumblr_mh77s52R251qz4g66o1_400.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;b&gt;&lt;a href="http://www.thisismyjam.com/bwhitman/_4kmqhp6?utm_source=tumblr&amp;utm_medium=sharing&amp;utm_campaign=user"&gt;‘I Was Made To Love Her’ by Stevie Wonder&lt;/a&gt;&lt;/b&gt;&lt;br/&gt;Like a sweet magnolia tree&lt;/p&gt;</description><link>http://notes.variogr.am/post/41460765752</link><guid>http://notes.variogr.am/post/41460765752</guid><pubDate>Fri, 25 Jan 2013 15:33:41 -0500</pubDate><category>thisismyjam</category></item><item><title>linesdotscircles:

seventy + eight</title><description>&lt;img src="http://25.media.tumblr.com/14dc61dbf31e7b677c8bc806912891dd/tumblr_meqddn6vmE1re1eg2o1_500.jpg"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;a href="http://linesdotscircles.tumblr.com/post/37500269634/seventy-eight" class="tumblr_blog"&gt;linesdotscircles&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;seventy + eight&lt;/p&gt;&lt;/blockquote&gt;</description><link>http://notes.variogr.am/post/41362800700</link><guid>http://notes.variogr.am/post/41362800700</guid><pubDate>Thu, 24 Jan 2013 10:45:11 -0500</pubDate></item><item><title>backjammon:

bwhitman’s Kurt  Weill &amp; Ira Gershwin jam...</title><description>&lt;img src="http://24.media.tumblr.com/3cf26d08b4df446df2d402d6a739c13d/tumblr_mgob4nVSsV1s3wodeo1_500.jpg"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;a href="http://backjammon.tumblr.com/post/40603457453/bwhitmans-kurt-weill-ira-gershwin-jam" class="tumblr_blog"&gt;backjammon&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;bwhitman’s &lt;span&gt;Kurt  Weill &amp; Ira Gershwin jam gets &lt;/span&gt;&lt;span&gt;Zeng Fanzhi’s “This Land so Rich in Beauty No. 2” [2010] as its background. Special.&lt;/span&gt;&lt;/p&gt;&lt;/blockquote&gt;</description><link>http://notes.variogr.am/post/40609321056</link><guid>http://notes.variogr.am/post/40609321056</guid><pubDate>Tue, 15 Jan 2013 12:28:52 -0500</pubDate></item><item><title>3G Wireless in Ubuntu on your ARM Chromebook</title><description>&lt;p&gt;If you have an &lt;a href="http://www.chromium.org/chromium-os/developer-information-for-chrome-os-devices/samsung-arm-chromebook"&gt;ARM Chromebook&lt;/a&gt; with 3G access and want to put Ubuntu on it and also keep your 3G access, do the following:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;If you haven&amp;#8217;t already, set up 3G in ChromeOS and make sure it works.
&lt;/li&gt;&lt;li&gt;&lt;a href="http://chromeos-cr48.blogspot.com/2012/12/so-you-want-chrubuntu-on-external-drive.html"&gt;Install Ubuntu on an external drive or SD card&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;sudo apt-get install wvdial
&lt;/li&gt;&lt;li&gt;Put this in your /etc/wvdial.conf after replacing the XXXs with your phone number that Verizon set up for you. You should have an email from them when you activated their free 100MB/month plan with the number in it.

&lt;script src="https://gist.github.com/4420677.js"&gt;&lt;/script&gt;&lt;/li&gt;&lt;li&gt;Then run sudo wvdial. 
&lt;/li&gt;&lt;li&gt;You may have to run it a couple of times before you see pppd do stuff. You can exit out of wvdial; pppd stays running until you kill it. 
&lt;/li&gt;&lt;li&gt;I so far cannot get the network configuration GUI to play nice with this, but I&amp;#8217;ll let you know if I do.
&lt;/li&gt;&lt;/ul&gt;</description><link>http://notes.variogr.am/post/39308314835</link><guid>http://notes.variogr.am/post/39308314835</guid><pubDate>Mon, 31 Dec 2012 10:29:58 -0500</pubDate><category>chromebook</category><category>arm</category><category>3g</category><category>ubuntu</category><category>chrubuntu</category></item><item><title>How music recommendation works -- and doesn't work</title><description>&lt;p&gt;When you see an automated music recommendation do you assume that some stupid computer program was trying to trick you into something? It’s often what it feels like – with what little context you get with a suggestion on top of the postmodern insanity of a &lt;em&gt;computer understanding how you should feel about music&lt;/em&gt; – and of course sometimes you actually are being tricked. &lt;/p&gt;

&lt;img src="http://img.skitch.com/20120409-xebncwth225qqiwy8f1ntif77.jpg" alt="Amazons recommendations for Abbey Road"/&gt;&lt;p&gt;&lt;em&gt;Amazon’s recommendations for Abbey Road&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;No one is just learning that if they like a Beatles album, they may also like five others. Amazon is not optimizing for the noble work of raising independent artists’ profiles to the public, and they’re definitely not optimizing for a good musical experience. They’re statistically optimizing to &lt;em&gt;make more money&lt;/em&gt;, to sell you more things. Luckily this is the fruit fly of music recommendation, the late night infomercial quality of a music discovery experience that also might dry your lettuce if you spin it fast enough. And I doubt Amazon would ever claim otherwise. &lt;/p&gt;

&lt;p&gt;The rest of the field has gotten pretty far since then and we’ve now got tons of ways to discover music using actual qualities of the music or social cues of what your friends are listening to. But I still hate seeing examples like the above. I hate thinking there are forces at work in music discovery that don’t have listeners’ best interests at heart and I want to make them better. I want to walk through all the ways music recommenders work or don’t, and concentrate of course on the one I know best – The Echo Nest’s – which you’re probably using even if you don’t already know it. And most importantly, I want to talk about what we can do next.&lt;/p&gt;

&lt;p&gt;Before I get into it, a brief history of who I am: I’ve been working on music recommenders and music retrieval since 1999, &lt;a href="http://alumni.media.mit.edu/~bwhitman"&gt;academically&lt;/a&gt; and in industry. In 2005 I started The Echo Nest with my co-founder Tristan. We power most online music services’ discovery using a very interesting series of algorithms that is sort of the Voltron-figure of our two dissertations and the hard work of our 50 employees in Boston, SF, NYC and London. And we’ve been on a bit of a tear – just in the past year alone we’ve announced that we’re powering music discovery features for eMusic, Twitter, EMI, iHeartRadio, Rdio, Spotify, VEVO and Nokia – with some new heavy hitters yet to be announced – to add to our existing customer base that includes MTV, MOG, and the BBC. And through our &lt;a href="http://developer.echonest.com"&gt;API&lt;/a&gt; we have tens of thousands of developers making independent apps like Discovr, KCRW, Muzine, Raditaz, Swarm, SpotON and hundreds more. &lt;/p&gt;

&lt;p&gt;We’ve been a quiet company for a while and with all this great news comes a lot of new confusion about what we do and how it compares to other technologies. Journalists like to pin us as the “machine” approach to understanding music next to the “human” of our nearest corollary (not competitor) in the space – Pandora. This is somewhat unfair and belies the complexity of the problem. Yes, we use computer programs to help manage the mountains of music data, but so does everyone, and the way we get and use that data is just as human as anything else out there. &lt;/p&gt;

&lt;p&gt;I’ll go into technologies like collaborative filtering, automatic content based recommendation, and manual approaches used by Pandora or All Music Guide (Rovi). I’ll show that no matter what the computational approach ends up being, the source data – &lt;em&gt;how it knows about music&lt;/em&gt; – is the most important asset in creating a reliable useful music discovery service.&lt;/p&gt;

&lt;h2&gt;What is recommendation? What is it good for?&lt;/h2&gt;

&lt;p&gt;Musicians are competing for an audience among millions of others trying just as hard. And it’s not the listener’s fault if they miss out on something that will change their lives – these days, anyone can gain access to a library of over 15 million songs on demand for free. To a musician turned computer scientist (as I and so many of my colleagues are) this is the ultimate hidden variable problem. If there were something “intelligent” that could predict the right song or artist for a person, both sides (musician and listener) would win. Music is amazing, there’s a ton of data, and the problem is very far from solved.&lt;/p&gt;

&lt;p&gt;But anyone in the entire field of music technology has to treat music discovery with respect: it’s not about the revenue of the content owner, it’s not about the technology, it’s not about click through rates, listening hours or conversion. The past few years have shown us over and over that filters and guides are invaluable for music itself to coexist with the new ways of getting at it. We track over 2 million artists now – I estimate there are truly 50 million, most of them currently active. Every single one of them deserves a chance to get their art heard. And while we can laugh when Amazon suggests you put a Norah Jones CD in your cart after you buy a leaf blower, the millions of people that idly put on Pandora at work and get excited about a new band they’ve never heard deserve a careful look. Recommendation technology is powering the new radio and we have a chance to make it valuable for more than just the top 5 percent of musicians.&lt;/p&gt;

&lt;p&gt;When people talk about “music recommendation” or “music discovery” they usually mean one of a few things:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Artist or song similarity&lt;/strong&gt;: an anonymous list of similar items to your query. You can see this on almost any music service. Without any context, this is just a suggestion of what other artists or songs are similar to the one you are looking at. Formally, this is not truly a recommendation as there is no user model involved (although since a query took the user to the list, I still call these a recommendation. It’s a recommendation in the sense that a web search result is.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Personalized recommendation&lt;/strong&gt;: Given a “user model” (your activity on a service – plays, skips, ratings, purchases) a list of songs or artists that the service does not think you know about yet that fits your profile.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Playlist generation&lt;/strong&gt;: Most consumers of music discovery are using some form of playlist generation. This is different from the above two in that they receive a list of items in some order (usually meant to be listened to at the time.) The playlist can be personalized (from a user model) or not, and it can be within catalog (your own music, à la iTunes Genius’s or Google’s Instant Mix playlists) or not (Pandora, Spotify’s or Rdio’s radio, iHeartRadio.) The playlist should vary artists and types of songs as it progresses, and many rely on some form of steering or feedback (thumbs up, skips, etc.)&lt;/li&gt;
&lt;/ul&gt;&lt;table&gt;&lt;caption id="wherepopularservicessitindiscovery"&gt;Where popular services sit in discovery&lt;/caption&gt;
&lt;colgroup&gt;&lt;col style="text-align:left;"&gt;&lt;col style="text-align:left;"&gt;&lt;col style="text-align:left;"&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th style="text-align:left;"&gt;&lt;/th&gt;
	&lt;th style="text-align:left;"&gt;Personalized&lt;/th&gt;
	&lt;th style="text-align:left;"&gt;Anonymous&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align:left;"&gt;&lt;strong&gt;Playlist&lt;/strong&gt;&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;Pandora&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;Rdio radio&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align:left;"&gt;&lt;strong&gt;Suggestions&lt;/strong&gt;&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;Amazon&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;All Music Guide&lt;/td&gt;
&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;These are three very different ways of doing music discovery, but for every technology and approach I know of, they are simply applications on top of the core data presented in different ways. For example, at the Echo Nest we do quite a bit to make our playlists “radio-like” using our observed statistics, acoustic features and a lot of QA but that sort of work is outside the scope of this article – all three of our APIs (similarity, taste profiles for personalized recommendation, and playlist) start with the same knowledge base culled from acoustic and text analysis of music.&lt;/p&gt;

&lt;p&gt;However, the application means a lot to the listener. People seem to love playlists and radio-style experiences, even if the data driving both that and the boring list of songs to check out are the same. One of the great things about working at the Echo Nest is seeing the amazing user interfaces and experiences people put on top of our data. Listeners want to hear music, and they want to trust the service and have fun doing it. And conversely, a Pandora completely powered by Echo Nest data would feel the same to users but would have far better scale and results and thus add to the experience. Because of this very welcome sharding of discovery applications, it’s less helpful to talk about these applications directly and more helpful to talk about “what the services know” about music – how they got to the result that Kreayshawn and Uffie are similar, no matter where it appeared in the radio station or suggestion or what user model led them there. We can leave the application and experience layer to another lengthy blog post.&lt;/p&gt;

&lt;p&gt;My (highly educated, but please know I have no direct inside information except for Echo Nest of course) guesses on the data sources are:&lt;/p&gt;

&lt;table&gt;&lt;caption id="howpopularservicesknowaboutmusic"&gt;How popular services know about music&lt;/caption&gt;
&lt;colgroup&gt;&lt;col style="text-align:left;"&gt;&lt;col style="text-align:left;"&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th style="text-align:left;"&gt;Service&lt;/th&gt;
	&lt;th style="text-align:left;"&gt;Source of data&lt;/th&gt;
&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align:left;"&gt;Pandora&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;Musicologists take surveys&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align:left;"&gt;Songza&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;Editors or music fans make playlists&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align:left;"&gt;Last.fm&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;Activity data, tags on artists and songs, acoustic analysis&lt;a href="#fn:1" id="fnref:1" title="see footnote" class="footnote"&gt;[1]&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align:left;"&gt;All music guide&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;Music editors &amp;amp; writers&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align:left;"&gt;Amazon&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;Purchase &amp;amp; browsing history&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align:left;"&gt;iTunes Genius&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;Purchase data, activity data from iTunes&lt;a href="#fn:2" id="fnref:2" title="see footnote" class="footnote"&gt;[2]&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align:left;"&gt;Echo Nest&lt;/td&gt;
	&lt;td style="text-align:left;"&gt;Acoustic analysis, text analysis&lt;/td&gt;
&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;There are many other discovery platforms but this list covers the widest swath of approaches. Many services you interact with use either these platforms directly (Last.fm, Echo Nest, AMG all license data or give away data through APIs) or use similar enough approaches that it’d be not worth going into in detail.&lt;/p&gt;

&lt;p&gt;From this list we’re left with a few major music knowledge approaches: (1) &lt;strong&gt;activity data&lt;/strong&gt;, (2) &lt;strong&gt;critical or editorial review&lt;/strong&gt;, (3) &lt;strong&gt;acoustic analysis&lt;/strong&gt;, and (4) &lt;strong&gt;text analysis&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;The first two are self-describing: you can learn about music by the activity around it (listens, plays, purchases) – Kreayshawn and Uffie are considered similar if the same people buy their singles or rate them highly – or you can learn about music by critical review, what humans have been pretty good at for some time. Of course, encoding activity (via collaborative filtering or taste mining) and critical review (via surveys or direct entry) into a database is a relatively recent art. &lt;/p&gt;

&lt;p&gt;The last two, acoustic and textual analysis, were developed by the field as a reaction to the failures of the first two. I’ll go into much greater detail on those as it’s how Echo Nest does its magic.&lt;/p&gt;

&lt;h2&gt;Care and Scale&lt;/h2&gt;

&lt;p&gt;The dominating principle of the Echo Nest discovery approach from day one has been “care and scale.” When Tristan and I started the company in 2005 we were two guys with fresh PhDs in music analysis and some pretty good technological solutions; Tristan’s in the acoustic analysis realm (a computer taking a signal and making sense of it) and mine in the data mining and language analysis space (understanding what people are saying and doing with music.) We surveyed the landscape at the time for discovery and found that almost every approach suffered from either a lack of care or scale, sometimes (and often) both. The entire impetus of doing a startup (not an easy choice for two scientists and anyone that has met us knows we are not the “startup type”) was that we thought we had something between the two of us that could fix those two problems.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://static.echonest.com/b/carescale.007.png" width="500px"/&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Care &amp;amp; Scale&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Scale is easy to explain: you have to know about as much music as possible to make good recommendations. If you don’t know about an up and coming artist, you can’t recommend them. If you only analyze or rate or understand the popular stuff you by default fail at discovery. Manual discovery approaches by their nature do not scale. We track over two million artists and over 30 million songs and there is no way a manually curated database can reach that level of knowledge. Even websites that can be volunteer or community edited run against the limits of the community that takes part – we count only a little over 130,000 artist pages on Wikipedia. Pandora recently crossed the 1 million song barrier, and it took them 10 years to get there. Try any hot new artist in Pandora and you’ll get the dreaded:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://aps.s3.amazonaws.com/j7zOw.png" width="300px"/&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Pandora not knowing about YUS&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is Pandora showing its lack of scale. They won’t have any information for &lt;a href="http://soundcloud.com/yusyusyus/nowadays"&gt;YUS&lt;/a&gt; for some time and may never unless the artist sells well. This is bad news and should make you angry: why would you let a third party act as a filter on top of your very personal experiences with music? Why would you ever use something that “hid” things from you?&lt;/p&gt;

&lt;p&gt;Activity data approaches (such as Last.fm and Amazon and iTunes Genius) also suffer from a slighter scale problem that manifests itself in a different way. It’s trivial to load a database of music into an activity data-based discovery engine (such as collaborative filtering or social tags.) I’ve often gone after such naive approaches to music discovery publicly. If a website or store has a list of user data (user A bought / listened to song Y at time Z) any bright engineer will immediately go into optimization mode. There’s almost a duplicitous ease of recommending music to people poorly. I recently was shopping for a specific type of transistor for a project on a parts and components website and found they, too, had turned on the SQL join that allowed “recommendations” on their site based on activity data:&lt;/p&gt;

&lt;img src="http://skitch-img.s3.amazonaws.com/20120521-d816a35w33w9t6416fcf2cquer.jpg" alt="Pathological filtering"/&gt;&lt;p&gt;&lt;em&gt;Pathological filtering&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Other than activity not making much sense in a discovery context, by default these systems suffer a “popularity bias,” where a lot of music simply doesn’t have enough activity data yet collected to be considered a recommendation match. Activity based systems can only know what people have told them explicitly, and this often makes it hard for less-popular artists to be recommended. &lt;/p&gt;
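
&lt;p&gt;To see why this kind of engine is both easy to build and biased toward popular items, here is a minimal sketch of the “people who bought X also bought Y” join in plain Python. The function names and the toy purchase events are made up for illustration and are not any store’s actual implementation.&lt;/p&gt;

```python
from collections import defaultdict
from itertools import combinations


def co_purchase_counts(events):
    """Count, for every pair of items, how many users have both."""
    items_by_user = defaultdict(set)
    for user, item in events:
        items_by_user[user].add(item)

    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(item, counts, k=3):
    """Return up to k items most often bought alongside the given item."""
    neighbours = counts.get(item, {})
    # Highest co-purchase count first, ties broken alphabetically.
    return sorted(neighbours, key=lambda other: (-neighbours[other], other))[:k]


# Hypothetical activity log of (user, item) pairs.
events = [
    ("u1", "Abbey Road"), ("u1", "Revolver"),
    ("u2", "Abbey Road"), ("u2", "Revolver"), ("u2", "Let It Be"),
    ("u3", "Abbey Road"), ("u3", "Let It Be"),
    ("u4", "Obscure Debut EP"),
]
counts = co_purchase_counts(events)
print(recommend("Abbey Road", counts))
print(recommend("Obscure Debut EP", counts))
```

&lt;p&gt;In this toy data the popular album only ever points at other albums bought by the same crowd, and the item with a single buyer gets an empty list: with no overlapping activity there is nothing to recommend from it or to it.&lt;/p&gt;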

&lt;p&gt;&lt;em&gt;Care&lt;/em&gt; is a trickier concept and one we’ve tried very hard to define and encode into our engineering and product. I translate it as &lt;em&gt;“is this useful for the musician or listener?”&lt;/em&gt; A great litmus test for care in music discovery is to check the similar artists or songs to The Beatles. Is it just the members of the Beatles and their side projects? For almost all services that use musical activity data, it will be:&lt;/p&gt;

&lt;img src="http://img.skitch.com/20120502-q55eb3f4ig25249ashycg8ifne.jpg" alt="Top artist similars are all members of the Beatles"/&gt;&lt;p&gt;&lt;em&gt;Top artist similars are all members of the Beatles&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Certainly a statistically correct result&lt;a href="#fn:3" id="fnref:3" title="see footnote" class="footnote"&gt;[3]&lt;/a&gt;, but not a musically informative one. There is so much that user data can tell you about listening habits, but blindly using it to inform discovery betrays a lack of care about the final result. Care is neatly handled by using social, manual or editorial approaches, as humans are pretty good at treating music properly. But when using more statistical or signal processing approaches that know about more music at scale, care has to be factored in somehow. Most purely signal processing approaches (such as Mufin here) fall down as badly on care as activity data approaches do:&lt;/p&gt;

&lt;img src="http://img.skitch.com/20120502-xr7y8nuneiq1ck41hfmkq4yffn.jpg" alt="Mufin expressing so little care about Stairway to Heaven"/&gt;&lt;p&gt;&lt;em&gt;Mufin expressing so little care about Stairway to Heaven&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Care is a layer of quality assurance, editing and sanity checks, real-world usage and analysis and, well, care, on top of any systematic results. You have to be able to stand by your results and fix them if they aren’t useful to either musicians or listeners. &lt;a href="http://musicmachinery.com/2011/05/14/how-good-is-googles-instant-mix/"&gt;Your WTF count has to be as low as possible.&lt;/a&gt; We’ve spent a lot of time embedding care into our process and while we’re always still working, we’re generally pleased with how our results look.&lt;/p&gt;

&lt;img src="http://img.skitch.com/20120502-pw8tutjw7akdm3fhup7c92qfr8.jpg" alt="Echo Nests Beatles similars via MTV Music Meter"/&gt;&lt;p&gt;&lt;em&gt;Echo Nest’s Beatles similars via MTV Music Meter&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Without both care and scale you’ve got a system that a listener can’t trust and that a musician can’t use to find new fans. You’ve failed both of your intended audiences and you might as well not try at all.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://static.echonest.com/b/carescale.008.png" width="500px"/&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Care &amp;amp; Scale of common approaches&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Text Analysis&lt;/h2&gt;

&lt;img src="http://img.skitch.com/20111219-f8xs4bir6rnur17u8prqqm9rxx.jpg" alt="Echo Nest Cultural vectors"/&gt;&lt;p&gt;&lt;em&gt;Echo Nest Cultural vectors&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I started doing music analysis work in 1999 at the NEC Research Institute in Princeton, NJ (I had scammed them into an internship and then a full time job by being very persistent.) NEC was then full of the top tier of data mining, text retrieval, machine learning and natural language processing (NLP&lt;a href="#fn:4" id="fnref:4" title="see footnote" class="footnote"&gt;[4]&lt;/a&gt;) scientists; I had the great fortune to work with guys like &lt;a href="http://en.wikipedia.org/wiki/Steve_Lawrence_(computer_scientist)"&gt;Steve Lawrence&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Gary_William_Flake"&gt;Gary Flake&lt;/a&gt; and &lt;a href="http://www.nytimes.com/2012/03/24/science/david-l-waltz-computer-science-pioneer-dies-at-68.html"&gt;David Waltz&lt;/a&gt;, and even &lt;a href="http://en.wikipedia.org/wiki/Vladimir_Vapnik"&gt;Vladimir Vapnik&lt;/a&gt; moved into my tiny office after I left for MIT. &lt;/p&gt;

&lt;p&gt;I was there while figuring out what to do with myself after abruptly quitting my PhD program in NLP at Columbia. I was a musician at the time, playing a lot of shows at various warehouse spaces or the lamented late “Brownies,” places where 20 people might show up and 10 would know who you were. There was a lot of excitement about “the future of music” – far more than there is today, as somehow we felt that the right forces would win and quickly. I logged onto Napster for the first time from a DSL connection and practically squealed in delight as a song could be downloaded faster than the time it would take to listen to it. It was a turning point for music access, but probably a step back for music &lt;em&gt;discovery&lt;/em&gt;. We were still stuck with this:&lt;/p&gt;

&lt;img src="http://img.skitch.com/20120513-mw83pjsw9s8eqrx72spriampg6.jpg" alt="Napster in 2001"/&gt;&lt;p&gt;&lt;em&gt;Napster in 2001&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The search was abysmal: a substring match on ID3v1 tags (&lt;a href="http://notes.variogr.am/post/225922016/armed-forces-in-alphabetical-order-archive"&gt;32 characters each for artist, title, release and a single byte for genre&lt;/a&gt;) or filename (usually “C:\MUSIC\MYAWES~1\RAPSONG.MP3”) and there was no discovery beyond clicking on other users’ names and seeing what they had on their hard drives. I would make my music available but of course, no one would ever download it because there was no way for them to find it. A fellow musician friend quickly took to falsely renaming his songs as “remixes” by better known versions of himself: “ARTIST - TITLE - APHEX TWIN MIX” and reported immediate success.&lt;/p&gt;

&lt;p&gt;At the time I was a member of various music mailing lists and USENET groups, and a frequent visitor to a new thing called “weblogs” and to music news and review sites. I would read these voraciously and try to find stuff based on what people were talking about. To me, while listening to music is intensely private (almost always with headphones alone somewhere), the discovery of it is necessarily social. I figured there must be a way to take advantage of all of this conversation, all the excited people talking about music in the hopes that others can share in their discovery – and automate the process. Could a computer ‘read’ all that was going on across the internet? If just one person wrote about my music on some obscure corner of the web, then the system could know that too.&lt;/p&gt;

&lt;p&gt;This is &lt;em&gt;scale with care&lt;/em&gt;: real people feeding information into a large automated system from all different sources, without having to fill out a survey or edit a wiki page or join a social network. &lt;/p&gt;

&lt;p&gt;After almost ten years of data mining, language and music research (first at NECI, then a PhD at MIT at the Media Lab) The Echo Nest currently is the only music understanding service that takes this approach. And it works. We crawl the web constantly, scanning over 10 million music related pages a day. We throw away spam and non-music related content through filtering, we try to quickly find &lt;a href="http://notes.variogr.am/post/6687194793/the-echo-nest-puddle-and-artist-entity-extraction"&gt;artist names in large amounts of text&lt;/a&gt; and parse the language around the name. Every word anyone utters on the internet about music goes through our systems that look for descriptive terms, noun phrases and other text and those terms bucket up into what we call “cultural vectors” or “top terms.” Each artist and song has thousands of daily changing top terms. Each term has a weight associated, which tells us how important the description is (roughly, the probability that someone will describe music as that term.) We don’t use a fixed dictionary for this, we are able to understand new music terms as quickly as they are uttered, and our system works in many Latin-derived languages across many cultures.&lt;/p&gt;
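
&lt;p&gt;To make the idea of a weighted top term concrete, here is a minimal sketch, assuming a term’s weight is simply the fraction of documents about an artist that mention it. The tokenizer, the tiny stopword list and the &lt;code&gt;cultural_vector&lt;/code&gt; name are illustrative assumptions, not the Echo Nest’s actual pipeline, which parses noun phrases and does far more filtering.&lt;/p&gt;

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "with", "this", "their", "to", "in"}


def cultural_vector(documents):
    """Map each term to the fraction of documents that mention it."""
    if not documents:
        return {}
    doc_counts = Counter()
    for text in documents:
        # Count each term once per document, however often it repeats.
        terms = set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS
        doc_counts.update(terms)
    total = len(documents)
    return {term: count / total for term, count in doc_counts.items()}


# Made-up snippets standing in for crawled text about one artist.
docs = [
    "Dreamy lo-fi pop with hazy vocals",
    "A hazy, dreamy record",
    "Lo-fi bedroom pop",
]
vector = cultural_vector(docs)
for term, weight in sorted(vector.items(), key=lambda kv: (-kv[1], kv[0])):
    print(term, round(weight, 2))
```

&lt;p&gt;Because nothing here depends on a fixed dictionary, a brand new descriptive word shows up in the vector the first time someone writes it.&lt;/p&gt;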

&lt;p&gt;On top of this statistical NLP, we also pull in structured data from a number of partners and community access sites like Wikipedia or Musicbrainz. We apply the same frequency and vector approach to this knowledge-base style data: if Wikipedia lists the location of an artist as NYC and the label partner as New York, NY and their Facebook page has “EVERYWHERE ON TOUR 2012”, we have to figure out which is the right answer to index. Often the cultural vectors on structured data become a synthesis of all the different data sources.&lt;/p&gt;
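
&lt;p&gt;A minimal sketch of that kind of reconciliation, assuming a simple majority vote over normalized values. The alias table and the &lt;code&gt;resolve_location&lt;/code&gt; helper are hypothetical and only illustrate the frequency idea; they are not a description of our production system.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical alias table mapping spellings to one canonical place name.
ALIASES = {
    "nyc": "New York, NY",
    "new york, ny": "New York, NY",
    "new york city": "New York, NY",
}


def resolve_location(claims):
    """Pick the canonical location that the most sources agree on."""
    votes = Counter()
    for value in claims.values():
        canonical = ALIASES.get(value.strip().lower())
        if canonical is not None:
            votes[canonical] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


claims = {
    "wikipedia": "NYC",
    "label": "New York, NY",
    "facebook": "EVERYWHERE ON TOUR 2012",
}
print(resolve_location(claims))
```

&lt;p&gt;Values that cannot be normalized to a known place simply do not get a vote, so the joke location on the Facebook page drops out.&lt;/p&gt;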

&lt;p&gt;When a query for a similar artist or a playlist comes into our system, we take the source artist or song, grab its cultural vectors, and use those in real time to find the closest match. This is not easy to do at scale, and over the years we’ve done quite a lot of “big data” work to make this tractable. We don’t cache this data because it changes so often – the global conversation around music is very finicky and artists make overnight changes to their sound. &lt;/p&gt;
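
&lt;p&gt;As a rough illustration of “find the closest match,” the sketch below ranks artists by cosine similarity between their term-weight vectors. Cosine similarity is an assumption chosen for the example, and the artist names and weights are invented; a real system at this scale would need an index rather than a linear scan.&lt;/p&gt;

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, catalog, k=2):
    """Return the k artists whose vectors are closest to the query artist."""
    others = [name for name in catalog if name != query]
    ranked = sorted(
        others,
        key=lambda name: cosine(catalog[query], catalog[name]),
        reverse=True,
    )
    return ranked[:k]


# Invented vectors: term mapped to weight.
catalog = {
    "artist_a": {"rap": 0.9, "electro": 0.6, "party": 0.4},
    "artist_b": {"rap": 0.8, "electro": 0.7, "french": 0.3},
    "artist_c": {"folk": 0.9, "acoustic": 0.7},
}
print(most_similar("artist_a", catalog, k=1))
```

&lt;p&gt;Two artists come out as similar when people describe them with the same words at similar strengths, regardless of whether any one listener has played both.&lt;/p&gt;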

&lt;p&gt;A lot of useful data naturally falls out of cultural analysis of music: the quantity of conversation is used to inform our “hotttnesss” and familiarity data points, representing how popular the artist is now on the internet and overall how well known they might be. We can use the crawled text anonymously as sort of a proxy for listener data without having to get it from a playback service. And the index of documents that relate to artists or songs is of course valuable to a lot of our customers in a feed or search context – showing news or reviews about artists their users are interested in.&lt;/p&gt;

&lt;h2&gt;Acoustic Analysis&lt;/h2&gt;

&lt;img src="http://img.skitch.com/20111219-k7bxrqasx59ymxcwiusngmshme.jpg" alt="Echo Nest acoustic analysis view"/&gt;&lt;p&gt;&lt;em&gt;Echo Nest acoustic analysis view&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The internet is not the &lt;a href="https://gist.github.com/3185350"&gt;Library of Babel&lt;/a&gt; we envision it to be, and quite often many “lower-rank” (less popular) musicians are left out of the “cultural universe” we crawl. Also, the description of music necessarily leaves out things that &lt;em&gt;actually describe the music&lt;/em&gt; – a &lt;a href="http://www.google.com/search?num=100&amp;amp;hl=en&amp;amp;safe=off&amp;amp;client=safari&amp;amp;rls=en&amp;amp;biw=1186&amp;amp;bih=735&amp;amp;tbm=blg&amp;amp;q=rihanna&amp;amp;oq=rihanna&amp;amp;aq=f&amp;amp;aqi=g10&amp;amp;aql=&amp;amp;gs_l=serp.3..0l10.664.1762.0.1855.7.3.0.4.4.0.90.237.3.3.0...0.0.62-XpJLoiLI"&gt;Google blog search for Rihanna&lt;/a&gt; illustrates the problem well: many popular artists’ descriptions are skewed towards the celebrity angle and while this is certainly a valid thing to know about a musician, it’s not all we need to know. Lastly, internet discussion of music tends to concentrate on &lt;em&gt;artists&lt;/em&gt;, not &lt;em&gt;songs&lt;/em&gt; (although there is sometimes talk of individual songs on music blogs.) These three issues (and common sense) require us to figure out if we can understand how a song &lt;em&gt;sounds&lt;/em&gt; as well as how the artist and song are represented by listeners. And if we are going to follow care and scale, we’ve got to do this automatically, with a computer doing the job of the careful listening.&lt;/p&gt;

&lt;p&gt;Can a computer &lt;em&gt;really listen&lt;/em&gt; to music? A lot of people have promised it can over the years, but I’ve (personally) never heard a fully automated recommendation based purely on acoustic analysis that made any sense – and I’ve heard them all, from academic papers to startups to our own technology to big-company efforts. And that has a lot to do with the expectations of the listener. There are certain things computers are very good and fast at doing with music, like determining the tempo or key, or how loud it is. Then there are harder things that will get better as the science evolves, like time signature detection, beat tracking over time, transcription of a dominant melody, and instrument recognition. But even if a computer were to predict all of these features accurately, does that information really translate into a good recommendation? Usually not – and we’ve shown over the years that people’s expectation of “similar” – either in a playlist or a list of artists or songs – trends heavily towards the cultural side, something that no computer can get at simply by analyzing a signal.&lt;/p&gt;

&lt;p&gt;But it does turn out that acoustic analysis has a huge part to play in our algorithms. People expect playlists to be smooth and not jump around too much. Quiet songs should not be followed with loud metal benders (unless the listener asked for that.) For jogging, the tempo should steadily increase. Most coherent mixes should keep the instrumentation generally stable. Songs should flow into one another like a DJ would program them, keeping tempo or key consistent. And there’s a ton we haven’t figured out yet on the interface side. Could a “super dorky query interface” work for music recommendation, where a listener can filter by dominant key or loudness dynamics? Maybe with the right user experience. An early product out of the Echo Nest&lt;a href="#fn:5" id="fnref:5" title="see footnote" class="footnote"&gt;[5]&lt;/a&gt; was an “intelligent pause button” that Tristan whipped up that would compose a repeating segment out of the part of the song you were in or just play the song roughly forever (&lt;a href="http://dl.dropbox.com/u/394242/mp3s/phoenix_10.mp3"&gt;check out an automated 10 minute MP3 re-edit of a Phoenix song&lt;/a&gt;) – which a few years later became Paul’s amazing &lt;a href="http://infinitejuke.com"&gt;Infinite Jukebox&lt;/a&gt; – these experiments are fascinating precursors to a new listening experience that might become more important than discovery itself.&lt;/p&gt;

&lt;img src="http://img.skitch.com/20120513-c96pa5e5g76r7xfnnp981mrey1.jpg" alt="How Echo Nest acoustic analysis works"/&gt;&lt;p&gt;&lt;em&gt;How Echo Nest acoustic analysis works&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://docs.echonest.com.s3-website-us-east-1.amazonaws.com/_static/AnalyzeDocumentation.pdf"&gt;The Echo Nest audio analysis engine&lt;/a&gt; (PDF) contains a suite of machine listening processes that can take any audio file and output both low-level (such as the time when every beat starts) and high-level (such as the overall “danceability”) information for any song in the world. We analyze all the music we work with, and developers can upload their own audio to see everything we compute on a track via our API. Our analysis starts by pretending to be an ear: it will model the frequencies and loudness of a musical signal much the same way perceptual codecs like MP3 or AAC do. It then segments the audio into small pieces – roughly 200ms to 4s, depending on how fast things are happening in the song. For each segment we can tell you the pitch (in a 12-dimensional vector called &lt;em&gt;chroma&lt;/em&gt;), the loudness (in an ADSR-style envelope) and the timbre, which is another 12-dimensional vector that represents the sound of the sound – what instruments there are, how noisy it is, etc. It also tracks beats across the signal, in subdivisions of the musical meter called &lt;em&gt;tatums&lt;/em&gt;, and then per beat and bar, alongside larger song-level structure we call sections that denote choruses, intros, bridges and verses. &lt;/p&gt;
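
&lt;p&gt;One way to picture the per-segment output is as a small record per segment. The sketch below is hypothetical: the field names, the single loudness value (standing in for the ADSR-style envelope) and the assumption that the chroma vector starts at C are mine, not the documented analysis format.&lt;/p&gt;

```python
from dataclasses import dataclass

# Assumed ordering of the 12 chroma bins, starting at C.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


@dataclass
class Segment:
    start: float        # seconds from the beginning of the track
    duration: float     # seconds, roughly 0.2 to 4 in the description above
    chroma: list        # 12 values, one per pitch class
    timbre: list        # 12 values describing the "sound of the sound"
    loudness_db: float  # hypothetical single value instead of a full envelope


def dominant_pitch_class(segment):
    """Name of the strongest pitch class in one segment."""
    strongest = max(range(12), key=lambda i: segment.chroma[i])
    return PITCH_CLASSES[strongest]


def track_pitch_profile(segments):
    """Duration-weighted average chroma across all segments of a track."""
    total = sum(s.duration for s in segments)
    if total == 0:
        return [0.0] * 12
    profile = [0.0] * 12
    for s in segments:
        for i, value in enumerate(s.chroma):
            profile[i] += value * s.duration / total
    return profile
```

&lt;p&gt;A duration-weighted profile like this is one simple way to roll segment-level pitch data up into a song-level summary, since long segments should count for more than short ones.&lt;/p&gt;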

&lt;p&gt;That low level information can be combined through some useful applications of machine learning that Tristan and his team have built over the years to “understand” the song at a higher level. We emit song attributes such as &lt;em&gt;danceability&lt;/em&gt;, &lt;em&gt;energy&lt;/em&gt;, &lt;em&gt;key&lt;/em&gt;, &lt;em&gt;liveness&lt;/em&gt;, and &lt;em&gt;speechiness&lt;/em&gt;, which aim to represent the aboutness of the song in single floating point scalars. These attributes are either heuristically or statistically observed from large testbeds: we work with musicians to label large swaths of ground truth audio against which to test and evaluate our models. Our audio analysis can be seen as an automated lead sheet or a computationally understandable overview of the song: how fast it is, how loud it gets, what instruments are in it. The data within the analysis is so fine grained that you can use it as a &lt;a href="http://echonest.github.com/remix"&gt;remix tool&lt;/a&gt; – it can chop up songs by individual segments or beats and rearrange them without anyone noticing.&lt;/p&gt;

&lt;p&gt;We don’t use either type of data alone to do recommendations. We always filter the world of music through the cultural approaches I showed above and then use the acoustic information to order or sort the results by song. A great test of a music recommender is to see how it deals with heavy metal ballads – you normally would expect other ballads by heavy metal bands. This requires a combination of the acoustic and cultural analysis working in concert. The acoustic information is obviously also useful for playlist generation and ordering, or keeping the mood of a recommendation list coherent. &lt;/p&gt;
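
&lt;p&gt;The “filter culturally, then sort acoustically” idea can be sketched in a few lines using the heavy metal ballad test. Everything here is an assumption made for illustration: the song records, the shared-term threshold and the crude distance over energy and tempo are invented, and the real system works on full cultural vectors and many more acoustic attributes.&lt;/p&gt;

```python
def recommend(seed, candidates, min_shared_terms=2, limit=3):
    """Keep culturally related songs, then order them by acoustic closeness to the seed."""
    # Step 1 (cultural): keep songs that share enough descriptive terms with the seed.
    culturally_close = [
        song for song in candidates
        if len(seed["terms"].intersection(song["terms"])) >= min_shared_terms
    ]

    # Step 2 (acoustic): a crude distance over energy (0 to 1) and tempo (scaled by 200 BPM).
    def acoustic_distance(song):
        energy_gap = abs(song["energy"] - seed["energy"])
        tempo_gap = abs(song["tempo"] - seed["tempo"]) / 200.0
        return energy_gap + tempo_gap

    return sorted(culturally_close, key=acoustic_distance)[:limit]


# Invented data: the seed is a slow, quiet song by a heavy metal band.
seed = {"title": "seed ballad", "terms": {"metal", "heavy", "guitar"}, "energy": 0.30, "tempo": 70}
candidates = [
    {"title": "metal thrasher", "terms": {"metal", "heavy", "guitar"}, "energy": 0.95, "tempo": 180},
    {"title": "pop ballad", "terms": {"pop", "soft"}, "energy": 0.30, "tempo": 70},
    {"title": "metal ballad", "terms": {"metal", "heavy"}, "energy": 0.35, "tempo": 72},
]
picks = [song["title"] for song in recommend(seed, candidates)]
```

&lt;p&gt;The pop ballad is acoustically identical to the seed but never makes the list, because the cultural filter removes it before any acoustic comparison happens. The other metal ballad then sorts ahead of the thrasher.&lt;/p&gt;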

&lt;h2&gt;What’s next&lt;/h2&gt;

&lt;p&gt;I’ve used every single automated music recommendation platform, technology or service. It’s obviously part of my job and it’s been astounding to watch the field (both academically and commercially) mature and test new approaches. We’ve come a long way from &lt;a href="http://jolomo.net/ringo.html"&gt;RINGO&lt;/a&gt; and while the Echo Nest-style system is undoubtedly the top of the pack these days as far as raw quality of automated results goes, there’s still quite a lot of room to grow. I’ve been noticing two trends in the space that will certainly heat up in the years to come:&lt;/p&gt;

&lt;h3&gt;Social – filtering collaborative filtering&lt;/h3&gt;

&lt;img src="http://aps.s3.amazonaws.com/OjxJv.png" alt="This is my jam"/&gt;&lt;p&gt;&lt;em&gt;This is my jam&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Social music discovery services, as embodied by some of my favorites such as &lt;a href="http://thisismyjam.com"&gt;This is my jam&lt;/a&gt;, &lt;a href="http://swarm.fm"&gt;Swarm.fm&lt;/a&gt;, Facebook’s music activity ticker and real time broadcasting services like the old Listening Room and its modern descendant Turntable.fm, often have little or no automated music discovery. Friend-to-friend music recommendations enabled by social networks are extremely valuable in music discovery (and I personally rely on them quite often), but are not recommendation engines as they cannot automatically predict anything when bereft of explicit social signals (and so they fail on scale.) The main amazing and useful feature of recommendation systems is that they can find things for you that you wouldn’t have otherwise come across.&lt;/p&gt;

&lt;p&gt;Even though that puts this kind of service outside the scope of this article, that doesn’t mean we should ignore the power of social recommendations. There’s something very obvious in the social fabric of these services that makes personal recommendations more valuable: &lt;em&gt;people don’t like computers telling them what to do.&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;There are some interesting new services that mix both the social aspects of recommendation with automated measures. This is my jam’s “related jams” is a first useful crack at this, as is &lt;a href="http://www.spotify.com/us/blog/archives/2012/12/06/discover/"&gt;Spotify’s recent “Discover” feature&lt;/a&gt;. You can almost see these as extensions of your social graph: if your friends haven’t yet caught onto Frank Ocean there might be signals that show they will get there soon, and using cultural filtering can get us there. And a lot of the power of social recommendations – that it comes from your friends – can tell the story better than just a raw list of “artists we think you would like.”&lt;/p&gt;

&lt;h3&gt;Listener intelligence&lt;/h3&gt;

&lt;p&gt;When do you listen to music? Is it in the morning on your way to work? Is it on the weekends, relaxing at home? When you do it, how often do you listen to albums versus individual songs in a playlist? Do you idly turn on an automated radio station, or have your own playlists? Which services do you use and why? If it’s raining, do you find yourself putting on different music? The scariest thing about all the music recommendation systems I’ve gone over (including ours as of right now) is &lt;em&gt;none of them look at this necessary listener context.&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;There is a lot going on in this space, both internal stuff we can’t announce yet at Echo Nest, and new products I’m seeing coming out of Spotify and Facebook. We’re throwing our weight behind the &lt;a href="http://developer.echonest.com/raw_tutorials/catalog_api/what.html"&gt;taste profile&lt;/a&gt; – the API that represents musical activity on our servers. It’s sort of the &lt;a href="http://en.wiktionary.org/wiki/scrobble"&gt;scrobble&lt;/a&gt; 2.0: both representing playback activity as well as all the necessary context around it: your behaviors and patterns, your collection, your usage across services and maybe even domains. &lt;a href="http://blog.echonest.com/post/33229165293/taste-profiles-go-public"&gt;We’re even publishing APIs to do bulk analysis of the activity&lt;/a&gt; to surface attributes like “mainstreamness” or “taste-freeze,” the average active year of your favorite artists. This is more than activity mining as collaborative filtering sees it: it’s understanding everything about the listener we can, well beyond just making a prediction of taste based on purchase or streaming activity. All of these attributes and analysis might be part of the final frontier of music recommendation: understanding enough to really understand the music and the listener it’s directed to.&lt;/p&gt;
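
&lt;p&gt;As a back-of-the-envelope illustration of an attribute like “taste-freeze,” the sketch below computes a play-weighted average of the midpoint of each artist’s active years. The exact definition is an assumption on my part (the post only says “the average active year of your favorite artists”), and the function name, inputs and sample numbers are hypothetical rather than anything from the taste profile API.&lt;/p&gt;

```python
def taste_freeze(plays, active_years):
    """Play-weighted average active year, or None if nothing can be computed.

    plays: artist name mapped to play count.
    active_years: artist name mapped to a (first_year, last_year) tuple.
    """
    weighted_sum = 0.0
    total_plays = 0
    for artist, count in plays.items():
        if artist not in active_years:
            continue  # skip artists with no known active period
        first_year, last_year = active_years[artist]
        midpoint = (first_year + last_year) / 2
        weighted_sum += midpoint * count
        total_plays += count
    if total_plays == 0:
        return None
    return weighted_sum / total_plays


# Invented listener: mostly plays an 1980s act, occasionally a 2000s one.
plays = {"older_band": 3, "newer_band": 1}
active_years = {"older_band": (1980, 1990), "newer_band": (2000, 2010)}
freeze_year = taste_freeze(plays, active_years)
```

&lt;p&gt;For this made-up listener the result lands at 1990, pulled toward the older band because it accounts for three of the four plays.&lt;/p&gt;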

&lt;p&gt;– Brian (brian@echonest.com)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thanks to EVB for the edits&lt;/em&gt;&lt;/p&gt;

&lt;div class="footnotes"&gt;
&lt;hr&gt;&lt;ol&gt;&lt;li id="fn:1"&gt;
&lt;p&gt;I know they do a ton of acoustic analysis there but I don’t know if it’s used in radio or similar artists/songs. Probably. &lt;a href="#fnref:1" title="return to article" class="reversefootnote"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:2"&gt;
&lt;p&gt;They use Gracenote’s CDDB data as well but I don’t know to what extent it appears in Genius. &lt;a href="#fnref:2" title="return to article" class="reversefootnote"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:3"&gt;
&lt;p&gt;You can even claim that since you are listening to the Beatles, you are also listening to the individual members &lt;em&gt;at the same time&lt;/em&gt; but let’s not get too ahead of the curve here &lt;a href="#fnref:3" title="return to article" class="reversefootnote"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:4"&gt;
&lt;p&gt;Whenever I say NLP I mean the real NLP, the natural language processing NLP, not the creepy pseudoscience one.  &lt;a href="#fnref:4" title="return to article" class="reversefootnote"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:5"&gt;
&lt;p&gt;Released eventually as the &lt;a href="http://blog.echonest.com/post/597162554/earworm-and-capsule"&gt;Earworm&lt;/a&gt; example in &lt;a href="http://github.com/echonest/remix"&gt;Echo Nest Remix&lt;/a&gt;. &lt;a href="#fnref:5" title="return to article" class="reversefootnote"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;&lt;/div&gt;</description><link>http://notes.variogr.am/post/37675885491</link><guid>http://notes.variogr.am/post/37675885491</guid><pubDate>Tue, 11 Dec 2012 11:56:00 -0500</pubDate><category>echonest</category><category>echo nest</category><category>music</category><category>recommendation</category><category>music recommendation</category></item><item><title>Music data talk at Velocity EU</title><description>&lt;p&gt;I gave a talk at Velocity EU in London a couple of weeks ago on how some pieces of the Echo Nest work, operationally:&lt;/p&gt;

&lt;script async class="speakerdeck-embed" data-id="507d8d11e7912c000205aa27" data-ratio="1.7777777777777777" src="//speakerdeck.com/assets/embed.js"&gt;&lt;/script&gt;&lt;p&gt;(&lt;a href="https://speakerdeck.com/bwhitman/music-data"&gt;Direct link on speakerdeck&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;We started Echo Nest seven years ago and I couldn&amp;#8217;t imagine a stranger time to begin a technology-focused venture. We started with 2 self-built 2U rack machines in a closet at MIT and within two years started moving everything to first dozens and then almost thousands of virtualized hosts running on a few different cloud providers. And then a year or two ago we pulled it all back to physical again. We&amp;#8217;re in a weird spot between offline data processor and real time API provider and none of the oft-repeated hype platforms ever worked for us.&lt;/p&gt;

&lt;p&gt;We&amp;#8217;ve been around before and during much of Hadoop, Solr, &amp;#8220;NoSQL&amp;#8221;, EC2, API-as-PR, the rise of mobile, Apple on Intel, Hacker News and sharded mySQL as a key-value store. And we&amp;#8217;re still at it, making money in an industry that would rather keep it to itself. If it&amp;#8217;s not obvious, I&amp;#8217;m really proud of what we built and the small team that works their ass off to keep it working while making some really cool new stuff.&lt;/p&gt;

&lt;p&gt;There&amp;#8217;s a lot of stories to tell and this is just the first one, on how we&amp;#8217;ve abused text indexing to do quite a lot of our work. If you ever wondered how some of the musical data souffle avoids burning, here it is. Thanks to John &amp;amp; Kellan for giving me a chance to present this.&lt;/p&gt;</description><link>http://notes.variogr.am/post/34167535868</link><guid>http://notes.variogr.am/post/34167535868</guid><pubDate>Tue, 23 Oct 2012 11:39:00 -0400</pubDate></item><item><title>1980s pen plotters of the future</title><description>&lt;iframe width="560" height="315" src="http://www.youtube.com/embed/T20-KcCGokU?rel=0" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;

&lt;p&gt;Early 1980s pen plotters are amazing tools that are still very useful in today&amp;#8217;s world. There&amp;#8217;s something completely transfixing about a mechanical device moving an actual pen on paper versus the smelly black box of the laser printer. And if you&amp;#8217;re trying to draw lines or curves or like the effect of actual ink touching  paper (not sprayed on in microdots) there&amp;#8217;s no other way. Luckily, there&amp;#8217;s some great tools out there for making plotters work on modern hardware and using modern file formats (PDFs) and the hardware itself, while finicky and aging, is cheap.&lt;/p&gt;

&lt;h2&gt;Hardware&lt;/h2&gt;

&lt;p&gt;You&amp;#8217;re going to have to start with the right hardware. &lt;a href="http://music.columbia.edu/cmc/chiplotle/plotter_FAQs.shtml"&gt;The Chiplotle! plotter FAQ is great for this&lt;/a&gt;, I&amp;#8217;ve added my notes below:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;strong&gt;USB - serial interface&lt;/strong&gt; or a serial port on your computer. I have trouble with Chiplotle! with the very common Prolific serial adapters (it might be the drivers) but if you have something that works it&amp;#8217;ll probably be fine. I use &lt;a href="http://www.tripplite.com/en/products/model.cfm?txtModelID=3914"&gt;an ex-Keyspan one&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Any HPGL compatible pen plotter&lt;/strong&gt;. eBay is almost always your best bet, unless you live near the MIT Flea Market. Make sure the plotter supports HPGL via serial connection. If it says HPIB or GPIB, do not get. Very common eBay finds are the Roland DXYs or the HP 7475a. &lt;a href="http://music.columbia.edu/cmc/chiplotle/manual/chapters/api/plotters.html"&gt;If in doubt, check the Chiplotle! list of supported plotters&lt;/a&gt; (although keep in mind similar model #s will also work; for example the DXY-1150 and 1150a act as a DXY1300 in Chiplotle!.) I normally pay $50 or so for a single sheet plotter like the DXY-1150. Make sure you get a power supply with the plotter, that&amp;#8217;s often the hardest thing to find and none of them use anything standard.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A plotter serial cable&lt;/strong&gt;. The only place you&amp;#8217;ll need to pull out a soldering iron. You can buy plotter serial cables on eBay but it&amp;#8217;s easier to just make your own from a DB25 male to DB9 female cable. You have to &lt;a href="http://music.columbia.edu/cmc/chiplotle/plotter_cable.pdf"&gt;re-route a few wires&lt;/a&gt; but it&amp;#8217;s easy to do.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pens &amp;amp; paper&lt;/strong&gt;. Your eBay plotter likely will come with a plastic bag full of dried out pens or, if you&amp;#8217;re lucky, boxed ones. There&amp;#8217;s a huge variety of pen types (for different paper or thicknesses, felt vs. fine point) or you can fashion your own if you&amp;#8217;re wily. For paper, the average desktop plotter can take up to 11 x 17&amp;#8221; paper or just normal printer paper (make sure to set the DIP switches on the back if you don&amp;#8217;t use the full size.) I have a nice stack of artist vellum paper with nice vellum pens.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Have all that? Now, find a place to put the plotter, run the plotter cable from the serial port on the back of the plotter to your USB adapter, and load up some paper. Depending on the plotter, the paper might be held in &lt;em&gt;electrostatically&lt;/em&gt; or it may be magnetized and expect a metal tab to hold in the paper (which you obviously did not get in the eBay shipment, you can use a metal ruler or something similar instead.) Or maybe you will have to tape the paper on. If it&amp;#8217;s a wide format plotter the paper will have to be in a roll and the vertical axis is done via the roller mechanism like a receipt printer (these are notoriously fiddly, I would avoid them unless you really need 3 foot wide paper.) You then load up a pen (in the pen holder off to the side, not the plotter head &amp;#8212; the beginning of every HPGL file tells the plotter to pick up the pen from the holder.) Now, let&amp;#8217;s plot.&lt;/p&gt;

&lt;h2&gt;Software&lt;/h2&gt;

&lt;p&gt;In 2008 I was bitten by the plotter bug all of a sudden. I was trying to draw a smooth bezier curve robotically and was looking at various servo or motor solutions when I stumbled on the community of folks that have adapted Roland pen plotters into vinyl cutting CNC machines. I found myself intensely bidding on my first plotter against a familiar eBay username. After I lost, I confirmed my suspicions: I was in competition with my dear friend &lt;a href="http://music.columbia.edu/~douglas/"&gt;Douglas Repetto&lt;/a&gt; of CMC &amp;amp; dorkbot fame. And not only was he also independently plotter crazed, he was working on a Python module for HPGL control called, well, &lt;a href="http://chiplotle.org"&gt;Chiplotle!&lt;/a&gt;. Maybe there was something in the water that week.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://music.columbia.edu/cmc/chiplotle/chiplotle_logo_2.png" alt="Chiplotle!"/&gt;&lt;/p&gt;

&lt;p&gt;Chiplotle is obviously the best and only way to reliably control a plotter from a modern computer. It does quite a lot of work for you: it manages the commandset of your plotter, buffers output so it doesn&amp;#8217;t overflow and start drawing random straight lines, provides an interactive terminal where you can &amp;#8220;live draw&amp;#8221; and a bunch of other necessary stuff. Although you can import chiplotle into your program and programmatically control your plotter, I tend to use just one part of Chiplotle &amp;#8212; the plot_hpgl_file script that it installs.&lt;/p&gt;

&lt;p&gt;HPGL is like the wizened ancestor of PDF, and it is the plotter&amp;#8217;s native language. An HPGL file is simply an ASCII file of &lt;a href="http://www.isoplotec.co.jp/HPGL/eHPGL.htm"&gt;text commands to draw lines, curves, choose pens, etc&lt;/a&gt;. If you want, you can ignore Chiplotle! altogether and just &lt;strong&gt;cat&lt;/strong&gt; an HPGL file to your serial port at 9600 baud, 8N1. This will work fine for the first few commands but eventually the plotter&amp;#8217;s internal buffer (mine is 512 bytes) will overflow. plot_hpgl_file takes care of all of this. The first time you run it, it will attempt to detect which plotter you have on which serial port. Then it will slowly spit out the HPGL commands and make sure the plotter is acknowledging them.&lt;/p&gt;
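
&lt;p&gt;To show what such a file looks like and why the buffer matters, here is a small sketch that builds the HPGL for a square and groups whole commands into chunks no larger than a given buffer size. The helper names are hypothetical and this is not how Chiplotle! does it: the sketch only does the chunking, and the part that actually matters on real hardware (writing each chunk to the serial port and waiting for the plotter before sending the next) is deliberately left out.&lt;/p&gt;

```python
def square_hpgl(size=1000, pen=1):
    """HPGL for a square with one corner at the origin, in plotter units.

    IN initializes, SP selects a pen, PU/PD move with the pen up/down,
    and SP0 puts the pen back at the end.
    """
    corners = [(size, 0), (size, size), (0, size), (0, 0)]
    path = ",".join("%d,%d" % point for point in corners)
    return "IN;SP%d;PU0,0;PD%s;PU;SP0;" % (pen, path)


def chunk_commands(hpgl, buffer_size=512):
    """Group whole commands into chunks that each fit in the plotter buffer.

    A single command longer than buffer_size still becomes its own oversized
    chunk; splitting inside a command is not attempted here.
    """
    chunks = []
    current = ""
    for command in hpgl.split(";"):
        if not command:
            continue
        piece = command + ";"
        if current and len(current) + len(piece) > buffer_size:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


square = square_hpgl()
chunks = chunk_commands(square, buffer_size=512)
```

&lt;p&gt;The square fits in a single 512 byte chunk, but a long curve with thousands of points would not, which is exactly the case where blindly catting the file to the port goes wrong.&lt;/p&gt;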

&lt;p&gt;My workflow for the project I am on now is to generate PDF files programmatically using the &lt;a href="http://www.reportlab.com/software/opensource/"&gt;amazing ReportLab Python PDF toolkit&lt;/a&gt;, then convert them to HPGL using &lt;a href="http://www.pstoedit.net/"&gt;pstoedit&lt;/a&gt; and plot the result. It is as simple as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;python your_pdf_generator.py file.pdf
pstoedit -f hpgl file.pdf &amp;gt; output.hpgl
plot_hpgl_file.py output.hpgl
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Obviously I could have my Python program directly control the plotter, but as ink and paper add up, you will want a step in between to make sure your art is OK, and the PDF step is natively viewable on any platform. Since PDF and HPGL share a lot of common ancestry, the curveTos, lineTos and moveTos are kept consistent with no loss of quality. There&amp;#8217;s no rasterizing step: if you generate curves programmatically with ReportLab, they will be the same curve on the paper in the plotter.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://img.skitch.com/20120812-kuqixaqy85n1f9iqgim7qu6egb.jpg" alt="Owls"/&gt;&lt;/p&gt;</description><link>http://notes.variogr.am/post/29262500593</link><guid>http://notes.variogr.am/post/29262500593</guid><pubDate>Sun, 12 Aug 2012 09:38:34 -0400</pubDate></item><item><title>Musical introversion - an ode to This is My Jam</title><description>&lt;p&gt;&lt;img src="http://img.skitch.com/20120805-j3f8tfkygga2bj6dpudb883nh8.jpg" alt="This is my jam"/&gt;&lt;/p&gt;

&lt;p&gt;Without a doubt my favorite internet music service ever is &lt;a href="http://thisismyjam.com"&gt;This is my jam&lt;/a&gt; (aka TIMJ), a self styled &amp;#8220;slow music&amp;#8221; site where you get to choose one song only for your friends to hear. The song expires in seven days if you don&amp;#8217;t change it. That&amp;#8217;s it.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s been out for about six months and it&amp;#8217;s doing great; these guys do not need my publicity help and I doubt anyone reading this on my blog needs to hear about it again. And all four that built it are dear friends &amp;amp; there is a tenuous relationship between my company and theirs.&lt;sup id="fnref:p29053732064-1"&gt;&lt;a href="#fn:p29053732064-1" rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt; You may think there&amp;#8217;s  a lot of bias here. But I am not the kind of person to blindly like something because I was involved, in fact, for me it usually means they have to work even harder to get over my inner critic. And even worse for them, I tend to hate &lt;em&gt;almost every music social network.&lt;/em&gt; I grew up with music as the most personal and private thing in my life. I normally refuse to answer the &amp;#8220;what music do you like&amp;#8221; question, and &lt;a href="https://twitter.com/bwhitman/statuses/117264202921545728"&gt;Spotify&amp;#8217;s recent &amp;#8220;social listening&amp;#8221; features really put me off the service&lt;/a&gt;. I also (naturally) hate most attempts of a computer trying to tell me what to listen to.&lt;/p&gt;

&lt;p&gt;But TIMJ has somehow won a place in my heart. I &amp;#8220;follow&amp;#8221; three times more people there than on Twitter, and I spend roughly two hours on the site a day, listening to my friends&amp;#8217; jams and sometimes updating mine. What TIMJ does so well I think comes down to three things:&lt;/p&gt;

&lt;h3&gt;They make you look good&lt;/h3&gt;

&lt;p&gt;This is My Jam looks great, and the hallmark of a great social experience is that it also makes you look good without much effort. The jam selection is very visual, almost forcing the choice of a &amp;#8220;cover image&amp;#8221; (square) and background (widescreen.)&lt;/p&gt;

&lt;p&gt;&lt;img src="http://img.skitch.com/20120807-dhishkabtbfytqnckx47k897t6.jpg" alt="Making even me look good"/&gt;&lt;/p&gt;

&lt;p&gt;I doubt I am the only TIMJ member with this folder on their desktop:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://img.skitch.com/20120809-k4ru22j7m6tyiq496f6t762hbn.jpg" alt="My jam images folder"/&gt;&lt;/p&gt;

&lt;p&gt;Hannah (TIMJ&amp;#8217;s lead designer) deserves a huge shout-out for her amazing work on the site. I happened to be in London with them the day they began work on the concept in earnest, and the first thing she did was go in another room (where I happened to be peeling a mango) and draw three pages full of circles. I&amp;#8217;m a computer person forever, but I know when someone has put care and serious thought into something and you easily get the feeling that every pixel and click has been thought through.&lt;/p&gt;

&lt;h3&gt;They&amp;#8217;ve been the first to really do &amp;#8220;frictionless&amp;#8221; right&lt;/h3&gt;

&lt;p&gt;For many years my remote brain The Echo Nest and I have been harping hard on the &lt;a href="http://notes.variogr.am/post/1373556723/the-future-music-platform-music-startups-imminent"&gt;music platform&lt;/a&gt; &amp;#8212; the idea that the provider of audio can be decoupled from the music experience itself. This was a pipe dream only a few years ago, but now we&amp;#8217;ve got Spotify apps, Soundcloud embeds and Rdio APIs. And TIMJ is truly the first site I&amp;#8217;ve seen that makes real use of these. When you want to create a new jam, you don&amp;#8217;t need to futz with local files on your computer or figure out what service to use. The jam search seamlessly shows you two lists: one of just audio (you provide your own image) and one of video with audio. Obviously, backing that are APIs from Soundcloud, YouTube, Hype Machine, and Echo Nest but no one needs to know or care about that.&lt;/p&gt;

&lt;p&gt;And of course, as of today, TIMJ has a new &lt;a href="http://www.thisismyjam.com/apps/spotify"&gt;Spotify app&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://img.skitch.com/20120805-edgxc7ytx143cgrq37n2jj6agm.jpg" alt="This is my jam's Spotify app"/&gt;&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve been playing with the Spotify app for a while because I had access to their GitHub repository&lt;sup id="fnref:p29053732064-2"&gt;&lt;a href="#fn:p29053732064-2" rel="footnote"&gt;2&lt;/a&gt;&lt;/sup&gt; and the playlists it creates of your likes (and friends&amp;#8217; jams &amp;amp; likes) have become my main source of new music on Spotify. Team Jam made a great decision to use Spotify as the archive read-only mode of TIMJ, the place where you go when now is too much to deal with. In practice, Spotify seems to already have a large majority of jams in its database.&lt;/p&gt;

&lt;h3&gt;Slow and steady wins the race&lt;/h3&gt;

&lt;p&gt;There is nothing to stop you or your friends from posting new jams every three minutes, but there are quiet hints throughout the site to keep it slow, mostly ambient clues like the lack of a &amp;#8220;stream&amp;#8221; auto-updating view and the relatively many-click process of updating your jam, or the fact that you can&amp;#8217;t easily do it from mobile or SMS or other networks. I don&amp;#8217;t know the actual numbers but I&amp;#8217;m guessing the average &amp;#8220;jam length&amp;#8221; (the amount of time a user&amp;#8217;s jam stays active) is about 2-4 days. This maps well to my musical brain. I can&amp;#8217;t stand Spotify/Facebook&amp;#8217;s incessant scroller of activity; music deserves a lot more care than that. And hinting at the speed makes your friends care about what they post and only surfaces the best stuff.&lt;/p&gt;

&lt;p&gt;Another great (surely frustrating at first) hint is the focus on &amp;#8220;now&amp;#8221; versus the always tempting archival nostalgia past. Only the current jam per person is listen-able on the site, and only the past 5 are even shown.  All comments, likes and plays disappear when a new jam shows up. It&amp;#8217;s the opposite instinct that every other social experience tries to have and it works great for (the very temporal and fiddly) music.&lt;/p&gt;

&lt;hr&gt;&lt;p&gt;Congrats to Team Jam on the Spotify app release and for the stunning growth and  for making my favorite music site ever. And make sure to &lt;a href="http://thisismyjam.com/bwhitman"&gt;follow me on TIMJ&lt;/a&gt;, I have the best taste in music &lt;em&gt;ever&lt;/em&gt;.&lt;/p&gt;

&lt;div class="footnotes"&gt;
&lt;hr&gt;&lt;ol&gt;&lt;li id="fn:p29053732064-1"&gt;
&lt;p&gt;Simply put, Echo Nest currently &amp;#8216;sponsors&amp;#8217; it but it&amp;#8217;s an independent entity; in exchange for the domain name I bought in 2007 for something totally different, they have to let me sit in their office in London sometime. &lt;a href="#fnref:p29053732064-1" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p29053732064-2"&gt;
&lt;p&gt;At some point it was rewritten to require some sort of  Makefile written in &lt;a href="http://brunch.io"&gt;Brunch&lt;/a&gt; and &lt;em&gt;every single node.js module ever&lt;/em&gt; &amp;#8212; for what amounts to a web page &amp;#8212; I truly am no longer made for these times &lt;a href="#fnref:p29053732064-2" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;&lt;/div&gt;</description><link>http://notes.variogr.am/post/29053732064</link><guid>http://notes.variogr.am/post/29053732064</guid><pubDate>Thu, 09 Aug 2012 10:44:28 -0400</pubDate></item><item><title>Green &amp; blue pixels from CRTs</title><description>&lt;img src="http://24.media.tumblr.com/tumblr_m8b0khRb1k1qz4g66o1_500.jpg"/&gt;&lt;br/&gt; &lt;br/&gt;&lt;img src="http://24.media.tumblr.com/tumblr_m8b0khRb1k1qz4g66o2_500.jpg"/&gt;&lt;br/&gt; &lt;br/&gt;&lt;img src="http://24.media.tumblr.com/tumblr_m8b0khRb1k1qz4g66o3_500.jpg"/&gt;&lt;br/&gt; &lt;br/&gt;&lt;p&gt;Green &amp; blue pixels from CRTs&lt;/p&gt;</description><link>http://notes.variogr.am/post/28793391668</link><guid>http://notes.variogr.am/post/28793391668</guid><pubDate>Sun, 05 Aug 2012 18:45:00 -0400</pubDate></item><item><title>Owl study one, two three</title><description>&lt;img src="http://25.media.tumblr.com/tumblr_m8aj3bzGcm1qz4g66o1_500.jpg"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;Owl study one, two three&lt;/p&gt;</description><link>http://notes.variogr.am/post/28770742569</link><guid>http://notes.variogr.am/post/28770742569</guid><pubDate>Sun, 05 Aug 2012 12:28:20 -0400</pubDate></item><item><title>I spent a good six hours building this Apple //c Information...</title><description>&lt;img src="http://25.media.tumblr.com/tumblr_m7qgzadcNp1qz4g66o1_400.jpg"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;I spent a good six hours building this Apple //c Information Kiosk. It’s hilarious! You can type in any band name (that existed 1984 or earlier OF COURSE) and it shows you a picture and their bio. Or in automatic mode just scrolls through a set of pop artists around that time (via &lt;a href="http://developer.echonest.com/api/v4/artist/search?api_key=N6E4NIOVYMTHNDM8J&amp;format=json&amp;results=100&amp;artist_start_year_before=1984&amp;sort=familiarity-desc&amp;artist_start_year_after=1980&amp;style=pop"&gt;this amazing Echo Nest API call&lt;/a&gt;. 

&lt;/p&gt;&lt;p&gt;You can &lt;a href="https://github.com/echonest/kiosk"&gt;run it yourself if you have the right hardware&lt;/a&gt;, here’s my &lt;a href="https://github.com/echonest/kiosk/blob/master/README.md"&gt;HOWTO&lt;/a&gt;. &lt;/p&gt;</description><link>http://notes.variogr.am/post/28001536318</link><guid>http://notes.variogr.am/post/28001536318</guid><pubDate>Wed, 25 Jul 2012 16:30:46 -0400</pubDate></item><item><title>Today in the fabulous uses of audio technologies department, the...</title><description>&lt;img src="http://25.media.tumblr.com/tumblr_m7n874NSdK1qz4g66o1_500.jpg"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;Today in the fabulous uses of audio technologies department, the &lt;a href="https://twitter.com/bkstreetmusic"&gt;bkstreetmusic&lt;/a&gt; Twitter account&lt;/p&gt;</description><link>http://notes.variogr.am/post/27879782656</link><guid>http://notes.variogr.am/post/27879782656</guid><pubDate>Mon, 23 Jul 2012 22:28:16 -0400</pubDate></item><item><title>The audio fingerprinting at the Echo Nest FAQ</title><description>&lt;p&gt;&lt;img src="http://img.skitch.com/20120722-x959r134m77tb4g867m6taeqer.jpg" alt=""/&gt;&lt;/p&gt;

&lt;p&gt;One of the most popular APIs at the Echo Nest is for our &lt;em&gt;two&lt;/em&gt; open audio fingerprinting systems: &lt;a href="http://blog.echonest.com/post/545323349/the-echo-nest-musical-fingerprint-enmfp"&gt;ENMFP&lt;/a&gt; and &lt;a href="http://echoprint.me"&gt;Echoprint&lt;/a&gt;. Between the two, we currently process queries for over 40 million songs a week, almost &lt;em&gt;70 fingerprints a second&lt;/em&gt;. We announced ENMFP two years ago and &lt;a href="http://blog.echonest.com/post/25996355671/echoprint-opensource-audio"&gt;Echoprint one year later&lt;/a&gt;, and things have been moving very fast since then. The world seemed to need an open music identification service and we were happy to provide it.&lt;/p&gt;

&lt;p&gt;I co-developed both FP technologies (ENMFP using Tristan’s &lt;a href="http://developer.echonest.com/docs/v4/track.html"&gt;Echo Nest Analyze&lt;/a&gt; and Echoprint with &lt;a href="http://www.ee.columbia.edu/~dpwe"&gt;Dan Ellis&lt;/a&gt; at Columbia) and often field questions from our developers, customers and interested parties. There are a lot of good questions that we haven’t answered very well anywhere else on our website or developer docs, so I thought I would collect them all in one place, in a sort of living document on my blog.&lt;/p&gt;

&lt;h2&gt;Echo Nest fingerprinting FAQs&lt;/h2&gt;

&lt;p&gt;Last updated: July 22 2012&lt;/p&gt;


&lt;ul&gt;&lt;li&gt;&lt;a href="#faq1"&gt;Some definitions to help&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq2"&gt;What fingerprinting technologies does the Echo Nest provide?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq3"&gt;How much do they cost? Can I use them anywhere? Commercially? What about the data or the server?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq4"&gt;Can they fix my (or my users’) metadata on files on a hard drive?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq5"&gt;I have a ton of audio (I’m a streaming service or something). Can I just run my own Echoprint server and codegen and compute my own codes and never talk to you again?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq6"&gt;How much of the original audio do I need to match? Do I have to start at the beginning?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq7"&gt;Once I have audio, how fast can I identify a file?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq8"&gt;Can I make it work like Shazam or SoundHound?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq9"&gt;Echoprint is newer. Will ENMFP still be supported?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq10"&gt;Are there any whitepapers or more information on how they work?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq11"&gt;Can I run ENMFP or Echoprint on a mobile device?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq12"&gt;I tried song/identify and got no results!!&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq14"&gt;What type of audio does fingerprinting work on?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq15"&gt;Can audio fingerprinting be used to detect cover versions or live versions of songs? What determines the same song?&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;&lt;a href="#faq16"&gt;Where can I go for help?&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;&lt;a name="faq1"&gt; &lt;/a&gt;Some definitions to help:&lt;/h3&gt;

&lt;ul&gt;&lt;li&gt;&lt;strong&gt;code generator&lt;/strong&gt; (or often “codegen”): a piece of software that runs on a computer, mobile device or server and computes “codes” from an audio stream. The codes are then sent to a fingerprint lookup server for a possible match.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;query server&lt;/strong&gt;: a stack that receives codes and searches through data to find a match&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;matching data&lt;/strong&gt;: the set of resolvable song codes that a query server can match, given query codes. Matching data must be computed on the whole song as query codes can be from anywhere in the song.&lt;/li&gt;
&lt;/ul&gt;&lt;h3&gt;&lt;a name="faq2"&gt; &lt;/a&gt;What fingerprinting technologies does the Echo Nest provide?&lt;/h3&gt;

&lt;p&gt;We have two: &lt;strong&gt;ENMFP&lt;/strong&gt; and &lt;strong&gt;Echoprint&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;&lt;a name="faq3"&gt; &lt;/a&gt;How much do they cost? Can I use them anywhere? Commercially? What about the data or the server?&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;ENMFP&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Code generator is free to use for anything — commercial or otherwise. &lt;a href="http://developer.echonest.com/downloads/license"&gt;The codegen binary for Windows, Mac OS X and Linux (there is no source available) is available here, although you must have an active Echo Nest Developer account.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Server code is not available&lt;sup id="fnref:p27796385927-1"&gt;&lt;a href="#fn:p27796385927-1" rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt;. The Echo Nest maintains the only ENMFP server via the &lt;a href="http://developer.echonest.com/docs/v4/song.html#identify"&gt;song/identify API&lt;/a&gt; with a version number of 3.15.&lt;/li&gt;
&lt;li&gt;The matching data has roughly 35 million unique songs and is not available publicly.&lt;/li&gt;
&lt;li&gt;The &lt;a href="http://developer.echonest.com/docs/v4/song.html#identify"&gt;song/identify&lt;/a&gt; call can be used by anyone, even commercially (there is an &lt;a href="http://developer.echonest.com/docs/v4#ground-rules"&gt;exception in our terms for this&lt;/a&gt;) but you will be under a rate limit of around 120 calls per minute, subject to change.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Echoprint&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;Code generator is &lt;a href="https://github.com/echonest/echoprint-codegen"&gt;free to use and open source.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Server code is also &lt;a href="https://github.com/echonest/echoprint-server"&gt;free to use and open source&lt;/a&gt;. The Echo Nest maintains an Echoprint server via the &lt;a href="http://developer.echonest.com/docs/v4/song.html#identify"&gt;song/identify API&lt;/a&gt; with the version number set to 4.0 or higher.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://echoprint.me/data"&gt;The matching data is available for download&lt;/a&gt; with the intent that outside sources can add to it later. The data is small at the moment — roughly 200,000 songs, but will be growing very soon.&lt;/li&gt;
&lt;li&gt;The &lt;a href="http://developer.echonest.com/docs/v4/song.html#identify"&gt;song/identify&lt;/a&gt; call can be used by anyone, even commercially (there is an &lt;a href="http://developer.echonest.com/docs/v4#ground-rules"&gt;exception in our terms for this&lt;/a&gt;) but you will be under a rate limit of around 120 calls per minute, subject to change. &lt;/li&gt;
&lt;/ul&gt;&lt;h3&gt;&lt;a name="faq4"&gt; &lt;/a&gt;Can they fix my (or my users’) metadata on files on a hard drive?&lt;/h3&gt;

&lt;p&gt;For ENMFP, absolutely. You have a matching database of &lt;em&gt;most music&lt;/em&gt; available for you. You can quickly test this out on your computer’s music. First, download the ENMFP codegen and run this:&lt;/p&gt;

&lt;script src="https://gist.github.com/1563420.js?file=gistfile1.sh"&gt;&lt;/script&gt;&lt;p&gt;Then, run this Python script or something like it to look up each code:&lt;/p&gt;

&lt;script src="https://gist.github.com/1563459.js?file=gistfile1.py"&gt;&lt;/script&gt;&lt;p&gt;(Obviously, you’d want to thread this call out in real world use.) There, you’ve built a scan and match service! Sell it for $50 each on the Mac App Store.&lt;/p&gt;

&lt;p&gt;For Echoprint, yes, although the matching database is still small.&lt;/p&gt;
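
&lt;p&gt;To make the lookup step concrete, here is a minimal Python sketch that builds a song/identify request URL for a single code string, using version 3.15 for ENMFP and 4.0 for Echoprint as described above. It is an illustration under assumptions, not an official client: the endpoint path is inferred from other Echo Nest API URLs, the parameter names &lt;code&gt;api_key&lt;/code&gt;, &lt;code&gt;code&lt;/code&gt; and &lt;code&gt;version&lt;/code&gt; should be checked against the song/identify docs, and &lt;code&gt;identify_url&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from urllib.parse import urlencode

# Assumed endpoint, inferred from other Echo Nest API URLs; check the docs.
IDENTIFY_ENDPOINT = "http://developer.echonest.com/api/v4/song/identify"

# Version numbers from this FAQ: 3.15 selects ENMFP, 4.0 or higher selects Echoprint.
VERSIONS = {"enmfp": "3.15", "echoprint": "4.0"}


def identify_url(api_key, code, fingerprint):
    """Build a song/identify URL for one code string (hypothetical helper)."""
    if fingerprint not in VERSIONS:
        raise ValueError("fingerprint must be 'enmfp' or 'echoprint'")
    # Parameter names are assumptions; urlencode handles escaping and joining.
    params = {"api_key": api_key, "code": code, "version": VERSIONS[fingerprint]}
    return IDENTIFY_ENDPOINT + "?" + urlencode(params)


print(identify_url("YOUR_API_KEY", "CODE_STRING_FROM_CODEGEN", "enmfp"))
```

&lt;p&gt;Fetching that URL for each file is the part you would hand to a thread pool when scanning a large collection.&lt;/p&gt;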

&lt;h3&gt;&lt;a name="faq5"&gt; &lt;/a&gt;I have a ton of audio (I’m a streaming service or something). Can I just run my own Echoprint server and codegen and compute my own codes and never talk to you again?&lt;/h3&gt;

&lt;p&gt;Yes. In fact, a lot of very big companies are doing this.&lt;/p&gt;

&lt;h3&gt;&lt;a name="faq6"&gt; &lt;/a&gt;How much of the original audio do I need to match? Do I have to start at the beginning?&lt;/h3&gt;

&lt;p&gt;For ENMFP, we suggest 50 codes worth of audio, which is usually between 15 and 20 seconds. For Echoprint, we suggest 20 seconds worth of audio. The audio can be from anywhere within the song.&lt;/p&gt;

&lt;h3&gt;&lt;a name="faq7"&gt; &lt;/a&gt;Once I have audio, how fast can I identify a file?&lt;/h3&gt;

&lt;p&gt;ENMFP’s codegen computes in roughly 20x real time — for a 30s sample it will take 1.5 seconds, for example. Echoprint’s codegen is roughly 1000x real time.&lt;/p&gt;

&lt;p&gt;The server side varies with load and database size, but we aim to keep both within a 500ms response time and support 50 queries a second per server. Those booting their own servers can easily shard or mirror multiple servers behind a load balancer.&lt;/p&gt;
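
&lt;p&gt;As a rough back-of-the-envelope check on those numbers, the following Python sketch turns the quoted speeds into timings and a server count. The function names are made up for illustration, and the server estimate assumes load is spread perfectly evenly, which real traffic will not be.&lt;/p&gt;

```python
import math

SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def codegen_seconds(audio_seconds, realtime_factor):
    """Time to compute codes for a clip, given an 'N x real time' codegen speed."""
    return audio_seconds / realtime_factor


def servers_needed(queries_per_second, per_server_qps=50):
    """Servers required if each one sustains per_server_qps (the target quoted above)."""
    return math.ceil(queries_per_second / per_server_qps)


weekly_queries = 40_000_000  # "over 40 million songs a week" from the intro
qps = weekly_queries / SECONDS_PER_WEEK

print(codegen_seconds(30, 20))    # ENMFP on a 30 s clip
print(codegen_seconds(30, 1000))  # Echoprint on a 30 s clip
print(round(qps, 1), servers_needed(qps))
```

&lt;p&gt;Forty million queries a week works out to about 66 a second, so at 50 queries a second per server the average load fits on two servers, before allowing any headroom for peaks.&lt;/p&gt;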

&lt;h3&gt;&lt;a name="faq8"&gt; &lt;/a&gt;Can I make it work like Shazam or SoundHound?&lt;/h3&gt;

&lt;p&gt;This usually means “will over-the-air queries work,” as in a bar or noisy place. For ENMFP, no. ENMFP is only supported on files or “clean audio.” For Echoprint, this is the intent, although as of right now we do not support it.&lt;/p&gt;

&lt;h3&gt;&lt;a name="faq9"&gt; &lt;/a&gt;Echoprint is newer. Will ENMFP still be supported?&lt;/h3&gt;

&lt;p&gt;Yes. The only thing that may change is the name, as it’s getting annoying to talk about two separate fingerprints. Even once we have a real catalog backing Echoprint, ENMFP still has some good properties that are useful for a lot of our customers.&lt;/p&gt;

&lt;h3&gt;&lt;a name="faq10"&gt; &lt;/a&gt;Are there any whitepapers or more information on how they work?&lt;/h3&gt;

&lt;p&gt;Very short ones: &lt;a href="http://ismir2011.ismir.net/latebreaking/LB-7.pdf"&gt;here&amp;#8217;s Echoprint&amp;#8217;s&lt;/a&gt; and here&amp;#8217;s &lt;a href="http://www.ee.columbia.edu/~dpwe/pubs/EllisWJL10-ENfprint.pdf"&gt;ENMFP&amp;#8217;s&lt;/a&gt;.

&lt;/p&gt;&lt;h3&gt;&lt;a name="faq11"&gt; &lt;/a&gt;Can I run ENMFP or Echoprint on a mobile device?&lt;/h3&gt;

&lt;p&gt;There are two ways to run a fingerprint on mobile: one is to compute the codes on the device and query the server, the other is to send a low bandwidth audio stream to an intermediate server, which computes the codes and then sends them to the query server. Both ENMFP and Echoprint support the latter.&lt;/p&gt;

&lt;p&gt;ENMFP is too compute-intensive to run the codegen on the device and we do not provide a codegen that runs on any mobile device.&lt;/p&gt;

&lt;p&gt;Echoprint can easily compute codes on a device, and we provide &lt;a href="https://github.com/echonest/echoprint-ios-sample"&gt;an example Xcode project for iOS&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;&lt;a name="faq12"&gt; &lt;/a&gt;I tried song/identify and got no results!!&lt;/h3&gt;

&lt;p&gt;The main problems are almost always one of the following:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;You have Echoprint codes but are not specifying the version number (it has to be 4.0 or higher) in the call.&lt;/li&gt;
&lt;li&gt;You have ENMFP codes but are specifying a version number of 4.0 or higher.&lt;/li&gt;
&lt;li&gt;You are running the example codegen for either and your version of FFmpeg is not set up to decode MP3 files.&lt;/li&gt;
&lt;li&gt;You are using Echoprint and the match is not in our test 200,000 song database.&lt;/li&gt;
&lt;/ol&gt;&lt;h3&gt;&lt;a name="faq14"&gt; &lt;/a&gt;What type of audio does fingerprinting work on?&lt;/h3&gt;

&lt;p&gt;ENMFP and Echoprint were originally intended to provide fingerprinting for musical audio, and that is primarily what they are used for (for some definition of &amp;#8220;music&amp;#8221;). However, several of our users have also reported success stories about using Echoprint on speech audio.

&lt;/p&gt;&lt;h3&gt;&lt;a name="faq15"&gt; &lt;/a&gt;Can audio fingerprinting be used to detect cover versions or live versions of songs? What determines the same song?&lt;/h3&gt;

&lt;p&gt;No. ENMFP and Echoprint are designed to identify the &amp;#8220;same&amp;#8221; rendition or recording of a particular song. For the purposes of audio fingerprinting, cover versions and live versions are considered to be &amp;#8220;different&amp;#8221;, and hence queries will be identified as such.

&lt;/p&gt;&lt;p&gt;A song is defined by being the same recording or having the same master recording. Both fingerprints will consider a remastered version the same song, will consider a radio edit the same song, and will consider different encodings (e.g. MP3 at different bitrates) the same song. 

&lt;/p&gt;&lt;h3&gt;&lt;a name="faq16"&gt; &lt;/a&gt;Where can I go for help?&lt;/h3&gt;

&lt;p&gt;If you are using either FP system, best to:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="https://groups.google.com/forum/#!forum/echoprint"&gt;Join the Echoprint google group.&lt;/a&gt; Best place to discuss either.
&lt;/li&gt;&lt;li&gt;&lt;a href="http://developer.echonest.com/forums"&gt;Watch or post to the Echo Nest Developer Forums&lt;/a&gt; if you have any trouble with the API (not the codegen or server code.)
&lt;/li&gt;&lt;/ul&gt;&lt;div class="footnotes"&gt;
&lt;hr&gt;&lt;ol&gt;&lt;li id="fn:p27796385927-1"&gt;
&lt;p&gt;The server stacks for ENMFP and Echoprint are very similar and it’s very possible a dedicated developer could make one work like the other. That is not supported, though. &lt;a href="#fnref:p27796385927-1" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;&lt;/div&gt;</description><link>http://notes.variogr.am/post/27796385927</link><guid>http://notes.variogr.am/post/27796385927</guid><pubDate>Sun, 22 Jul 2012 20:24:00 -0400</pubDate></item><item><title>How well does music predict your politics?</title><description>&lt;p align="center"&gt;&lt;img alt="Romney with Kid Rock" src="http://img.skitch.com/20120710-ke64a3hempebk2xcdf12apu949.jpg"/&gt;&lt;/p&gt;
&lt;p align="center"&gt;&lt;em&gt;Mitt Romney with Kid Rock&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Maybe you should just plug your iPod into the booth or connect the Diebold machine to Facebook this November. It started as an office joke, but after running the numbers, I can’t escape the data. It turns out music preference is pretty well correlated with political affiliation.&lt;/p&gt;
&lt;p&gt;A few highlights:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Republicans seem to have &lt;a href="#diverse"&gt;less diverse music taste than Democrats&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Kenny Chesney fans are &lt;a href="#artists"&gt;most likely to swing right, Rihanna fans left.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Metal fans could &lt;a href="#metal"&gt;save us all from a two-party system.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;h2 id="tasteprofile"&gt;Taste Profiles&lt;/h2&gt;
&lt;p&gt;Over the past few years, we here at &lt;a href="http://the.echonest.com"&gt;The Echo Nest&lt;/a&gt; have put a lot of engineering energy into “Taste Profiles” – our server-side representation of everything a listener does with music. We first released them in &lt;a href="http://musicmachinery.com/2010/10/15/the-echo-nest-gets-personal/"&gt;2010&lt;/a&gt;, then we linked them to our playlist API &lt;a href="http://blog.echonest.com/post/10005968004/personalize-your-echo-nest-playlists"&gt;with catalog-radio&lt;/a&gt;, and recently &lt;a href="http://blog.echonest.com/post/11992136676/taste-profiles-get-added-to-the-million-song-dataset"&gt;released a large number of them for research&lt;/a&gt;. Anyone can store their musical activity across services in our “cloud” from metadata, audio files or fingerprints, and update stats like play counts, skips, ratings, loves and bans. The Taste Profile ID can then be used for all sorts of stuff – everything from recommending you new music, to syncing your local collection to a cloud service, to suggesting shows to see, to some of the crazy stuff I’ll walk through below.&lt;/p&gt;
&lt;p&gt;Quite a lot of our &lt;a href="http://developer.echonest.com"&gt;API developers&lt;/a&gt; and customers are already using Taste Profiles to manage the musical identity of their customers, and we use them internally as the proving ground for a lot of our analytics work. For example, we spent the last few months making sure people listen to our automated radio service (used in iHeartRadio, Spotify, VEVO, MOG and many others) as long as possible by predicting how they’ll respond to our suggestions. And with today’s release of new &lt;a href="http://blog.echonest.com"&gt;Taste Profile key-values&lt;/a&gt; you can now annotate a Taste Profile with any information you want (such as your location, the device you&amp;#8217;re using, your IDs on social networks or anything else) and we’ll use it to give you better results. Part of the push behind arbitrary data in a musical identity is to track how “non-musical behaviors” can make our results better. If you live in Sweden, maybe you don’t want to hear ABBA anymore. Or you’ve just seen a Wes Anderson film, so we might want to send you down a Kinks path.&lt;/p&gt;
&lt;p&gt;We’ve been collecting this (completely anonymized of course) data for a while now and started looking into what correlations exist between music, psychographics, demographics and other media preferences. As the time is upon us in the US to start thinking about who to elect as president, we thought we’d first look to see what political affiliation data we had and if it had any correlation to music. Can we tell if someone is a Republican just from his or her iTunes collection? And if so, which artists are the key “tells” for both sides? What we found was fascinating.&lt;/p&gt;
&lt;h2 id="predictingpoliticsfrommusic"&gt;Predicting politics from Taste Profiles&lt;/h2&gt;
&lt;p align="center"&gt;&lt;img alt="Obama with BB King" src="http://img.skitch.com/20120710-kdkt69ktf7dqp243w19xw9jmbu.jpg"/&gt;&lt;/p&gt;
&lt;p align="center"&gt;&lt;em&gt;Barack Obama with BB King&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Some caveats before we get started: although we have far more data, we’ll only look at listeners that self-report as either “Democrat-aligned” or “Republican-aligned” and are living in the US, to make it a quicker read.&lt;a class="footnote" href="#fn:1" id="fnref:1" title="see footnote"&gt;[1]&lt;/a&gt; The political alignment was automatically derived from annotations of political figures or parties: we counted prominent political figures such as Bill or Hillary Clinton, Barack Obama, John Kerry and so on as Democrat alignment, along with any explicit party affiliation. Likewise for Republican: George Bush, Mitt Romney, John McCain, Sarah Palin, and so on. If someone listed both or had conflicting affiliations we did not include them in this experiment, nor did we consider any other party or country.&lt;a class="footnote" href="#fn:2" id="fnref:2" title="see footnote"&gt;[2]&lt;/a&gt; Finally, throughout this I’ll call the two classes “Democrats” and “Republicans,” even though that&amp;#8217;s an incredibly generalized version of the available data.&lt;/p&gt;
&lt;p&gt;So let’s get to it: let’s take a bunch of Taste Profiles, see which ones have political affiliation listed and then try to learn the relationship between the musical data in the Taste Profile and the affiliation.&lt;/p&gt;
&lt;p&gt;What kind of musical data are we talking about here? An Echo Nest Taste Profile can be as simple as a list of artists the listener likes or can include very detailed information about his or her listening activity. The Echo Nest then of course has tens of thousands of points of data to associate with each artist or song. These data points range from as mundane as the name of the artist (“Carly Rae Jepsen”) to as complex as the number of milliseconds in between each downbeat (4508), or the predicted key (E major), or the probabilities of words people use to describe the artist on the internet (“angular,” “stupid,” “witch house.”)&lt;a class="footnote" href="#fn:3" id="fnref:3" title="see footnote"&gt;[3]&lt;/a&gt; We use all of this data to recommend you music on MTV.com or play you a great station on iHeartRadio, and here we’re going to use it to see if you like big government.&lt;/p&gt;
&lt;p align="center"&gt;&lt;img alt="Cultural vectors used in Echo Nest musical analysis, here for ABBA" src="http://img.skitch.com/20111219-f8xs4bir6rnur17u8prqqm9rxx.jpg"/&gt;&lt;/p&gt;
&lt;p align="center"&gt;&lt;em&gt;Cultural vectors used in Echo Nest musical analysis, here for “ABBA”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;For every person&amp;#8217;s Taste Profile, we have many thousands of terms that describe the kind of music the person is into. For this experiment, the musical term data from half of the Taste Profiles is fed into The Echo Nest&amp;#8217;s statistical machine learning classifiers (we have our own custom stuff that mostly acts as a very large scale multi-class &lt;a href="http://en.wikipedia.org/wiki/Support_vector_machine"&gt;support vector machine&lt;/a&gt;&lt;a class="footnote" href="#fn:4" id="fnref:4" title="see footnote"&gt;[4]&lt;/a&gt;) and associated with the “ground truth” of affiliation to try to learn a model of each class. In layman&amp;#8217;s terms, this basically means we show the system a bunch of examples of Democrat Taste Profiles and a bunch of Republican ones and see if it can predict the class on a new, unknown Taste Profile. Our machine learning tech is good at handling messy data like this: we’ve added a lot of math and magic on top to deal with our specific kind of musical data.&lt;a class="footnote" href="#fn:5" id="fnref:5" title="see footnote"&gt;[5]&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="theresults"&gt;The results&lt;/h2&gt;
&lt;p&gt;After we’ve learned the model we can then test it by giving it the &lt;em&gt;other half&lt;/em&gt; of the data, asking our classifiers to identify each previously unknown Taste Profile as either Democrat or Republican, to see how well it does at prediction. We use a few measures to evaluate the experiment:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;raw accuracy (out of all the test examples, how many did we get right)&lt;/li&gt;
&lt;li&gt;precision (out of the ones we predicted in the class, how many were in that class)&lt;/li&gt;
&lt;li&gt;recall (out of all the examples of that class, how many did the classifier find) and&lt;/li&gt;
&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/F1_score"&gt;F1&lt;/a&gt;, a blended measure of precision and recall that people in this field like to use as a general performance metric.&lt;/li&gt;
&lt;/ol&gt;&lt;p align="center"&gt;&lt;img alt="Prediction accuracies" src="http://img.skitch.com/20120711-xstaxxqy4x76fmibtngdu3yce7.jpg"/&gt;&lt;/p&gt;
&lt;p align="center"&gt;&lt;em&gt;Prediction accuracies&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Looks like we’ve got something here! We hit an F1 of over 0.8 for Republican prediction and just under 0.4 for Democrat prediction. These are both good numbers, in line with or above many other prediction algorithms. But why is our Democrat prediction less than half as accurate as our Republican prediction? Shouldn’t they be the same?&lt;/p&gt;
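&lt;p&gt;For readers who want the four measures spelled out, here is a minimal Python sketch that computes them for one class from lists of true and predicted labels. The six labels are made up for illustration and are not from our data, and &lt;code&gt;evaluate&lt;/code&gt; is a hypothetical helper rather than anything in our stack.&lt;/p&gt;

```python
def evaluate(truth, predicted, positive):
    """Return accuracy, precision, recall and F1 for one class of interest."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for six test profiles (R = Republican, D = Democrat); not real data.
truth = ["R", "R", "R", "D", "D", "D"]
predicted = ["R", "R", "D", "R", "D", "D"]

for label in ("R", "D"):
    print(label, evaluate(truth, predicted, label))
```

&lt;p&gt;Precision, recall and F1 are computed per class, which is why the two classes can score so differently on the same test set.&lt;/p&gt;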
&lt;h2&gt;Musical diversity&lt;/h2&gt;
&lt;p&gt;Political affiliation is not binary, and it’s not like we can assume that just because someone didn’t explicitly list an affiliation with the Democratic party that they are voting for Romney. And it turns out the correlation between musical preference and Democrat affiliation is slightly harder to tease out. When I was going over the data I had a theory: &lt;em&gt;Republicans might listen to fewer kinds of music&lt;/em&gt;. If most of the class you are trying to predict stays within a narrow range of music types, they’re easier to spot. Conversely, if the class you’re predicting is all over the musical map, it becomes harder to make accurate predictions. And the data shows it. If we add up the occurrence of each musical term associated with each person (for example: Joe listens to “rock” at “110 bpm” that sounds like “Aerosmith” or “the 70s” and is voting for Romney, we mark a +1 under each of those terms for the Republican bucket) and then plot the histogram counts in descending order, we see a clear difference in both the magnitude and distribution of musical types for the two political affiliations.&lt;a class="footnote" href="#fn:6" id="fnref:6" title="see footnote"&gt;[6]&lt;/a&gt;&lt;/p&gt;
&lt;p align="center"&gt;&lt;img alt="Histogram counts of the top occurring musical terms for each class." src="http://img.skitch.com/20120710-ksqudced7ayi3msasrk7gp7kpu.jpg"/&gt;&lt;/p&gt;
&lt;p align="center"&gt;&lt;em&gt;Histogram counts of the top occurring musical terms for each class.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Not only are there fewer musical types overall, but the clear, almost right-angle “elbow” of the Republican histogram distribution shows that after a small set of top ranking terms, their listenership tends to have far less musical diversity. The Democrat curve is smoother, indicating that those people listen to more types of music. In total, &lt;em&gt;for every 10 unique musical types Democrats listen to, Republicans listen to just 7.&lt;/em&gt;&lt;/p&gt;
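&lt;p&gt;The term counting behind that histogram can be sketched in a few lines of Python. The listeners and terms below are invented purely to show the bookkeeping (a +1 per term per listener in that listener’s class bucket, then sorting by count); the real experiment uses many thousands of terms per Taste Profile.&lt;/p&gt;

```python
from collections import Counter

# Made-up listeners for illustration: (affiliation, musical terms in the Taste Profile).
listeners = [
    ("Republican", ["rock", "110 bpm", "Aerosmith", "the 70s"]),
    ("Republican", ["country", "rock", "the 70s"]),
    ("Democrat", ["pop", "hip hop", "rock", "dance"]),
    ("Democrat", ["reggae", "pop", "soul", "electronic"]),
]

# One bucket of term counts per class; each term occurrence adds +1.
buckets = {}
for affiliation, terms in listeners:
    buckets.setdefault(affiliation, Counter()).update(terms)

for affiliation, counts in buckets.items():
    # most_common() returns (term, count) pairs in descending count order,
    # which is the ordering used for the histogram plot.
    print(affiliation, len(counts), counts.most_common(3))
```

&lt;p&gt;The number of distinct keys in each bucket is the “unique musical types” figure, and the sorted counts give the shape of each curve.&lt;/p&gt;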
&lt;p&gt;As silly as it might seem to predict political affiliation with music, this is a great example of why we do this kind of analysis. Now that we&amp;#8217;ve found room for improvement in predicting a class by looking at diversity as a feature, we can apply this in our work to make our playlists and recommendations better. We&amp;#8217;re currently adding a “diversity index” to our Taste Profile back end to model this exact effect.&lt;/p&gt;
&lt;h2 id="whoiskennychesneyvotingfor"&gt;Who are Kenny Chesney fans voting for?&lt;/h2&gt;
&lt;p&gt;Another fun thing to look at with these political classifiers is the set of inputs that the machine thought were the most predictive of political affiliation. For each class, you can do some quick math on the model by inspecting the margin. This tells us which musical terms and properties best separate each political class:&lt;/p&gt;
&lt;p align="center"&gt;&lt;img alt="A diagram of the support vector machine in action. The dots on the dotted lines are the support vectors where training examples are used to define the classification boundaries." src="http://img.skitch.com/20120710-bpek2scd2513qwe6i9498mx5bu.jpg"/&gt;&lt;/p&gt;
&lt;p align="center"&gt;&lt;em&gt;In this diagram of the support vector machine in action, the circles on the dotted lines are the support vectors where training examples are used to define the classification boundaries.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Since a list of probabilities of music vectors isn’t exactly good blog material, I found the closest matching artists to each set of terms to show here. We get a great list from doing this:&lt;/p&gt;
&lt;div width="250" style="margin:20px; background-color:#FA8072;"&gt;
&lt;p align="center"&gt;&lt;em&gt;Artists whose fans are most correlated to Republican&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;1. Kenny Chesney&lt;/li&gt;
&lt;li&gt;2. George Strait&lt;/li&gt;
&lt;li&gt;3. Reba McEntire&lt;/li&gt;
&lt;li&gt;4. Tim McGraw&lt;/li&gt;
&lt;li&gt;5. Jason Aldean&lt;/li&gt;
&lt;li&gt;6. Blake Shelton&lt;/li&gt;
&lt;li&gt;7. Shania Twain&lt;/li&gt;
&lt;li&gt;8. Kelly Clarkson&lt;/li&gt;
&lt;li&gt;9. Pink Floyd&lt;a class="footnote" href="#fn:7" id="fnref:7" title="see footnote"&gt;[7]&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;10. Elvis Presley&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div width="250" style="margin:20px; background-color:#8080FA;"&gt;
&lt;p align="center"&gt;&lt;em&gt;Artists whose fans are most correlated to Democrat&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;1. Rihanna&lt;/li&gt;
&lt;li&gt;2. Jay-Z&lt;/li&gt;
&lt;li&gt;3. Madonna&lt;/li&gt;
&lt;li&gt;4. Lady Gaga&lt;/li&gt;
&lt;li&gt;5. Katy Perry&lt;/li&gt;
&lt;li&gt;6. Snoop Dogg&lt;/li&gt;
&lt;li&gt;7. Chris Brown&lt;/li&gt;
&lt;li&gt;8. Usher&lt;/li&gt;
&lt;li&gt;9. Eminem&lt;/li&gt;
&lt;li&gt;10. Bob Marley&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;p&gt;The Democrat high predictive list reminds me of another experiment I ran: taking the &lt;a href="http://www.billboard.com/charts/hot-100#/charts/hot-100"&gt;Billboard Top 10&lt;/a&gt; and feeding it through the predictor. As of this writing the top artists are &lt;strong&gt;Carly Rae Jepsen&lt;/strong&gt;, &lt;strong&gt;Maroon 5&lt;/strong&gt;, &lt;strong&gt;Gotye&lt;/strong&gt;, &lt;strong&gt;Katy Perry&lt;/strong&gt;, &lt;strong&gt;Rihanna&lt;/strong&gt;, &lt;strong&gt;Ellie Goulding&lt;/strong&gt;, &lt;strong&gt;fun.&lt;/strong&gt;, &lt;strong&gt;Nicki Minaj&lt;/strong&gt;, &lt;strong&gt;David Guetta&lt;/strong&gt; and &lt;strong&gt;Usher&lt;/strong&gt;, almost all artists that skew very strongly Democratic. If only people that buy singles in music stores vote this November, it will be a complete Obama landslide.&lt;/p&gt;
&lt;h2&gt;Classifier confusion prediction&lt;/h2&gt;
&lt;p&gt;Lastly, I thought the “highest confusion” artist list was interesting. These artists are not good predictors of either Republican or Democrat, so if you like them, you’re relatively safe from this Minority Report shit.&lt;/p&gt;
&lt;div width="250" style="margin:20px; background-color:#A0A0A0;"&gt;
&lt;p align="center"&gt;&lt;em&gt;Artists whose fans are hardest to predict for either Democrat or Republican&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;1. The Beatles&lt;/li&gt;
&lt;li&gt;2. Marilyn Manson&lt;/li&gt;
&lt;li&gt;3. The Rolling Stones&lt;/li&gt;
&lt;li&gt;4. Johnny Cash&lt;/li&gt;
&lt;li&gt;5. Pantera&lt;/li&gt;
&lt;li&gt;6. Alice in Chains&lt;/li&gt;
&lt;li&gt;7. Paradise Lost&lt;/li&gt;
&lt;li&gt;8. Moonspell&lt;/li&gt;
&lt;li&gt;9. Fleetwood Mac&lt;/li&gt;
&lt;li&gt;10. Tiamat&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;p&gt;I found it neat that these non-predictive artists were mainly metal. Perhaps metal is the genre that can finally bring this divided country together or break the lock on the two-party system.&lt;/p&gt;
&lt;p align="center"&gt;&lt;img alt="" src="http://img.skitch.com/20120710-c7auq2cst4fxqxy64kxx9kw8j6.jpg"/&gt;&lt;/p&gt;
&lt;h2 id="oknowwhat"&gt;OK, now what?&lt;/h2&gt;
&lt;p&gt;Obviously, The Echo Nest is not going to quit our day jobs to become pollsters. But it’s fascinating how much information about you is sitting inside your musical tastes, and we also appreciate how we can use these experiments to tune our models to make your listening experience better on everything we power (Clear Channel&amp;#8217;s iHeartRadio, eMusic, MOG, Spotify, Nokia, the BBC, VEVO, and many more.) We have a lot more ideas around this angle. Stay tuned to this blog for more experiments, and please let me know if you’ve got any great ideas.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Huge thanks to proofreaders / reviewers Dan Ellis, Eliot van Buskirk, Jim Lucchese&lt;/em&gt;&lt;/p&gt;
&lt;div class="footnotes"&gt;&lt;hr&gt;&lt;ol&gt;&lt;li id="fn:1"&gt;
&lt;p&gt;All data used in this experiment (and all Echo Nest Taste Profile data used in analytics) was carefully anonymized before we received it. &lt;a class="reversefootnote" href="#fnref:1" title="return to article"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;Please get in touch if you’ve got a great idea for another experiment on data like this that I don’t cover! &lt;a class="reversefootnote" href="#fnref:2" title="return to article"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;Perhaps fit for another blog post, but if you’re reading down here: it turns out that the musical data the Echo Nest has is crucial for the experiment to work. Just training a model against artist or song names alone did not work nearly as well. F1 scores dropped by an average of 20–40% depending on the task. &lt;a class="reversefootnote" href="#fnref:3" title="return to article"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:4"&gt;
&lt;p&gt;In practice we use a somewhat “reduced” form of the SVM known as RLSC so that we can easily scale across many machines, but the intent is the same. You can read about RLSC on music data &lt;a href="http://alumni.media.mit.edu/~bwhitman/whitman_ellis_recordreviews.pdf"&gt;in an old paper of mine if you’re interested. (PDF)&lt;/a&gt; &lt;a class="reversefootnote" href="#fnref:4" title="return to article"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:5"&gt;
&lt;p&gt;We trained two binary classifiers: Rep vs. Dem and Dem vs. Rep. The training data was a large random sample from the universe of data, which was 66% Dem and 34% Rep. &lt;a class="reversefootnote" href="#fnref:5" title="return to article"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:6"&gt;
&lt;p&gt;To make this histogram chart, we normalized both bucket counts first by the universe of terms to ensure this bias in the priors would not affect the distribution. &lt;a class="reversefootnote" href="#fnref:6" title="return to article"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:7"&gt;
&lt;p&gt;One day, Echo Nest will start delineating between “good” Pink Floyd (Syd) and the other kind &lt;a class="reversefootnote" href="#fnref:7" title="return to article"&gt; ↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;&lt;/div&gt;</description><link>http://notes.variogr.am/post/26869688460</link><guid>http://notes.variogr.am/post/26869688460</guid><pubDate>Thu, 12 Jul 2012 10:55:13 -0400</pubDate><category>echonest</category><category>echo nest</category><category>music</category><category>politics</category><category>prediction</category><category>machine learning</category></item><item><title>Why music ID resolution matters to every music fan on Facebook</title><description>&lt;p&gt;&lt;img src="http://img.skitch.com/20111011-ejag76ekr6c28kfrg69g18ucue.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;Facebook&amp;#8217;s &lt;a href="http://techcrunch.com/2011/09/22/live-from-facebooks-2011-f8-conference-video/"&gt;music announcement&lt;/a&gt; a couple of weeks ago was a huge land grab, an audacious move to get itself ensconced as the nexus of that music platform &lt;a href="http://notes.variogr.am/post/1373556723/the-future-music-platform-music-startups-imminent"&gt;I&amp;#8217;ve been talking about.&lt;/a&gt; On paper and on stage the service looked game changing: all your music players and services in one place, neatly collected with your friends to help you navigate the massive world of music. My engineering team at The Echo Nest and I have probably spent as much time thinking about that massive world of music as anyone on earth, so I thought I&amp;#8217;d try putting it through its paces.&lt;/p&gt;
&lt;p&gt;Facebook’s recently-launched music service now shows every music fan why the crazy and complicated world of music ID resolution matters to all of us.  The more social our music activity becomes, the more music data becomes relevant to music fans every day.&lt;/p&gt;
&lt;p&gt;Though Facebook music has only been live for a couple of weeks, Facebook is clearly struggling with some well-worn challenges in music ID resolution &amp;#8212; problems I’ve been dealing with for many years now.  Below are some examples highlighting the promise of Facebook music and some common music ID resolution problems they’ll need to fix to really deliver on the promise&amp;#8230;&lt;/p&gt;
&lt;h3&gt;The holy grail &amp;#8212; &amp;#8216;universal song ID&amp;#8217;&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;The only way social music features can work is if the song you want to hear actually plays.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you&amp;#8217;re listening to a song in Spotify, it will broadcast to Facebook and all your friends see what you are listening to in real time&lt;sup&gt;&lt;a href="#0"&gt;0&lt;/a&gt;&lt;/sup&gt;. If they want to play along, your friends click a simple play button to hear it themselves. No Spotify? No problem&amp;#8212; Facebook launched with a &lt;a href="http://www.businessinsider.com/list-of-facebook-music-services-2011-9"&gt;huge array of music content partners&lt;/a&gt; (with some conspicuous elephants missing) and, if that song is available in your choice of player, it will play with no &amp;#8220;friction.&amp;#8221; You&amp;#8217;ll see something nice like:&lt;/p&gt;
&lt;p&gt;&lt;img title="Facebook correctly resolving Apply by Glasser" src="http://img.skitch.com/20110926-khq5nj5uykdw1k9nc1eyq5t84n.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;Which opens the song in your selected player (in this case, it&amp;#8217;s Rdio, which just needs to be open in your browser.) This is inconceivably great for music consumers&amp;#8212; there&amp;#8217;s a lot of great competing services, all with unique features and cost structures, and giving more choice is always good. I may want to use MOG because it has excellent Echo Nest-powered discovery features. Or Rdio or Spotify because they allow 3rd party mobile apps. Or Slacker or iHeartRadio for a radio experience.&lt;/p&gt;
&lt;h3&gt;What song is this?&lt;/h3&gt;
&lt;p&gt;This is all music-world changing stuff, if it worked. When I first played with Facebook Music, I tried a simple example. I put on the most terrible popular song I could think of in my Spotify player:&lt;/p&gt;
&lt;p&gt;&lt;img title="For science, i listened to this john mayer song" src="http://img.skitch.com/20110927-c23dwtxsw7tpg7tuxdh8t3s552.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;And then I asked Facebook to &amp;#8220;Play in Rdio.&amp;#8221; I heard something sort of like it but not exactly:&lt;/p&gt;
&lt;p&gt;&lt;img title="But look! It plays this terrible hit crew version" src="http://img.skitch.com/20110927-jyngsx5k2bxceary5xsgjn3gue.jpg" width="400"/&gt;&lt;/p&gt;
&lt;p&gt;Here Facebook has decided that Rdio’s version of “Your Body is a Wonderland” is a sounds-like version from “The Hit Crew.” I cannot think of a worse fate: hearing something worse than John Mayer when you have to click on a link that says John Mayer. (Consider clicking on a Google search result for your dentist&amp;#8217;s office phone number and getting your ex-girlfriend instead.)&lt;/p&gt;
&lt;p&gt;At The Echo Nest, we know “The Hit Crew” all too well; they crank out &amp;#8220;soundslike&amp;#8221; versions of tunes.  This is a great example of a basic music resolving problem:  every song in any reasonably sized catalog has dozens of karaoke versions, covers, instrumentals, yoga mixes, etc. For Facebook to resolve a top 20 single to its sounds-like version is pretty ugly. What&amp;#8217;s going on?&lt;/p&gt;
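&lt;p&gt;To make that failure mode concrete, here is a minimal Python sketch of candidate selection inside one catalog. I don’t know how Facebook actually picks a track; the &lt;code&gt;Track&lt;/code&gt; type, the toy catalog, its IDs and both function names are invented for illustration. It only shows how a title-only match can land on a sounds-like version, while requiring the artist to match as well does not.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    service_id: str
    artist: str
    title: str


# Toy catalog: the IDs are made up; artists and title come from the example above.
CATALOG = [
    Track("t1", "The Hit Crew", "Your Body Is a Wonderland"),
    Track("t2", "John Mayer", "Your Body Is a Wonderland"),
]


def norm(text):
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def resolve_by_title(title, catalog):
    """Naive resolver: first track whose title matches, ignoring the artist."""
    for track in catalog:
        if norm(track.title) == norm(title):
            return track
    return None


def resolve_by_artist_and_title(artist, title, catalog):
    """Stricter resolver: both the artist and the title must match."""
    for track in catalog:
        if norm(track.artist) == norm(artist) and norm(track.title) == norm(title):
            return track
    return None
```

&lt;p&gt;With this toy catalog, the title-only resolver returns the sounds-like track simply because it comes first, which is roughly the behavior I saw.&lt;/p&gt;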
&lt;p&gt;Moving on, that Glasser song up top? Later that day I clicked on another friend listening to the same song, but it was different this time:&lt;/p&gt;
&lt;p&gt;&lt;img title="It doesn't know about it this time" src="http://img.skitch.com/20110926-xjxku23gk7yhnt39r1yd8r7kye.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;Wait a minute, what’s the difference? I obviously know what&amp;#8217;s going on &amp;#8212; the Spotify &amp;#8220;Apply&amp;#8221; is from a compilation while the Rdio version is on the full length release. If you click on the &lt;em&gt;release&lt;/em&gt; version on Spotify, you can resolve the song to any other service for your friends and get recommendations, or participate in the social listening experience. But if you click on the &lt;em&gt;compilation&lt;/em&gt; version (which is the default when typing Glasser Apply in the Spotify search box), you get nothing. The result: the song you hear might as well have been something you recorded in your basement last night, even though Rdio has what you were hoping for:&lt;/p&gt;
&lt;p&gt;&lt;img title="But it's definitely there in Rdio" src="http://img.skitch.com/20110926-ppbjtyexxt2qgbegjy1tphp4hf.jpg" width="400"/&gt;&lt;/p&gt;
&lt;p&gt;This is another common ID resolution problem.  Facebook likely isn’t working from a canonical database of songs or artists, rather using loose references to them from their own data and partners. And the glue linking those ID structures together is brittle, making for risky connections and some strange user experiences when translating across services.&lt;/p&gt;
&lt;h3&gt;Why it matters&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Accurate resolving is the necessary backbone of social music.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;These examples show that ID resolution isn’t just the plumbing underneath a social music experience &amp;#8212; it is the foundation of any good music service that allows sharing.  If songs don’t play when they should or link to the wrong song, people can’t talk about them.&lt;/p&gt;
&lt;p&gt;Even imagining a world in which there is only Spotify in Facebook, consider the following realities: any real music fan will want to connect with other types of services: radio players like iHeartRadio, video services like VEVO, reviews, blogs, biographies, artist photos, games, publishing platforms like Soundcloud, and so on.&lt;/p&gt;
&lt;p&gt;We obsess about this problem.  I’m guessing Facebook is obsessing about it now, because it introduces friction millions (&lt;a href="http://www.youtube.com/watch?v=FwaGF2CbVIw&amp;amp;t=17s"&gt;or billions&lt;/a&gt;) of times per day into what Facebook wants to be a “frictionless” experience.  The more social your music activity, the more you’ll agree that any decent social music service needs to know that two slightly differently spelled artist names may be the same artist. Or that the radio edit of a song can be played in place of the single version.&lt;/p&gt;
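&lt;p&gt;As a rough illustration of the spelling half of that problem, here is a small Python sketch that compares artist names through a normalized key plus an alias table. This is not how The Echo Nest or Facebook does it; the &lt;code&gt;ALIASES&lt;/code&gt; table and the function names are assumptions made up for the example, and a real system needs far more rules than case, accents and punctuation.&lt;/p&gt;

```python
import unicodedata

# Hypothetical alias table: normalized alias mapped to one canonical artist name.
ALIASES = {
    "led zep": "Led Zeppelin",
    "tom jobim": "Antonio Carlos Jobim",
}


def normalize(name):
    """Fold case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def canonical_key(name):
    """Map a name to the normalized form of its canonical artist name."""
    key = normalize(name)
    return normalize(ALIASES.get(key, key))


def same_artist(a, b):
    """True when both names normalize (through aliases) to the same key."""
    return canonical_key(a) == canonical_key(b)
```

&lt;p&gt;Names that differ by more than spelling, such as a solo artist versus the same artist with a named backing band, deliberately stay distinct in this sketch.&lt;/p&gt;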
&lt;p&gt;As implemented, the v.1 Facebook music experience is like comparing snowflakes with a ruler. Right now it impacts the user experience, but the effects could definitely get worse as more users and more services join the fray.&lt;/p&gt;
&lt;p&gt;For me, clicking around Facebook Music these days is tough. It’s rare that anything I’d want to hear gets resolved properly:&lt;/p&gt;
&lt;p&gt;&lt;img title="Sad because Facebook can't find this Ella song" src="http://img.skitch.com/20110926-et82r721t9jncqxan3k6n2b41m.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://developer.echonest.com/api/v4/song/search?api_key=N6E4NIOVYMTHNDM8J&amp;amp;format=json&amp;amp;results=1&amp;amp;artist=Ella%20Fitzgerald&amp;amp;title=The%20Sun%20Forgot%20to%20Shine%20This%20Morning&amp;amp;bucket=id:rdio-us-streaming"&gt;Of course, EN knows the song exists in Rdio, as does Rdio&amp;#8217;s API. Why doesn&amp;#8217;t Facebook?&lt;/a&gt;&lt;/p&gt;
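&lt;p&gt;For anyone who wants to try the same lookup, here is a small Python sketch that rebuilds the query behind the link above. The endpoint and parameter names are copied from that URL; the function name and the &lt;code&gt;YOUR_API_KEY&lt;/code&gt; placeholder are mine, and the sketch only builds the URL without sending the request.&lt;/p&gt;

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the link above.
SONG_SEARCH = "http://developer.echonest.com/api/v4/song/search"


def song_search_url(api_key, artist, title, id_space):
    """Build a song/search URL that asks for IDs in one partner catalog."""
    params = [
        ("api_key", api_key),
        ("format", "json"),
        ("results", 1),
        ("artist", artist),
        ("title", title),
        ("bucket", "id:" + id_space),
    ]
    # quote_via=quote encodes spaces as %20; safe=":" keeps the bucket readable.
    return SONG_SEARCH + "?" + urlencode(params, safe=":", quote_via=quote)


if __name__ == "__main__":
    print(song_search_url(
        "YOUR_API_KEY",
        "Ella Fitzgerald",
        "The Sun Forgot to Shine This Morning",
        "rdio-us-streaming",
    ))
```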
&lt;h3&gt;Artist pages and context&lt;/h3&gt;
&lt;p&gt;As you can probably tell, song resolving is bothering me enough; but the Facebook music application I was most excited about was the addition of content to its massive database of context. Facebook users or page owners can tag artist names in their wall posts and events and Facebook will helpfully make that artist playable if it knows about it:&lt;/p&gt;
&lt;p&gt;&lt;img src="http://img.skitch.com/20110927-b3mih9a15hu3hi2yx8akxy8s7n.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;Here, on a page for the venue Brighton Music Hall, Facebook made a hover link for the band Tapes ‘n Tapes playable by choosing a random music service that can play songs by that artist for you. This is very similar to Google’s old “Music OneBox,” which aggregated MySpace, iLike, LaLa and a lot of other websites you don’t use anymore. Great for listeners, (maybe) great for services, great for the bands. But here’s another area where ID resolution problems make the user experience fall down.&lt;/p&gt;
&lt;p&gt;It doesn&amp;#8217;t take long to find bands that Facebook simply doesn&amp;#8217;t &amp;#8220;know&amp;#8221; about, which is fascinating given the breadth and depth of their user entered and maintained &amp;#8220;community page&amp;#8221; and fan page structures. For example, one post down on the Brighton Music Hall page is a note about the great &lt;a href="http://developer.echonest.com/api/v4/artist/search?api_key=N6E4NIOVYMTHNDM8J&amp;amp;format=json&amp;amp;name=Dirty%20Beaches&amp;amp;results=1&amp;amp;bucket=biographies"&gt;Dirty Beaches&lt;/a&gt; (Echo Nest JSON):&lt;/p&gt;
&lt;p&gt;&lt;img title="But no player for Dirty Beaches, who are on Spotify" src="http://img.skitch.com/20110927-frq5aiyh79xqw1hbicp4mdtdim.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;Where is the player? Spotify has a lot of Dirty Beaches, as do many of the services I tried. It appears that any relatively recent or independent band&lt;sup&gt;&lt;a href="#1"&gt;1&lt;/a&gt;&lt;/sup&gt; simply does not get the player, no matter what services can support playback. This is very sad for the musicians and for listeners like me.&lt;/p&gt;
&lt;p&gt;It’s clear and not particularly surprising that Facebook has trouble determining the identity of musicians on its own site— even those that have well groomed artist pages supplied by management with download widgets and tour details.&lt;/p&gt;
&lt;p&gt;One of the Echo Nest rites of passage is for an engineer to uncover details automatically of the classic rock group &amp;#8220;&lt;a href="http://en.wikipedia.org/wiki/The_Band"&gt;The Band&lt;/a&gt;&amp;#8221; from the internet&amp;#8212; and our recent push to &lt;a href="http://blog.echonest.com/post/6384161266/support-for-facebook-artist-ids"&gt;know all we can about Facebook artists and their pages&lt;/a&gt; was a two month 3 engineer effort that uncovered a cascading series of Facebook pitfalls in which &amp;#8220;The Band&amp;#8221; was actually an easy one. One Echo Nest employee wrote me a surreal late night email asking me to make sure he was still sane as the various Facebook data gathering APIs appeared to be non-deterministic: successive calls would return completely different results&lt;sup&gt;&lt;a href="#2"&gt;2&lt;/a&gt;&lt;/sup&gt;. I was grimly delighted to see that Facebook&amp;#8217;s own engineers faced the same problems we did. For some reason, it looks like we do a much better job at resolving Facebook page IDs. Again, this could definitely change as the Facebook service matures &amp;#8212; they got a lot done in a short amount of time and Facebook music just launched. Here&amp;#8217;s &lt;a href="http://developer.echonest.com/api/v4/artist/profile?api_key=N6E4NIOVYMTHNDM8J&amp;amp;id=ARGF9VF1187FB37DAE&amp;amp;bucket=id:facebook&amp;amp;format=json"&gt;our Facebook data about &amp;#8220;The Band&amp;#8221; returned from a simple search for &amp;#8220;The Band&amp;#8221; that links to the very professional and official:&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img title="The real The Band page on Facebook" src="http://img.skitch.com/20110926-f3c9xqi83ma7728ts8u5fhcetn.jpg" width="500"/&gt;&lt;/p&gt;
&lt;p&gt;Facebook seems to want to take you here, a twilight dead letter zone of people talking about &amp;#8220;the band&amp;#8221; in other contexts. It&amp;#8217;s not a page or a community page and therefore does not let cross-service resolving or context work. It looks like an SEO trap and seems to have conned 2,503 confused people:&lt;/p&gt;
&lt;p&gt;&lt;img title="The fake autogenerated hilarious Facebook The Band page" src="http://img.skitch.com/20111011-qb6me1ngxu8afiakp1hk7r7qcg.jpg" width="500"/&gt;&lt;/p&gt;
&lt;p&gt;You can see this in their music app whenever you see a band name all alone with no other information at the top. They have trouble with another favorite around here, &lt;a href="http://developer.echonest.com/api/v4/artist/search?api_key=N6E4NIOVYMTHNDM8J&amp;amp;format=json&amp;amp;name=the%20the&amp;amp;results=1&amp;amp;bucket=id:facebook"&gt;artist resolving 101 pop quiz post-punk 80s candidate&lt;/a&gt; The The&lt;sup&gt;&lt;a href="#3"&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img title="Facebook does not know about The The." src="http://img.skitch.com/20110926-q5dt8jwcx6fh3eag9afsg3ncbc.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;These are not easy problems to solve.  A huge class of Facebook artist resolving issues seems to come down to &amp;#8220;merges&amp;#8221; &amp;#8212; artists that may be known as different names, aliases, nicknames, side projects or foreign language names. We maintain a huge database of musical aliases (&amp;#8220;Led Zep,&amp;#8221; &amp;#8220;BEP&amp;#8221; etc) as well as collaboration names and misspellings as &lt;a href="http://notes.variogr.am/post/6687194793/the-echo-nest-puddle-and-artist-entity-extraction"&gt;perfect resolving against text and search&lt;/a&gt; is something we work very hard on.  But Facebook doesn&amp;#8217;t like seeing &amp;#8220;Tom Jobim&amp;#8221; because they only know him as &amp;#8220;Antonio Carlos Jobim&amp;#8221;:&lt;/p&gt;
&lt;p&gt;&lt;img title="Tom Jobim is pretty popular but Facebook doesn't think so." src="http://img.skitch.com/20110926-x2pkfj33f9n3m7qpep9fscp96t.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;For example, any music service worth its salt has spent countless hours debating whether to assign Tom Petty and Tom Petty and the Heartbreakers the same database ID (the answer is no, by the way.)&lt;/p&gt;
&lt;p&gt;But when I listen to &amp;#8220;Free Fallin&amp;#8217;&amp;#8221; in Spotify (&lt;a href="http://en.wikipedia.org/wiki/Free_Fallin'"&gt;Petty solo&lt;/a&gt;) Spotify gets it right but I am not allowed to hear it in Rdio because it doesn&amp;#8217;t match up to their (incorrect) assignment of the song to the heartbreakers:&lt;/p&gt;
&lt;p&gt;&lt;img title="Spotify gets it right" src="http://img.skitch.com/20110926-khsr837nfg431qf8kkf1xks5bd.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;&lt;img title="Rdio, it's not Heartbreakers" src="http://img.skitch.com/20110926-dpxykkwcwh85abnufaymxxnxuj.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;And this is &amp;#8220;Free Fallin&amp;#8217;&amp;#8221;, a song that is taught to sixth graders and I am pretty sure is the state anthem of California. Same goes for other popular artists who&amp;#8217;ve performed with and without named backing groups:&lt;/p&gt;
&lt;p&gt;&lt;img title="The kind of page Facebook shows when it has no idea who this artist is. No context. And it's Prince." src="http://img.skitch.com/20110926-f9byngywbg6c72e54uqyy2anex.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;A real world example of this&lt;sup&gt;&lt;a href="#4"&gt;4&lt;/a&gt;&lt;/sup&gt;: a particular dear sensitive friend in London was having a late night Jason Molina bender broadcast on his Facebook feed. This is an inspiring use of social music and where Facebook will eventually shine&amp;#8212; seeing what my friend is listening to, I can listen along and maybe tell him to put down the bottle of gin and go to sleep. However, I can&amp;#8217;t hear it:&lt;/p&gt;
&lt;p&gt;&lt;img title="Facebook not resolving a Jason Molina song" src="http://img.skitch.com/20110926-jmjf889xgj8bmr4y438sm8m4t2.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;Why? Rdio had tons of Molina and of course that particular song. It&amp;#8217;s because Facebook didn&amp;#8217;t do a good job of resolving&lt;sup&gt;&lt;a href="#5"&gt;5&lt;/a&gt;&lt;/sup&gt; &amp;#8212; to them, Jason Molina is: &amp;#8220;Songs: Ohia,&amp;#8221; the name of one of his side projects. And of course neither Rdio nor any other service has a song called &amp;#8220;Get Out Get Out Get Out&amp;#8221; by Songs: Ohia, because it doesn&amp;#8217;t exist. This is a Jason solo song. Here&amp;#8217;s how Facebook got so confused:&lt;/p&gt;
&lt;p&gt;&lt;img title="Why Facebook won't resolve Molina: it auto-redirected him to his solo project Songs: Ohia" src="http://img.skitch.com/20110926-8pfx5n718f2khgc9dmm9m9s85a.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;Facebook seems to rely strongly on Wikipedia for much of its artist data (their &amp;#8220;Community Pages&amp;#8221; are CC licensed WP copies), and Wikipedia&amp;#8217;s editors auto-redirect Songs:Ohia to Jason. So somewhere in the depths of Facebook&amp;#8217;s graph database is a pointer that goes the other way. This pretty much invalidates Jason&amp;#8217;s chances of ever getting social music love on Facebook. I doubt it&amp;#8217;ll get fixed any time soon. But maybe this one will:&lt;/p&gt;
&lt;p&gt;&lt;img title="Selena Gomez - Actor/Director" src="http://img.skitch.com/20110926-f4fyej48ywmaa18q7bppg4kaqg.jpg" width="500"/&gt;&lt;/p&gt;
&lt;p&gt;That&amp;#8217;s Selena Gomez. She&amp;#8217;s not Facebook Music compatible, for a different reason &amp;#8212; she has a double life. Selena is listed in Facebook as an &amp;#8220;Actor/Director,&amp;#8221; not a &amp;#8220;Musician/Artist.&amp;#8221; You can&amp;#8217;t click through from a friend&amp;#8217;s listen to a music service, and you can see her page from any stream activity. Can you guess why? To Facebook, she doesn&amp;#8217;t make music. This affects a lot of edge-straddling pop stars, with some notable exceptions. I noticed that &amp;#8220;Glee Cast&amp;#8221; was manually fixed, as was Kraftwerk (who were a &amp;#8220;Local Business&amp;#8221; until a month ago&lt;sup&gt;&lt;a href="#6"&gt;6&lt;/a&gt;&lt;/sup&gt;) but comedians and musicals are still denied access to the social music party.&lt;/p&gt;
&lt;h3&gt;What could they do and what should happen&lt;/h3&gt;
&lt;p&gt;Quite a lot of Facebook&amp;#8217;s resolving issues could be fixed by ingesting catalog in a &amp;#8220;musical&amp;#8221; way &amp;#8212; not just treating strings such as artist names or song titles as database IDs as they seem to be doing. There are some pretty well-known approaches Facebook could take to fix these problems. They can use audio fingerprinting, for example (&lt;a href="http://developer.echonest.com/docs/v4/song.html#identify"&gt;of course, I know of a couple&lt;/a&gt;, even an &lt;a href="http://echoprint.me"&gt;open source one.&lt;/a&gt;) They can work on mapping artist names and song titles together a bit more intelligently as we&amp;#8217;ve been doing for a while: one of The Echo Nest&amp;#8217;s main services is &lt;a href="http://musicmachinery.com/2010/02/10/introducing-project-rosetta-stone/"&gt;project Rosetta Stone&lt;/a&gt;, an &amp;#8220;ID space&amp;#8221; resolution system that can quickly identify songs or artists in any platform: generate an eMusic ID from a Spotify URL, or a MOG ID from an audio fingerprint, or any combination possible. Facebook could have merged the data feeds in some slightly intelligent way to match songs across releases. Or some of their millions of users could edit or automatically de-reference the metadata&lt;sup&gt;&lt;a href="#7"&gt;7&lt;/a&gt;&lt;/sup&gt;. It&amp;#8217;s clear that none of this happened.&lt;/p&gt;
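&lt;p&gt;The &amp;#8220;ID space&amp;#8221; idea can be sketched in a few lines of Python. To be clear, this is not how Rosetta Stone is implemented; the table, the IDs in it and the function names are all invented for illustration. The point is only that every service-specific ID hangs off one canonical song ID, so translating between services becomes two lookups, and two releases of the same recording resolve to the same place.&lt;/p&gt;

```python
# Hypothetical mapping from (id space, foreign ID) to a canonical song ID.
# Every ID below is invented for the example.
TO_CANONICAL = {
    ("spotify", "sp-123"): "SONG-1",  # album release
    ("spotify", "sp-456"): "SONG-1",  # same recording on a compilation
    ("rdio", "rd-789"): "SONG-1",
}


def build_reverse_index(to_canonical):
    """Group foreign IDs by canonical ID, then by id space."""
    reverse = {}
    for (space, foreign_id), canonical in to_canonical.items():
        reverse.setdefault(canonical, {}).setdefault(space, []).append(foreign_id)
    return reverse


def translate(space, foreign_id, target_space, to_canonical, reverse):
    """Return the equivalent IDs in target_space, or an empty list if unknown."""
    canonical = to_canonical.get((space, foreign_id))
    if canonical is None:
        return []
    return reverse.get(canonical, {}).get(target_space, [])


if __name__ == "__main__":
    index = build_reverse_index(TO_CANONICAL)
    # The compilation version still resolves to the track on the other service.
    print(translate("spotify", "sp-456", "rdio", TO_CANONICAL, index))
```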
&lt;p&gt;Social music is the future of music.  Facebook is pushing this future forward more than anyone.  It’s clear that Facebook has some trouble ahead of them in the resolving space and I’m sure they are obsessing about it as much as I do.  They are going to have to get down with the &lt;a href="http://www.wired.com/epicenter/2009/12/4-ways-one-big-database-would-help-music-fans-industry/"&gt;one big database of music&lt;/a&gt; scenario sooner rather than later. Facebook has more users and data than anyone and I’d love to see a concerted effort to build a true world of connected music via the new Open Graph. But there’s a lot more work to do before that promise is realized.&lt;/p&gt;
&lt;p&gt;- &lt;a href="mailto:brian@echonest.com"&gt;brian@echonest.com&lt;/a&gt; / &lt;a href="http://twitter.com/#!/bwhitman"&gt;@bwhitman&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks to EVB and PBL and JL and MO for editing help&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;small&gt;&lt;a name="0" id="0"&gt;&lt;sup&gt;0&lt;/sup&gt;&lt;/a&gt; Spotify actually had the nerve to call it &amp;#8220;scrobbling,&amp;#8221; &lt;a href="http://twitter.com/bwhitman/status/118347586217320449"&gt;which I doubt the inventor of the scrobble, Last.fm, is very happy about&lt;/a&gt; as Facebook&amp;#8217;s integration is a clear competitor &lt;/small&gt;&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;a name="1" id="1"&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; I can&amp;#8217;t prove it yet, but I think this feature may be limited to artists appearing on an older dump of MusicBrainz&amp;#8212; which is too bad as they only have 600,000 artists and are relatively slow to update &lt;/small&gt;&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;a name="2" id="2"&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; &lt;a href="https://graph.facebook.com/search?q=The+The&amp;amp;type=page&amp;amp;limit=10"&gt;Try this (failing) query for The The.&lt;/a&gt; You&amp;#8217;ll either get The Bible or the Simpsons as the top result, which I found very apropos. Maybe the engs pit memcache servers against each other &lt;a href="http://achewood.com/index.php?date=03042004"&gt;&amp;#8220;I don&amp;#8217;t like fighting either. Get here first&amp;#8221;&lt;/a&gt; style &lt;/small&gt;&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;a name="3" id="3"&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; I&amp;#8217;ve heard this is a Google interview question, and The The is still to this day one of the first bands I type into a new music service, after &lt;a href="http://en.wikipedia.org/wiki/Keith_Fullerton_Whitman"&gt;Blitter&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Monolake"&gt;Various Artists.&lt;/a&gt; &lt;/small&gt;&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;a name="4" id="4"&gt;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; as no one really needs help listening to Tom Petty, it is somewhat an inherent instinctual property of the world that Tom Petty vibrates speakers &lt;/small&gt;&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;a name="5" id="5"&gt;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; You can see Facebook&amp;#8217;s resolved name if you click on a song title in Facebook Music after hovering over a music listen status &lt;/small&gt;&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;a name="6" id="6"&gt;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt; they surely have a fan in Facebook engineering &lt;/small&gt;&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;a name="7" id="7"&gt;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; It&amp;#8217;s likely this may be happening to some extent. If I play a song in Rdio that Facebook did not previously &amp;#8220;know&amp;#8221; about without Facebook&amp;#8217;s help the next time I try it in Facebook it appears as an option. I&amp;#8217;m not sure if this persists to other users. &lt;/small&gt;&lt;/p&gt;</description><link>http://notes.variogr.am/post/10733372290</link><guid>http://notes.variogr.am/post/10733372290</guid><pubDate>Tue, 11 Oct 2011 08:57:00 -0400</pubDate></item><item><title>The Echo Nest "puddle" and artist entity extraction</title><description>&lt;p&gt;(Cross posted from my post at the Echo Nest blog, with additions)&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s the year 2006 and co-Founder Brian and early Nest developer &lt;a href="http://www.squid-labs.com/people/ryan.html"&gt;Ryan&lt;/a&gt; were trying to figure out how to associate the world of free text on the internet to musical artists. We already were crawling tens of thousands of documents a day (now millions!) but a Google-style index of unstructured text about music was not our goal. We needed to somehow quickly associate a new incoming page to an artist ID so that we could quickly retrieve all the documents about an artist as well as run our statistics on the text to find out what people were saying. Brian sketched this classic diagram, soon to be placed in the Echo Nest Museum (next to &lt;a href="http://twitter.com/#!/David/status/78655783851667456"&gt;SoundCloud&amp;#8217;s award&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img src="http://static.echonest.com/puddle.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;That crudely drawn &amp;#8220;blob of intelligence&amp;#8221; that could take unstructured free text and quickly identify artist names quickly became known as &amp;#8220;the Puddle,&amp;#8221; a term that entered Echo Nest lore alongside &amp;#8220;grankle&amp;#8221; and &amp;#8220;flat.&amp;#8221; We use a form of the Puddle to this day. Every piece of text that our crawlers generate goes through a custom entity extraction process&amp;#8212; it&amp;#8217;s how we know &lt;a href="http://developer.echonest.com/docs/v4/artist.html#blogs"&gt;what blogs are writing about which artists&lt;/a&gt; and it&amp;#8217;s what powers our artist similarity engine, as we need to figure out what people are saying about which artists as soon as it&amp;#8217;s said. It&amp;#8217;s a powerful and fast changing piece of our infrastructure trying to attack a hard problem.&lt;/p&gt;
&lt;p&gt;Entity extraction is even more useful today. If you wanted to build a Twitter app that figured out the bands a user was talking about, how could you do it? You&amp;#8217;d need a huge database of artists (check, we have over 1.6 million), a lot of fast computers (check), and tons of rules learned from our customers over the years about artist resolution&amp;#8212; aliases, stopwords, tokenization, merged artists and so on. Given a simple &lt;a href="http://twitter.com/#!/bwhitman/status/79156411891843072"&gt;tweet&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://twitter.com/#!/bwhitman/status/79156411891843072"&gt;&lt;img src="http://static.echonest.com/hefner.png"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Can we figure out what band Brian&amp;#8217;s talking about, automatically? Well, &lt;a href="http://developer.echonest.com/api/v4/artist/extract?api_key=N6E4NIOVYMTHNDM8J&amp;amp;format=json&amp;amp;text=someday%20we're%20going%20to%20realize%20how%20great%20Hefner%20was.%20There'll%20be%20a%20parade%20and%20every%20top%2040%20hit%20grows%20a%20coda%20that%20year"&gt;now you can.&lt;/a&gt; We&amp;#8217;ve decided to open up a &lt;a href="http://developer.echonest.com/docs/v4/artist.html#extract"&gt;beta version of our entity extraction toolkit called &lt;strong&gt;artist/extract&lt;/strong&gt; to developers&lt;/a&gt;. Pass in any text and you&amp;#8217;ll get back a list of the artist names (in order of appearance by default, but you can sort by any Echo Nest feature) that were mentioned in the text. Think of it as a form of artist search that can take anything &amp;#8212; Facebook comments, tweets, blog posts, reviews, SMSes.&lt;/p&gt;
&lt;p&gt;We support all sorts of fancy things to help you. &lt;a href="http://developer.echonest.com/api/v4/artist/extract?api_key=N6E4NIOVYMTHNDM8J&amp;amp;format=json&amp;amp;text=led%20zep%20is%20so%20much%20better%20than%20The%20Killers&amp;amp;results=15&amp;amp;bucket=familiarity"&gt;We know that &amp;#8220;Led Zep&amp;#8221; is an alias for Led Zeppelin&lt;/a&gt;. We try to deal with &lt;a href="http://developer.echonest.com/api/v4/artist/extract?api_key=N6E4NIOVYMTHNDM8J&amp;amp;format=json&amp;amp;text=I'm%20walking%20on%20air&amp;amp;results=15"&gt;common word band names&lt;/a&gt; via capitalization rules. You can of course detect &lt;a href="http://developer.echonest.com/api/v4/artist/extract?api_key=N6E4NIOVYMTHNDM8J&amp;amp;format=json&amp;amp;text=isn't%20Hrvatski%20the%20same%20guy%20as%20keith%20fullerton%20whitman?"&gt;multiple artists in the same block of text.&lt;/a&gt; And you can use &lt;a href="http://developer.echonest.com/docs/v4/catalog.html"&gt;personal catalogs&lt;/a&gt; and &lt;a href="http://developer.echonest.com/docs/v4/index.html#project-rosetta-stone"&gt;Rosetta Stone&lt;/a&gt; to limit results to music your user owns or is playable by our partners Rdio or 7digital. And you can add the standard buckets &amp;#8212; hotttnesss, familiarity and so on &amp;#8212; to get information about the artists all in one call.&lt;/p&gt;
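&lt;p&gt;As a rough illustration of the calls linked above, here is a minimal Python sketch that builds an artist/extract query URL. The endpoint and the api_key, format, text, results and bucket parameters come from the example links in this post; the helper name build_extract_url, the default of 15 results and the idea of repeating bucket once per extra are assumptions for illustration. The sketch only builds the URL and does not call the service.&lt;/p&gt;

```python
from urllib.parse import urlencode

# Endpoint copied from the example links in this post; it may no longer be reachable.
EXTRACT_ENDPOINT = "http://developer.echonest.com/api/v4/artist/extract"


def build_extract_url(api_key, text, results=15, buckets=()):
    """Return an artist/extract query URL for a block of free text.

    build_extract_url is a hypothetical helper, not part of any Echo Nest client.
    """
    params = [
        ("api_key", api_key),
        ("format", "json"),
        ("text", text),
        ("results", str(results)),
    ]
    # Assumption: one bucket parameter per requested extra (familiarity, hotttnesss, ...).
    params.extend(("bucket", name) for name in buckets)
    # urlencode percent-encodes the text and joins the pairs into a query string.
    return EXTRACT_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url(
        "YOUR_API_KEY",
        "led zep is so much better than The Killers",
        buckets=["familiarity"],
    ))
```

&lt;p&gt;Fetching the resulting URL should return JSON describing the matched artists; check the artist/extract documentation for the exact response layout before parsing it.&lt;/p&gt;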
&lt;p&gt;This is beta and still has some issues. It lags a little behind our internal entity resolving for performance reasons, and things like this can never be perfect. But it&amp;#8217;s very helpful. Some ideas we have batting around:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Suggest bands to Facebook users using our new &lt;a href="http://blog.echonest.com/post/6384161266/support-for-facebook-artist-ids"&gt;Facebook Rosetta service&lt;/a&gt; and by parsing their comments for band names &lt;/li&gt;
&lt;li&gt;Recommend Twitter followers based on the music they talk about &lt;/li&gt;
&lt;li&gt;Play a radio stream for any blog using our &lt;a href="http://developer.echonest.com/docs/v4/playlist.html#static"&gt;playlist APIs&lt;/a&gt; by parsing their posts for artist names &lt;/li&gt;
&lt;li&gt;See how &amp;#8220;indie&amp;#8221; your friends are by computing average hotttnesss of all the bands they mention in email &lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Paul made a great demo if you want to see it in action:
&lt;/p&gt;&lt;p&gt;&lt;a href="http://static.echonest.com/visualizations/years-active/find-the-artist.html"&gt;&lt;img src="http://musicmachinery.files.wordpress.com/2011/06/find-the-artists.png?w=620&amp;amp;h=502"/&gt;&lt;/a&gt;
&lt;/p&gt;&lt;p&gt;Enjoy. &lt;a href="http://developer.echonest.com/forums"&gt;Let us know&lt;/a&gt; if you have any issues!&lt;/p&gt;</description><link>http://notes.variogr.am/post/6687194793</link><guid>http://notes.variogr.am/post/6687194793</guid><pubDate>Sun, 19 Jun 2011 09:17:39 -0400</pubDate></item><item><title>The future music platform &amp; music startups' imminent success</title><description>&lt;p&gt;&lt;img src="http://static.echonest.com/b/platform.jpg" width="300"/&gt;&lt;/p&gt;
&lt;p&gt;Right as I got on the plane to Amsterdam to attend the great &lt;a href="http://musicandbits.com"&gt;Music and Bits&lt;/a&gt; conference that headlines the Amsterdam Dance Event, I was sent this article that I am sure a lot of you have already read: &lt;a href="http://mashable.com/2010/10/17/imeem-music-startups/"&gt;Are music startups destined to fail?&lt;/a&gt; Being the co-founder and CTO of a music startup that is &lt;a href="http://venturebeat.com/2010/10/05/echo-nest-funding/"&gt;not failing anytime soon&lt;/a&gt;, I felt the urge to respond in some meaningful way. Thankfully I was about to get on stage in front of the best and brightest of Europe&amp;#8217;s music people, so I took the opportunity to work Dalton&amp;#8217;s warning into my talk.&lt;/p&gt;
&lt;p&gt;Let me first correct Mashable &amp;#8212; your headline is terrible. Amazingly (and I think this is the first and last time I&amp;#8217;ll defend these guys), when &lt;a href="http://techcrunch.com/2010/10/20/imeem-founder-dalton-caldwells-must-see-talk-on-the-challenges-facing-music-startups/%20"&gt;Techcrunch is more measured than you&lt;/a&gt;, you may want to reconsider your editorial policies. &lt;a href="http://news.ycombinator.com/item?id=1817921"&gt;Dalton said nothing about mass failure.&lt;/a&gt; Echo Nest was one of their first API clients and we were about to work together before they went away; we all give each other big hugs and air kisses. He was not trying to predict my particular imminent doom. However, he was discussing something that is very hard for a lot of people and it is &lt;strong&gt;Echo Nest&amp;#8217;s business to make this easy and excellent.&lt;/strong&gt; So I made my talk about that &amp;#8212; there is this thing happening, that I call &amp;#8220;the music platform,&amp;#8221; that yes, we&amp;#8217;re working on but so are a lot of other guys, and if you are thinking of a great music experience &lt;strong&gt;don&amp;#8217;t let these guys put you off&lt;/strong&gt;. We are here to help.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a title="The future music platform" href="http://www.slideshare.net/bwhitman/the-future-music-platform-5529598"&gt;The future music platform&lt;/a&gt;&lt;/strong&gt; &lt;br/&gt;&lt;object height="355" width="425" id="__sse5529598"&gt;
&lt;param value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=mnb2010-101022081429-phpapp01&amp;amp;stripped_title=the-future-music-platform-5529598&amp;amp;userName=bwhitman" name="movie"&gt;&lt;param value="true" name="allowFullScreen"&gt;&lt;param value="always" name="allowScriptAccess"&gt;&lt;embed height="355" width="425" allowfullscreen="true" allowscriptaccess="always" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=mnb2010-101022081429-phpapp01&amp;amp;stripped_title=the-future-music-platform-5529598&amp;amp;userName=bwhitman" name="__sse5529598"&gt;&lt;/embed&gt;&lt;/object&gt;
&lt;/p&gt;
&lt;h2&gt;What is a music platform?&lt;/h2&gt;
&lt;p&gt;First, we all agree that &lt;a href="http://www.billboard.biz/bbbiz/content_display/industry/e3i050e81f63a15745d080103ddf80c6c3b"&gt;the future of the music business is apps.&lt;/a&gt; This is not just a party line EN takes; it is a real thing that is happening. In the past couple of years your choices as a music listener have gone up exponentially. You can hear a song on any number of mobile applications, web sites or downloadable programs. It doesn&amp;#8217;t take a &amp;#8220;music futurist&amp;#8221; to point at the movement from digital downloads to interactive applications&amp;#8212; Pandora, Spotify, Guitar Hero, Shazam &amp;amp; Soundhound, Hype Machine, &lt;a href="http://shuffler.fm"&gt;Shuffler&lt;/a&gt;. Even the killer iPad game, Osmos, is really an enhanced delivery mechanism for &lt;a href="http://createdigitalmusic.com/2010/03/01/exclusive-free-soundtrack-osmos-featuring-gas-julien-neto-loscil-high-skies/"&gt;Loscil&lt;/a&gt; and labelmates. There is someone now making a thing somewhere that will destroy the iTunes interface (finally) for browsing. There are a few guys pumped after a &lt;a href="http://musichackday.org"&gt;Music Hack Day&lt;/a&gt; that just filed papers in Delaware for a new radio app. And, yes, Google and Apple are about to throw their gilded monocles, canes and top hats into the ring any day now.&lt;/p&gt;
&lt;p&gt;So we&amp;#8217;ve identified three very hard things in getting these apps out there. These three put together comprise a music platform&amp;#8212; someone to help you take care of all the annoying stuff while you concentrate on the experience.&lt;/p&gt;
&lt;p&gt;&lt;img width="400" src="http://static.echonest.com/b/plat.025.png"/&gt;&lt;/p&gt;
&lt;p&gt;The first one is our bread and butter, what we call &amp;#8220;engineering.&amp;#8221; &lt;a href="http://developer.echonest.com"&gt;The Echo Nest&lt;/a&gt; has been doing this for years &amp;#8212; providing an outsourced music database to companies and independent devs &amp;#8212; and we&amp;#8217;ve gotten quite good at it. We have a database of over 10 million unique songs and 1.2 million artists, each with insane amounts of metadata. We know the pitch of the third sound of the first b-side track on &lt;a href="http://www.factmag.com/2010/05/27/james-blake-cmyk-ep/"&gt;James Blake&amp;#8217;s &amp;#8220;CMYK.&amp;#8221;&lt;/a&gt; We know how many people called Spoon&amp;#8217;s latest record &amp;#8220;angular,&amp;#8221; and what web sites they said it on. We can give you the &lt;a href="http://blog.developer.echonest.com/"&gt;tempo of any song, similar artists, loudness, a recommender&lt;/a&gt; and new stuff like an excellent music fingerprinter, or the stuff needed to &lt;a href="http://evolver.fm/2010/10/20/some-concerts-energize-you-others-destroy-your-soul/"&gt;choose concerts based on the danceability curve of the setlists&lt;/a&gt;. You no longer need to worry about &amp;#8220;big data&amp;#8221; or figuring out why Tom Petty is not the same band as Tom Petty &amp;amp; the Heartbreakers (ps &lt;a href="http://the.echonest.com/company/jobs"&gt;work here if that pisses you off&lt;/a&gt;.) Other guys can help you with this too &amp;#8212; the amazing &lt;a href="http://musicbrainz.org"&gt;Musicbrainz&lt;/a&gt;, of course &lt;a href="http://last.fm"&gt;last.fm&lt;/a&gt; and some other upstarts.&lt;/p&gt;
&lt;p&gt;The second is audience&amp;#8212; with your new experience, how can you get out to listeners? Fortunately there are very large companies competing for developers&amp;#8217; attention. At the rate we seem to be headed there will be an app store for combination microwave oven/toasters. In the short term, audience from a mobile app store or social network is here now and viable. But we&amp;#8217;d like to think it goes deeper than that&amp;#8212; and by working with the content providers (more on that soon), we&amp;#8217;ll be helping app developers get in front of fans with direct artist marketing.&lt;/p&gt;
&lt;p&gt;The last, and trickiest, is content. As a music app developer, until about two months ago there was no easy way to get &lt;em&gt;actual music&lt;/em&gt; to play without negotiating your own content deals. This is the last thing a dev wants to work on and we&amp;#8217;ve made it our goal to help these guys out. This was the crux of Dalton&amp;#8217;s pain, and from now on it may be helpful to consider the Echo Nest as a &amp;#8220;proxy for the music industry.&amp;#8221; We&amp;#8217;ve already announced one major label deal to allow content, and we currently have &lt;a href="http://createdigitalmusic.com/2010/09/20/when-data-and-music-meet/"&gt;two deals with digital distribution channels.&lt;/a&gt; With these deals in place, it takes an amazingly short amount of time to build your future vision of music. Look at this app, made in 24 hours&amp;#8212; it is a Pandora-style streaming radio app but has 10 million songs available. It uses only the Echo Nest API for all of its features. Annoying things like &lt;a href="http://developer.echonest.com/docs/v4/playlist.html"&gt;DMCA rules are built into our API&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
&lt;object height="385" width="480"&gt;
&lt;param value="http://www.youtube.com/v/xkgcoxhT7cg?fs=1&amp;amp;hl=en_US" name="movie"&gt;&lt;param value="true" name="allowFullScreen"&gt;&lt;param value="always" name="allowscriptaccess"&gt;&lt;embed height="385" width="480" allowfullscreen="true" allowscriptaccess="always" type="application/x-shockwave-flash" src="http://www.youtube.com/v/xkgcoxhT7cg?fs=1&amp;amp;hl=en_US"&gt;&lt;/embed&gt;&lt;/object&gt;
&lt;/p&gt;
&lt;h2&gt;Why it is awesome&lt;/h2&gt;
&lt;p&gt;I want an iPhone app that does &lt;a href="http://music.joshmillard.com/2010/06/04/nine-inch-niles-the-seattleward-spiral/"&gt;Josh Millard&amp;#8217;s Frasier fever dream&lt;/a&gt; on Prisoner episodes. I want a text message with a link to streaming audio whenever &lt;a href="http://www.ilxor.com/ILX/NewAnswersControllerServlet?boardid=41"&gt;ILM flips out&lt;/a&gt; over a new artist. And I want imeem back, they were awesome. And with everything we&amp;#8217;re trying I think it&amp;#8217;s finally possible&amp;#8212; the way we are experiencing music is changing and we want to work with brilliant people to make it change even faster. It&amp;#8217;s not perfect yet, but we&amp;#8217;re on the right track and I don&amp;#8217;t think you&amp;#8217;ll find a more focused and insane group of people in the world trying to make this work. If you&amp;#8217;ve got ideas or questions, &lt;a href="mailto:brian@echonest.com"&gt;write me a mail&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img width="500" src="http://static.echonest.com/b/awesome.png"/&gt;&lt;/p&gt;
</description><link>http://notes.variogr.am/post/1373556723</link><guid>http://notes.variogr.am/post/1373556723</guid><pubDate>Fri, 22 Oct 2010 10:21:00 -0400</pubDate></item><item><title>The Echo Nest Musical Fingerprint (ENMFP)</title><description>Tomorrow begins &lt;a href="http://amsterdam.musichackday.org"&gt;MHD Amsterdam&lt;/a&gt; and at it The Echo Nest is releasing a few new things. 
Some of our engineering team (who deserve a severe callout for all their work, let me stick with their &lt;a href="#codenames"&gt;codenames&lt;/a&gt;) have been working tirelessly to get &amp;#8220;songs&amp;#8221; to be a first-class member of our API, and as of today, &lt;a href="http://beta.developer.echonest.com"&gt;they are&lt;/a&gt; &amp;#8212; we now track many millions of songs and you can query for them by name and receive all sorts of useful metadata, get similar songs (with amazing results even very deep in the catalog), and even get free (legal) playable audio for a huge collection of major label content (more on this later.) As part of this push to provide data about songs, we have been working on a music fingerprint&amp;#8212; a way to resolve an unknown audio file (what we call a &amp;#8220;track&amp;#8221;) to a large database to identify it in our world (as a &amp;#8220;song.&amp;#8221;) And we&amp;#8217;re ready to release this to the community to see how it performs in the wild.

&lt;p&gt;The design goals of our FP were to base it on Echo Nest audio features, to make it simple to implement and to make it as open as possible. Lock in of content resolution data is a terrible thing, and a large part of The Echo Nest&amp;#8217;s focus is to make it easy for people to figure out what their music is about without &lt;a href="http://musicmachinery.com/2010/02/10/introducing-project-rosetta-stone/"&gt;getting stuck in ID space hell&lt;/a&gt;. If you have an iTunes collection and want to automatically make Spotify playlists, we should be able to help you. If you write an app that scans your hard drive for tracks to make great recommendations against MOG or the Limewire store, we should be able to help you. If you want the tempo of every song in someone&amp;#8217;s terribly labeled iPod library, we should be able to help you. A fingerprint to us is a utility call&amp;#8212; like our &lt;a href="http://beta.developer.echonest.com/artist.html#search"&gt;search_artists&lt;/a&gt; &amp;#8212; a way to resolve a music identifier to our set of ID spaces. Echo Nest song IDs, if you choose to use them, give you all of our stuff &amp;#8220;for free&amp;#8221; &amp;#8212; from a single EN SO ID you can get recommendations, artist pictures and bios, blog posts, record reviews, and of course all the audio analysis: the tempo, key, events in the song. But over this year we are rolling in support for any other ID space via &lt;a href="http://musicmachinery.com/2010/02/10/introducing-project-rosetta-stone/"&gt;Rosetta Stone&lt;/a&gt;, so you will be able to return Spotify IDs or get last.fm URLs of the song from the fingerprint. Our goal as always is to be the bridge between music and amazing applications&amp;#8212; a platform for music intelligence that lets anyone use any service on any audio to discover and interact with music.

&lt;/p&gt;&lt;h3&gt;How it works&lt;/h3&gt;
&lt;p&gt;&lt;img src="http://static.echonest.com/b/features.png" width="400"/&gt;&lt;/p&gt;

&lt;p&gt;Our fingerprint is called the Echo Nest Musical Fingerprint (ENMFP) and is based directly on parts of our &lt;a href="http://the.echonest.com/platform/how-it-works/"&gt;audio analysis engine&lt;/a&gt; that already powers tons of interactive music and music search apps across the globe. We get a detailed understanding of what is happening in a song (note: a song, not just an audio file) for &amp;#8220;free&amp;#8221; simply by having &lt;a href="http://www.media.mit.edu/~tristan"&gt;Tristan&lt;/a&gt; be our co-founder, so our work on the ENMFP started there. We worked with audio scientists on ways to scalably hash parts of the analysis and query for &amp;#8220;codes&amp;#8221; &amp;#8212; a sequence of numbers that can match the same song to the ear. We identified an efficient series of transformations of our low level segment description data to make a very accurate code, and our engineering team built a suite of tests, backend servers, and a query API. The ENMFP comes in two parts. The &lt;b&gt;code generator&lt;/b&gt; is a binary library that you can compile into your own app. It takes in a buffer of PCM samples (in practice, give it around 20 seconds of 22050Hz mono float PCM), runs a series of signal processing algorithms on the samples, and returns a list of codes. It is as simple as

&lt;/p&gt;&lt;pre class="prettyprint"&gt;
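    // Build codes from a buffer of mono float PCM samples.
    Codegen * pCodegen = new Codegen(_pSamples, _NumberSamples, offset);
    for (unsigned int i=0;i&lt;pCodegen-&gt;getNumCodes();i++)
        printf("%ld ", pCodegen-&gt;getCodes()[i]);
    delete pCodegen;  // release the generator once the codes are printed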
&lt;/pre&gt;
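&lt;p&gt;Getting audio into the shape the code generator expects can be sketched as follows. This is a minimal Python illustration, not part of libcodegen: it assumes the input is interleaved 16-bit stereo that is already sampled at 22050Hz (no resampling is done), and the helper name prepare_mono_float is hypothetical. It downmixes to mono floats and keeps about 20 seconds, the amount suggested above.&lt;/p&gt;

```python
SAMPLE_RATE = 22050   # Hz, the rate suggested in the post
CLIP_SECONDS = 20     # roughly how much audio the post suggests passing in


def prepare_mono_float(interleaved_stereo, sample_rate=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Downmix interleaved 16-bit stereo samples to mono floats.

    Hypothetical helper: assumes the samples are already at `sample_rate`.
    Returns at most `sample_rate * seconds` values, each in [-1.0, 1.0).
    """
    left = interleaved_stereo[0::2]    # even positions hold the left channel
    right = interleaved_stereo[1::2]   # odd positions hold the right channel
    max_frames = sample_rate * seconds
    frames = list(zip(left, right))[:max_frames]
    # Average the two channels, then scale the 16-bit range down to floats.
    return [(l + r) / 2.0 / 32768.0 for l, r in frames]


if __name__ == "__main__":
    demo = [32767, 32767, -32768, -32768, 0, 16384]
    print(prepare_mono_float(demo))
```

&lt;p&gt;The resulting list is what you would copy into the float buffer handed to the code generator; audio at other sample rates would need resampling first.&lt;/p&gt;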

The &lt;b&gt;server&lt;/b&gt; maintains a canonical list of songs with corresponding codes and performs fast lookup. We&amp;#8217;ve based the server on some popular open source indexing and storage platforms, and we&amp;#8217;ll be releasing our modifications to them as a reference implementation shortly.

&lt;h3&gt;Use and open nature&lt;/h3&gt;

&lt;p&gt;Almost all of this implementation is open. The data behind the server is open by design. Anyone can request full data dumps. Anyone that wants to run their own server can, provided that they mirror with the other servers. The only non-normative license is in the code generator, which for now is binary-only, available for most platforms (Windows, Linux 32 &amp;amp; 64-bit, Mac OS X, mobile forthcoming) and free to use in any sort of application &amp;#8212; commercial, open source, free, webapp, etc. The only pertinent restriction is that codes are sent only to &amp;#8220;authorized servers.&amp;#8221; The design of this license ensures that one party does not attempt to usurp the ID resolving space out from under anyone. If The Echo Nest dissolves or gets bought by a large fish cannery by accident, we want to make sure the data and query service live on without us. As a corollary, we don&amp;#8217;t want anyone &amp;#8220;hiding&amp;#8221; new resolved tracks from the ID space. Anyone that collects new songs via this fingerprint has to share their data, plain and simple. This hopefully ensures that over the years the &lt;i&gt;combined knowledge from all uses of the ENMFP will catalog every single piece of music available on the internet, and the data will be available to all.&lt;/i&gt; We want the ENMFP to grow into a public internet utility.

&lt;/p&gt;&lt;p&gt;&lt;img src="http://static.echonest.com/b/enmfp.png"/&gt;&lt;/p&gt;

&lt;h3&gt;Features&lt;/h3&gt;
&lt;ul&gt;&lt;li&gt;The ENMFP looks at the underlying music, not just the raw audio signal. This gives it some unique advantages:
&lt;ul&gt;&lt;li&gt;Unlike many FPs, is robust to time scaling
&lt;/li&gt;&lt;li&gt;Can identify sample use in mixed audio
&lt;/li&gt;&lt;li&gt;Can identify remixes, live versions and sometimes cover versions 
&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Can identify a song in &amp;lt;20s of audio
&lt;/li&gt;&lt;li&gt;Can also match on track metadata (artist name, title, length) using Echo Nest name matching in the same call
&lt;/li&gt;&lt;li&gt;Server and some of the code generator are completely open source
&lt;/li&gt;&lt;li&gt;Data is completely open; dumps provided, mirroring required to host your own server (we want people to boot their own copies of the data)
&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Anti-features&lt;/h3&gt;
&lt;ul&gt;&lt;li&gt;In heavy alpha, not heavily QA&amp;#8217;d yet, &lt;a href="http://the.echonest.com/jobs.html"&gt;help wanted&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;Not completely OSS: the code generator relies on proprietary EN algorithms. Binaries provided, free to use, but not open source.
&lt;/li&gt;&lt;li&gt;No ingestion API yet (you are querying against a large but not complete catalog, and there is currently no way to add new songs. This is changing soon. If you maintain a large catalog and want it in our reference database, &lt;a href="mailto:enmfp@echonest.com"&gt;please get in touch.&lt;/a&gt;)
&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;How to use&lt;/h3&gt;

&lt;p&gt;First, you need an Echo Nest &lt;a href="http://developer.echonest.com"&gt;developer API key&lt;/a&gt; if you don&amp;#8217;t already have one. Next, familiarize yourself with the &lt;a href="http://beta.developer.echonest.com/song.html#alpha-identify-song"&gt;alpha_identify_song&lt;/a&gt; API. (As of right now, before we release the server source, the Echo Nest is hosting the only query server via this API.) There are instructions there on how to receive the libcodegen binaries. The libcodegen package also ships with an example code generator that you can call from the command line, so no worries if you aren&amp;#8217;t ready to do some compiling. 

&lt;/p&gt;&lt;h3&gt;How to help&lt;/h3&gt;
&lt;p&gt;We see the ENMFP as a community project just getting started. If you are interested in booting your own mirror server, have experience with FP tasks, want to help with QA or automated testing, or have a large catalog to ingest or test against, please &lt;a href="mailto:enmfp@echonest.com"&gt;get in touch.&lt;/a&gt;

&lt;/p&gt;&lt;hr&gt;&lt;a name="codenames"&gt; &lt;/a&gt;
&lt;font size="-1"&gt;we are especially grateful for the work of Unrepentant Nagios Installer (UNI), Guy Who Fights With Me About the Word &amp;#8220;Track&amp;#8221; Every Fucking Day (GWFWMAWTEFD), Drinks Turret Coolant (DTC), Mr. HTML5 Canvas 2010 (HC2), So-Glad-I-Kept-You-Out-Of-The-Media-Lab (SGIKYOOTML), Skinny Tie (ST), Main Ontology Offender (MOO), Future Performable Employee (FPE), and of course Ben Lacker (BL) &lt;/font&gt;</description><link>http://notes.variogr.am/post/544559482</link><guid>http://notes.variogr.am/post/544559482</guid><pubDate>Fri, 23 Apr 2010 23:10:52 -0400</pubDate></item><item><title>Video</title><description>&lt;iframe width="400" height="240" src="http://www.youtube.com/embed/T20-KcCGokU?wmode=transparent&amp;autohide=1&amp;egm=0&amp;hd=1&amp;iv_load_policy=3&amp;modestbranding=1&amp;rel=0&amp;showinfo=0&amp;showsearch=0" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;&lt;br/&gt;&lt;br/&gt;</description><link>http://notes.variogr.am/post/376901348</link><guid>http://notes.variogr.am/post/376901348</guid><pubDate>Sun, 07 Feb 2010 18:18:59 -0500</pubDate></item></channel></rss>
