Unemployment Diaries: WiFi ePaper display

As you may have heard I’ve been having some on-purpose downtime to catch up a bit on some personal projects. My time off has been a bit more existential than I initially planned on.

But it’s been going well, thank you for asking! I’ve been keeping busy / distracted by finally clearing out a long stack of personal projects, none with any commercial potential. I’ve noticed something interesting — when others use the word “hobbies” to describe what I’m up to, I recoil in fear, but I simultaneously downplay everything I work on in my studio in Greenpoint as temporary and “just for fun.” I should probably just own what I’m doing a bit more: it is OK to do things that won’t end up changing the world. I see it as catch-up education after not building something I could touch for many, many years. 

I put together a nice WiFi-enabled ePaper / e-Ink display to hang in the kitchen and show @cookbook recipes, to inspire us to try something different. It was surprisingly fun and easy. I wrote up a HOWTO guide on GitHub, with all the connections, code and equipment you need (not much, and you can buy almost all of it on Sparkfun or Amazon.) It updates once an hour and picks a random tweet from a web service hosted on Google App Engine, but you can have it do whatever you’d like. I learned a lot about the ESP8266’s low power mode; combined with an EPD, which only draws power while refreshing, you’ve got a screen that can hang on a wall for years on a 2000mAh battery.
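For flavor, here’s roughly what the hourly cycle looks like, written as a MicroPython-style sketch for the ESP8266. The App Engine URL and the EPD drawing routine below are placeholders, not the project code; the real code and wiring live in the GitHub HOWTO.

    # Illustrative MicroPython sketch of the hourly cycle on the ESP8266.
    # The URL and the EPD write below are placeholders; GPIO16 must be wired
    # to RST for deep-sleep wakeup to work.
    import machine
    import urequests

    def fetch_tweet():
        # Hypothetical endpoint standing in for the App Engine service.
        return urequests.get("http://example.appspot.com/random_tweet").text

    def draw_to_epd(text):
        # The display takes simple serial commands; the exact bytes depend on
        # the board, so treat this as a stand-in.
        uart = machine.UART(0, 115200)
        uart.write(text)

    draw_to_epd(fetch_tweet())

    # Sleep for an hour; on wake the chip resets and runs this script again.
    rtc = machine.RTC()
    rtc.irq(trigger=rtc.ALARM0, wake=machine.DEEPSLEEP)
    rtc.alarm(rtc.ALARM0, 60 * 60 * 1000)  # milliseconds
    machine.deepsleep()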

To be honest, I started down this path because I want an actual computer with an e-Ink display. I was hoping for a “FreeWrite” style keyboard & EPD combo with a larger work area and more functionality. I ended up buying embarrassingly large amounts of random EPD development boards off of eBay and DigiKey, and found that none yet are big enough or have a good enough refresh rate to be even slightly usable for a text-based interactive computer display. You can attempt to hack up a recent Kindle or Nook but it’s terribly fiddly. The Pervasive Displays kits are close, as the newer models allow sub-region updates (you can tell the EPD to only update a bounding box instead of the whole thing, good for interactive text editing / display), but they’re not big enough and are geared towards Raspberry Pi “shields,” with fiddly I2C or SPI wiring arranged for their 40 pin headers. The board I used for this project speaks a simple serial protocol that anything can control, but takes almost a second to update the screen. So I toned down my ambitious design and just made a nice object. More of that to come.

“Build your own Echo?” The ReSpeaker mic array

Like so many of you, I visited family over the holidays to find their home riddled with talkative but useful voice control devices from Google and Amazon. It’s still striking to me how good the speech recognition layer has gotten, especially the acoustic interface. Sure, all the underlying neural networks cutting HMMs off at the knees, humming along on banks of GPUs, are very interesting, but do you even know how many microphones the Amazon Echo has? Seven! There’s a lot of DSP that happens before the audio gets to Amazon’s servers: beamforming, de-reverberation / echo removal, noise canceling, voice activity detection, localization. The magic that lets the Echo talk with you from across the room owes as much to acoustic & DSP engineering as it does to “machine learning.” And those microphones have a lot to do with it.


I’ve given a lot of thought to voice & natural language interfaces over the years and multi-microphone voice activity detection (VAD) in particular; I’ve been working on a related hardware project that I hope I can describe soon. In the meantime, I thought I’d see what I can learn about the state of the art in multi-microphone consumer hardware. It’s sadly a bit too hard to crack open an Echo or Home and fiddle with the individual microphones or the DSP processor. But a few months ago, the ReSpeaker Kickstarter caught my attention. I’m innately suspicious of most crowd-funded electronics but the company behind it, Seeed, has shipped before. I knew I’d get something relatively on time, even if it’d be a just-out-of-alpha board with no documentation or useful code. And I did: although delivery was promised for November, I got the package from Shenzhen just last week.

The ReSpeaker package is two things: a MediaTek WiFi MIPS processor running Linux, based on their MT7688, combined with the more interesting ReSpeaker Mic Array that optionally fits on top. The mic array appears to be an almost perfect clone of the Echo microphone module, with the same number of microphones arranged in the same manner. It’s powered by an XMOS xCORE chip. XMOS makes their own dev boards (also shaped like the Echo board) and they go for $500-1500, so for only $79 once available in February, you’ve got a relatively workable 7 microphone far field system to do your bidding. I hope.

The first thing I did with the ReSpeaker system is try to find any documentation. There really isn’t much — a very spartan landing page, and an ill-attended forum. I attached the mic array board to the top of the MediaTek board, fingers crossed I had the right orientation, plugged in the micro USB cable to my Mac, and watched some LEDs spin around. Very pretty! But I wanted to dig in a little more than that. I noted my Mac gained a new serial port, so I blindly tried to connect to it using screen at 115200 bps, and got a magical login screen.

Looks like the MediaTek board is running OpenWrt. Poking around, you can see that the mic array is attached over USB, and there is a Python library for getting audio from the board, as well as controlling the LEDs, over USB HID. The Python library led me to an in-progress but official getting started page, so I took the time to set up the WiFi on the board and try out some of the examples.

It’s clear that the ReSpeaker as sold is not going to fully replace an Echo or Google Home device for you. You can run a speech recognition (ASR) kit (PocketSphinx) on the MediaTek’s MIPS core, but it’s very slow, with obvious buffer under-runs. The way the team at Seeed would like this used is for the on-chip ASR to perform only “wake word” processing — listening for a short phrase, then passing actual ASR duties along to a 3rd party remote API like Bing / Cortana, Google Voice, Amazon, etc. This is a reasonable approach and in line with how the other hardware devices work, but if you were hoping for an “offline Echo,” this board will not help you. I was able to get wake word processing running using their Python library & PocketSphinx, but the lag in detection would be a deal killer for anything more than toy examples. But that’s fine — the MediaTek processor is not the exciting part of the ReSpeaker package.

If you simply connect the mic array alone to your computer over USB (the array has its own micro-USB port), you get an audio-class-compliant microphone input that reports itself as 2-channel, 16kHz, 24-bit audio. By default, the mic array is flashed with what appears to be custom-built XMOS firmware — no source, binary only — that has it doing beamforming, automatic gain control, de-reverberation and noise reduction using all seven microphones. The output is a single audio stream (it supports stereo, but it looks like both channels are always the same) over USB that will work in any normal audio software / toolkit that supports 16kHz recording. That input sample rate is odd enough to trip up Adobe Audition, for example (my Mac laptop’s output does not support 16kHz playback, so it cannot set up the stream), but Audacity and Portaudio work fine. So without doing anything you can get a great far-field USB microphone tuned for voice control across a room, with much better acoustic specifications than the internal microphones on your laptop.
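As a quick sanity check, the array records like any other USB microphone. Here’s a small sketch using the python-sounddevice and soundfile packages (my choice for the example, not part of the ReSpeaker tooling); the device-name matching is a guess and may need adjusting on your machine.

    # Record five seconds from the ReSpeaker mic array as a plain USB microphone.
    import sounddevice as sd
    import soundfile as sf

    SAMPLE_RATE = 16000  # the array reports itself as 16kHz
    SECONDS = 5

    # Find the array by (partial) name; your OS may show a different label.
    device = next(i for i, d in enumerate(sd.query_devices())
                  if "ReSpeaker" in d["name"] or "XMOS" in d["name"])

    audio = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=2, dtype="int32", device=device)
    sd.wait()
    sf.write("respeaker_test.wav", audio, SAMPLE_RATE)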

There’s a lot of DSP power on the XMOS chip, and we should be able to tweak parameters and reconfigure the seven microphones to perform under different circumstances. It turns out the binary firmware installed on the ReSpeaker mic array is set up with a series of HID registers for parameterized control and data access. Out of the box, you can ask the mic array for statistics about the voice input, or change features of the acoustic processing in real time. To demonstrate, I built a simple C or Python (your choice!) script that wraps a USB HID library to get access to the registers of the mic array. For example, if you run the Python example, you can record audio while also seeing the detected angle of the voice (where in space the voice source is) as well as when the device detects speech (“voice activity detection,” aka VAD.) Or you can make all the LEDs glow different colors, change the gain control, bypass the DSP, and a lot more. Check it out!

The code uses the hidapi library to access USB HID registers on the mic array device. You can write or read USB HID registers using a pretty straightforward socket-style approach. For example, reading the status of something (say, the automatic gain control’s current dB, or the state of an onboard LED) involves writing to a request register and then reading it back.
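In sketch form, using the hidapi Python bindings, it looks something like the following. The VID/PID, register number, and packet layout here are placeholders for illustration; the real register map is in the spreadsheet mentioned below, and the working script is in the repo.

    # Sketch of the write-request / read-back pattern over USB HID, using the
    # hidapi Python bindings (pip install hidapi).
    import hid

    VENDOR_ID = 0x2886   # placeholder: check with lsusb / System Information
    PRODUCT_ID = 0x0007  # placeholder

    dev = hid.device()
    dev.open(VENDOR_ID, PRODUCT_ID)

    def read_register(register, length=4):
        # 1. Write a request packet naming the register we want.
        #    (Packet format is illustrative, not the real firmware protocol.)
        dev.write([0x00, register, 0x80 | length])
        # 2. Read the response back (timeout in milliseconds).
        return dev.read(length + 3, 1000)

    print(read_register(0x10))  # e.g. an AGC setting; register number made up
    dev.close()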

Luckily, the developers behind the ReSpeaker uploaded a Microsoft Excel file to one of their GitHub repositories with all of the existing registers that their firmware supports. It’s a bit hard to read, but here it is in CSV form.

You can see you have access to a lot of LED control, and then all sorts of parameters involving beamforming, reverb, echo removal, noise removal, gain control, delay estimation, VAD status, and voice angle. Voice angle and VAD are great demos of a microphone array: from the time difference of arrival (TDOA) of a sound at each microphone, you can estimate the angle the sound is coming from. Likely, the XMOS firmware is using a variant of GCC-PHAT. Here’s a run of the Python script where I stood around the microphone at different positions in my office:

Note that in this Python example I’m using an “auto report” register: this is data sent by the USB HID device (the ReSpeaker mic array) on register 0xFF whether or not it is asked for. In this case, the mic array broadcasts the very useful pair of VAD status (“is there voice coming in right now?”) and voice angle (“where is the voice coming from?”) as soon as the VAD status changes, without the USB host having to ask for it. You can also simply ask for the angle or VAD status by querying the registers whenever you want.
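A rough sketch of listening for those auto-reports, again with the hidapi bindings; the byte offsets for VAD and angle are illustrative stand-ins, not the firmware’s real packet layout (the register CSV above has the real one).

    # Sketch: watch for auto-report packets (register 0xFF) from the mic array.
    import time
    import hid

    dev = hid.device()
    dev.open(0x2886, 0x0007)  # placeholder VID/PID, as above

    while True:
        packet = dev.read(64, 200)  # up to 64 bytes, 200 ms timeout
        if packet and packet[0] == 0xFF:           # auto-report register
            vad = packet[1]                        # 1 when voice is detected (illustrative)
            angle = packet[2] | (packet[3] << 8)   # degrees, little-endian (illustrative)
            print("VAD:", vad, "angle:", angle)
        time.sleep(0.05)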

For those that want to dig even deeper, XMOS maintains an Eclipse-based IDE, xTIMEcomposer, for building new firmware. The ReSpeaker team also released a DFU flasher to install new firmware on the array. This could make building new types of microphone processing easy.

Using this mic array alone with a more powerful computer (or even a more powerful embedded Linux board like a Raspberry Pi 3) could get you much closer to a home Echo that doesn’t have to “phone home” (to Seattle or Mountain View.) Or you could transmit the voice audio to your own servers. I look forward to the community’s exploration of solid acoustic hardware applied to homegrown ASR & natural language understanding applications, going beyond what the current voice control devices let us do. 

Leaving Spotify & The Echo Nest

My last day at Spotify was last week. For the first time since May 2000, I’m not working on music discovery. I’m going to take some time off to finish some personal projects and start something new in 2017. I love Spotify: the people, the product, the creators, the users, and the mission. Their acquisition of my company The Echo Nest in 2014 was excellent for everyone involved, but especially for artists and listeners. We’ve changed the landscape of music. It took me a long time to come to this decision, but it’s now time for me to learn more things and try something new.

My professional & personal life for so long has revolved around scalably helping artists find fans and fans finding new artists. I was a musician and computer science grad student in NYC at the top of the millennium, trying to tie together all the fast changes in distribution, recommendation, machine learning, signal processing and natural language understanding. Over the next five years I became an an academic at MIT working to fully explore the connection between the sound of music, the way people described it, and how it was received. With my labmate Tristan’s focus on musical signal processing we started The Echo Nest in 2005. We quickly built our research into products and grew our team to include our CEO Jim, Paul, offices in Somerville, NYC, SF, London and 70 amazing engineers, scientists and music-crazy people. Over the next 9 years, we powered the world of music for practically every single online service out there with a novel developer platform strategy.

The acquisition by Spotify in March 2014 was simply perfect. Both sides put endless amounts of effort into making it work, and within a few months we had fully integrated teams with a stunning new focus on making the best recommendation and music understanding products. The personalization, retrieval and knowledge graph team is now one of the biggest at Spotify. Almost every single former Echo Nest employee is still there after close to 3 years and loving the opportunity – very rare for technology acquisitions. I moved to our NYC office, still regularly visited “Spotify Boston” and was very lucky to sit next to the combined team as they built out our now-tentpole discovery products: Fresh Finds, Discover Weekly, Release Radar, and Daily Mix. Independent artists write me every day with a beautiful story about their appearance on Fresh Finds or an editorial playlist that then scaled to hundreds of thousands or millions of listeners via Discover Weekly or Daily Mix. We’re scaling with them: we’ve heavily invested in the future of discovery through research, machine learning, curation and data engineering, and there are so many amazing things yet to come. The fight for care & scale in music discovery is far from over, but I can now step back and let their magic happen. It’s extraordinary to watch an entire field form up from under you, and even more amazing to be able to walk away to see where it goes.

I’m taking about three months off to rest, regroup, visit companies and friends, and finish up some long-simmering personal projects. After that, the only thing I know right now is that I’ll be ready to do it all over again. Like many of my friends, I’m especially reflective these days on the role of prediction, privacy, information retrieval and machine learning in our culture. Music at its best acts as both a lens onto and a projection of the rest of our society. We’ve made great strides increasing the diversity of styles and of the musicians themselves that people are listening to, through careful editorial & algorithmic approaches. We take bad results very personally. We do everything we can to help surface creativity of all types and scale that beautiful moment when a true message hits its receiver. I need to do more, particularly beyond music.

Please reach out if you’d like to chat. You’re all great.

Brian brian@variogr.am

Greenpoint NYC

Nov 16 2016

Understanding the brand new

Fresh Finds

Today, my favorite personal project at Spotify since the acquisition is getting soft-launched alongside a great long piece in Fast Company about discovery at Spotify: “Fresh Finds,” a weekly updated playlist of music that no one has heard yet but will break out soon. The playlist is powered by the careful and passionate work of a small team at Spotify: Kurt Jacobson, Athena Koumis, Jason Steinbach, Dan Stowell, and myself.

“Fresh Finds” is made possible by a scalable analysis of the musical activity happening outside Spotify: daily, we automatically find artists people are talking about on music blogs and news sites more often and with more intensity than their playcounts would suggest. These are the artists that find fans through word of mouth, shows and the hard work of making unique music that connects with at least one person. We then filter those artists through a real-time analysis of Spotify listening behavior and generate a weekly list of brand new songs that we think will gain in popularity the next week.
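To make the shape of that signal concrete, here’s a toy illustration in Python; it is emphatically not the Fresh Finds pipeline, just the “talked about more than played” idea in miniature, with invented numbers.

    # Toy illustration of a "talked about more than played" signal.
    artists = {
        # artist: (blog/news mentions this week, plays this week)
        "Up-and-Comer": (40, 3000),
        "Superstar": (500, 20000000),
        "Quiet Release": (12, 800),
    }

    def buzz_surprise(mentions, plays):
        # Mentions per play; the +1 keeps a brand-new artist with ~0 plays finite.
        return mentions / (plays + 1)

    for name, (mentions, plays) in sorted(
            artists.items(), key=lambda kv: -buzz_surprise(*kv[1])):
        print(name, round(buzz_surprise(mentions, plays), 5))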

Here’s Fresh Finds, updated weekly on Wednesdays:

The listener activity that happens to music deep in this brand-new, unheard part of the spectrum is hard to automatically understand. It comes from nowhere, and people discover it from other people, often outside of our platform. They read music blogs and press, or a friend in the know passes on a link. It’s not based on popularity or audio or likes or clicks. Some of these Fresh Finds had virtually no plays when data indicated we should publish them on the playlist. Watching a brand new artist release a brand new track with no connection or external push and seeing it at first slowly, then rapidly gain in plays on Spotify has been life-affirming.

I see recommendation, filtering, or prediction of this “brand new” as a new artistic frontier in music understanding. Every music data scientist wants to help artists and listeners, but the quantity of known good that a precise recommendation for a well known band earns all but vanishes when stacked up against the connection between a new unknown and her new fan base. I’m ecstatic to help that process even a little bit.

My favorite jams from the past 6 months of Fresh Finds:

Since Fresh Finds started working internally for us early this year, I’ve discovered more music there than from any other technology or web site or service I’ve ever used. It’s been a great pleasure to hear new musicians’ fresh work every morning. I’ve seen more local shows, I’ve helped new artists get booked, I awkwardly and excitedly tweet about artists, and I’ve never felt better about the future of music.

I hope you enjoy Fresh Finds, and I can’t wait for the next thing we can do with its potential.

10 Years

10 years ago today, in 2005, Tristan and I signed the documents to incorporate The Echo Nest Corporation in Delaware, making me a “co-founder” right out of grad school. We were fanatical about music discovery and wanted our unique blend of technology to help listeners and artists. I’ve since worked harder than I ever thought imaginable, oversaw spectacular successes and overcame massive failures, turned equal parts jerk, compassionate, and anxious ball of wire; I questioned my life’s worth untold times, went from comfortable to destitute to overwhelmed; and I met all of my best friends. And we’re still fanatical about music discovery, and our unique blend of technology is helping listeners and artists.

We made a big change last year, at the peak of our powers: we were generating recommendations and playlists over 300 times a second for our customers and had grown to over 60 attractive employees in early 2014. And since then, of course, it’s gotten even better. But I never trust any startup person that says they’ve won. There’s always more to do. Nonetheless, I’m going to take a breath today to remember all the amazing people that got us here.

To the hundreds of you who were ever a part of this: I thank you to pieces. You both made this thing work, and were the thing itself. I’m sorry if I asked you who paid your salary that one time, or called you at midnight because a service went down, or reminded you how to spell our name, or got too emotional during an all-hands speech, or rewrote one of your lines of code. You have to understand: my dominant feeling throughout the course of The Echo Nest’s life was surprise. I was surprised that we could start a company from our dissertations. I was surprised we could hire people. I was surprised Jim wanted to join us as CEO. I was surprised we could raise money. I was surprised we got our first customers. I was surprised the people we hired cared so much. I was surprised everyone was working so hard, and that the company was becoming so successful. I was surprised Spotify was interested and that it’s worked so well. I remain surprised that everyone’s still with us, happier than ever in our much bigger new family, working even harder on the next big thing.

I was always standing as far forward on the boat as I could, eyes wide in awe that we somehow hadn’t run aground, but barking behind me to try to ensure we wouldn’t. Maybe you all should have tied me to the mast instead; we did great.

The empty office, Davis Square, August 2005

The first board of directors, including Barry, Don Rose & McLagan, Bethe, Andre & Dorsey, and Elliot, 2008

Jim and Tristan and me and Tim, 2008

My hosts’ introduction of me before a talk in Amsterdam, 2009

Early article in the Boston Phoenix

21 days until Echoprint release

Usual Brian management style, here of poor Alastair pre-Echoprint release, 2010

Team Ghost tracks

Our London team settles in in Somerville

Elissa and Amanda

Early 2013 group photo

Telling the office what had just happened (including 6am SF office), March 6 2014

I want to know the size and temperature of bread while it rises

We built a device to measure bread as it rises and in the process I gave a lot of thought to tools and inventors.

If you create meticulous tools and platforms, only a tiny fraction of the world is going to have the desire and knowledge to latch on. Your best hope is to be bright enough for other tool-builders to swarm to, and then hope the weight of the pyramid stacked on top of you won’t kill you. It might be that the startup fan fiction of “prosumer builders” simply doesn’t have a quorum strong enough to fund an industry – maybe their one-off projects remain just that: jumper wires in a shoe box. The technologists and inventors I know tend to avoid reliance on existing platforms; we desire total control, and we want to burrow down from the physical interface (pins, interfaces, connectivity) to the electrons swirling around the transistors until we’re comfortable that what we make is suitably ours.

Early sketch

I thought a lot about this natural law when my friend asked me if a device she had envisioned was possible. She bakes a lot of bread, and wanted a way to visualize the rise: both the size of the growing dough and its internal temperature. She wanted it to show on her phone so she could be somewhere else during a long rise and check in on it once in a while. So figure the continuum: we could have bought some “internet of things” kit from Best Buy in a white plastic case that graphed temperature on our phones, and maybe a camera that uploaded live video to Google to check the size. That’s expensive and overwhelming, and although it would work, it doesn’t fit what we wanted. It would feel wasteful. A technologist sees that as potential (it could be smaller, it could cost a lot less) and tries to fill it: I could source some microcontrollers, design a PCB, 3D-print a case, build a reliable web service, experiment with range and temperature sensors, and make a bona-fide product of which only one would ever be made.

I instead ended up with something new, in the middle, that surprised me: this piece of hardware, the Sparkfun Thing, can ship to you overnight and costs $16. It can talk over WiFi. It’s got GPIO, ADC, I2C, and a li-poly charger circuit. It can be programmed over USB using the Arduino software. We built the Bread Detector using the Thing, and I was simply impressed with the balance of control and “batteries-included” (literally).

The difference wasn’t the onboard specs or even cost, but the service stack: the Thing comes pre-packaged with instructions and examples for posting sensor data to the Phant data logging service, which is free with limitations – you can send data roughly every 10 seconds, and are limited to 50 MB before it rolls over. Phant was that missing layer: with one line of code I can push bits to a reliable service with a simple API, ready for analysis, graphing, and alerts.
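Posting a reading really is about one line. Here’s a sketch of the logging loop in Python with the requests library (on the Thing itself it’s the equivalent HTTP GET from Arduino code); the stream keys and field names are placeholders for ones you’d create at data.sparkfun.com.

    # Sketch of logging one bread reading to Phant (data.sparkfun.com).
    import time
    import requests

    PUBLIC_KEY = "YOUR_PUBLIC_KEY"     # placeholder
    PRIVATE_KEY = "YOUR_PRIVATE_KEY"   # placeholder
    URL = "https://data.sparkfun.com/input/" + PUBLIC_KEY

    def log_reading(temp_c, height_mm):
        requests.get(URL, params={
            "private_key": PRIVATE_KEY,
            "temp": temp_c,       # field names must match the stream definition
            "height": height_mm,
        })

    while True:
        # Replace with real sensor reads (thermistor + range sensor).
        log_reading(temp_c=24.5, height_mm=82)
        time.sleep(15)  # stay under the roughly-every-10-seconds rate limit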

The simplicity of the platform also means something like The Bread Detector can be built by anyone, with simple wiring and low-cost sensors. I wonder if the software layer was our problem all along: I can teach a less-inclined person to solder and plug in wires, but would have a much harder time getting them to set up a web service to store data and respond to queries.

When I mentioned the Phant service and the Thing on Twitter, Antonio, someone I don’t normally bet against, said he thinks it will be free one day. Put on your capitalist hat for a second. How will that work? The Amazon Dash buttons have a clear proposition: you’ll buy more stuff, so give them away. Will the data sitting on Phant soon earn Sparkfun greater revenue than the parts? Will every 5th bread rise give me an ad for King Biscuit?

Perhaps in the near future, the prosumer builders’ products themselves become a very different kind of product.

Detecting some bread

Walk to work

I recorded my walk to work on binaural microphones. You should hear it on headphones sometime, it’s strangely soothing.

I made this all with a (relatively) cheap kit: the Tascam DR-05 recorder and a pair of Soundman OKM II Studio Binaural Microphones. Soundman only appears to sell their microphones new on eBay these days.

When you walk with these on, it feels like a performance. You listen just as carefully as the microphones do. You move fluidly and try not to rustle, cough, or act aggressively. I walk this hour (including the 10-minute ferry ride) twice a day, whenever I’m in NYC during the week. Even if I never shared it, or deleted the WAV file immediately, I’d want to keep doing it this way.