My name is Brian Whitman. I am a lapsed scientist and sound artist currently co-founder/CTO at The Echo Nest, a music intelligence company in Somerville, MA. As I work on various scaling and media search problems with detours into art projects I'll be posting details here in the hopes that I can learn from others. I'd always like to hear from you if you are working on similar things.
Sometime in early 08 it became clear to LEXATEXY that our biggest client’s solr servers had too many documents on them. [Rule of thumb for “how many documents is too many?”: 15 million. I can upgrade, you bargain. What if I have a super-machine bought with my recently closed VC round?

15 million.]
You’ve already got a pretty reliable thing handling 15 million documents. It can sort, facet, do autocomplete, metaphone spelling search, it uses an ID scheme you thought was clever, it doesn’t restart itself manually with OOM KILLER messages. It has a URL you thoughtfully CNAMEd to solr.mycompanyname.com. So now you’re telling me it’s time for another server? Thanks a lot, eschatology.
SOLR-303 has been around since mid-07. It wasn’t really ready at all for anyone to use until mid-08, and then, of course, LEXATEXY snapped up on it like a zamboni running on fumes. We’ve used it to successfully go from 1 million to 20 million to then space for 100 million (but straining hard.) Like the rest of Solr, it disarmingly just works. You can set your shards in your solrconfig or at query time (we just wrote it into our python client.) Queries can hit any random shard (even one that is not in the shards list!) and the shard will then ask each server in the shards param for the answer. The host you asked initially will do the query blending and return your result (hopefully with the shard the doc came from in the result). Sort and range queries work OOTB. Big queries with start and rows set are doomed as to ensure you get all results back it sets start to 0 across all shards and rows to start. (You can hack around this.)
If it’s so awesome, why doesn’t LEXATEXY marry it?
I have received some good advice about how to fix some of these problems. They are still in thought phase, but it comes down to a few little things.
I hope people that deal with large Solr and Lucene indexes somehow get pointed here so they can laugh and swap stories. If anyone wants to chime in, I’ll give you some column inches to school us all. But failing that you’re all about to see a few guys and their paid handholders (METALEXATEXY) flail around trying to figure this out for real over the next few months.