Wednesday 3 August 2011

Is Stonebraker right? Why SQL isn’t the choice du jour for many apps

Is Stonebraker right? Why SQL isn’t the choice du jour for many apps: "
Two weeks ago, I wrote a post that sparked a pretty overwhelming response. The gist of the post, derived from an interview with database pioneer Michael Stonebraker, was that legacy SQL databases, including MySQL, are relics and no longer relevant with regard to today’s web applications. Stonebraker cited Facebook’s renowned MySQL-plus-memcached architecture as an example of how much effort it takes to make such databases keep up with applications that store lots of data and serve high rates of transactions.


Michael Stonebraker

By and large, the responses weren’t positive. Some singled out Stonebraker as out of touch or as just trying to sell a product. Some pointed to the popularity of MySQL as evidence of its continued relevance. Many challenged how Stonebraker dare question the wisdom of Facebook’s top-of-the-line database engineers.

They’re all fair-enough statements, but they also somewhat missed the point. Stonebraker wasn’t calling out Facebook, nor was he suggesting (as far as I can tell) that it abandon MySQL tomorrow. Yes, he has a product, VoltDB, to sell, but that shouldn’t blur the overall message: Whatever database technology someone might choose to use for a new web application, anyone who hopes to achieve even a fraction of Facebook’s traffic should not go down the same path as Facebook did.

Facebook’s implementation is a sign of the times in which it was built, but the evidence suggests that if Facebook could do it over again with today’s database options, it wouldn’t go down the same path. Sharding MySQL thousands of times, operating thousands of memcached servers and paying a team of crack engineers to keep it scaling is nobody’s idea of fun.

First, Facebook


Nobody denies that Facebook’s MySQL team is supremely smart or that it does a great job innovating to ensure that the database is able to keep up with the site’s transactions.


Jim Starkey

Jim Starkey, the founder and CTO of NimbusDB — and a man with some serious relational database and MySQL credentials – puts it well. “You either scale to where your customer base takes you or you die,” he said, and Facebook has been able to do with MySQL what would others would not have been able to do. It has “absolutely skilled” engineers, he added, but they don’t exist everywhere, and Facebook has the added benefit of being able to pay them.

Paul Mikesell, the founder and CEO of Clustrix, echoed that sentiment, telling me that Facebook has done great work to make its site scalable. Clustrix sells a “NewSQL” database that is compatible with MySQL. Interestingly, Jonathan Heiliger, the soon to be former VP of technical operations at Facebook, sits on Clustrix’s advisory board.

No, it’s not so much Facebook’s MySQL implementation that’s the problem. By and large, it does what it’s designed to do, which is to keep up with the myriad status updates and other data that populate users’ profiles. Rather, it’s that Facebook had to expend so much money and so many man-hours to get there.

Facebook has declined numerous requests for comments, save for this snippet from a spokesperson: “[Our] philosophy is to build infrastructure using the best tools available for the job and [we] are constantly evaluating better ways to do things when and where it matters.”

Indeed it is. As I noted in the original post, as Facebook has rolled out new applications, it has increasingly utilized newer database technologies better suited for those tasks. Inbox search within Facebook is powered by the Cassandra NoSQL database that it created, while Facebook Messages and some other new applications use HBase. It looks like Facebook is onto something.

Actually, MySQL isn’t the problem . . .



Curt Monash

According to database industry analyst Curt Monash, Stonebraker makes a valid point in citing Facebook’s complex MySQL situation, because Facebook isn’t using MySQL for its relational capabilities. MySQL might be a fine database choice for a low-end application that requires full relational capabilities, but sharded MySQL plus memcached is not. You lose a lot of those as soon as you begin sharding, he explained, and the application actually communicates directly with memcached for data that resides in that layer. It’s that architecture that’s the problem.

Monash believes there are two timelines for when a technology runs its course, depending on the situation: when you shouldn’t use it to start a new project, and when you should upgrade. For new projects that might have to scale massively, he said, you wouldn’t choose MySQL plus memcached.

As for the sharding, Starkey said, “The only thing sharding has going for it is the absence of alternatives.” He noted that although it’s difficult to find anything he and Stonebraker agree on, they do both agree that traditional SQL databases aren’t easy to scale. Because scaling them is so complex, Starkey — who, like Stonebraker, has a horse in the NewSQL race with NimbusDB — thinks all legacy databases will be irrelevant in a few years. All except low-end MySQL, that is.

Monash said there are several possible options for companies that want to retain MySQL features while still being able to scale, including Clustrix, TokuDB, ScaleDB and Schooner MySQL with Active Cluster. Clustrix’s Mikesell noted that several of its customers were very happy to be done sharding after they made the switch, while others saved lots of human and capital resources by never having to shard in the first place.

There also are startups, such as dbshards and ScaleBase, that make sharding transparent to applications, saving developers from having to write applications that can handle a sharded database.

… always


However, if you don’t need relational features and/or ACID compliance, Monash says there are many possibilities, of which VoltDB, NimbusDB and the other NewSQL databases might not even be the best options. Monash actually takes a pretty harsh stance when it comes to VoltDB.

Even Starkey acknowledges this, explaining that you only really need ACID if you have valuable data. Google has a relational database for its revenue-related information, he said, but uses NoSQL tools like BigTable elsewhere. If a company has plans for its web application to scale and start driving a lot of traffic, Starkey said, he can’t imagine why it would build that new application using MySQL.

But Facebook isn’t a greenfield environment, which makes matters more complicated. Given Facebook’s reliance on memcached and use of it as a key-value store, though, Monash said a Membase Server, a NoSQL database, might actually be a good replacement if Facebook were to transition from MySQL. That’s because Membase has memcached built in and is designed to mimic it in many ways, only in a single tier.

James Phillips, the co-founder and senior VP of products at Couchbase (the new corporate home for Membase Server), said the vast majority of Membase deployments are for new applications, but large sites switching to it from a MySQL-plus-memcached environment isn’t unheard of. In fact, Zynga recently made the switch.

Also, Netflix recently transitioned from an Oracle database to SimpleDB on Amazon Web Services and Cassandra. For a detailed explanation of how and why, check out this presentation by Sid Anand, its cloud data architect.

Based on what he knows of Facebook’s architecture, some of which likely was gleaned from Facebook Director of Engineering Robert Johnson, who sits on Couchbase’s advisory board, Phillips thinks it would be possible, although not necessarily easy, for Facebook to make a switch.

Furthermore, most NoSQL databases and a number of NewSQL databases have open-source and/or free versions, so developers concerned with cost or flexibility aren’t without options.

In closing


Monash sums it up nicely: “Are there undesirable aspects to the Facebook architecture? Absolutely. Are they as serious as [Stonebraker] makes them out to be? Absolutely not.”

That’s because it has the engineering talent to do what it pleases, whether that’s sticking with MySQL or eventually transitioning to something else. But not everyone has that luxury, and if they don’t really need a relational database, or really need a relational database that can scale, there’s a strong case to be made that MySQL is no longer the most desirable option.

Image courtesy of Flickr user mandiberg

Related research and analysis from GigaOM Pro:
Subscriber content. Sign up for a free trial.







"

No comments:

Post a Comment