
Sunday, 8 January 2012

How Facebook built its Timeline feature

Facebook’s Timeline feature is beautiful, although some revile it. But love it or hate it, Timeline is the engineering equivalent of building a racing bike customized for a specific track, only without testing either until race day. At least that’s how Ryan Mack, an infrastructure engineer at Facebook, makes it seem in a blog post published Thursday detailing how Facebook engineered its Timeline feature.

Making a legacy MySQL database faster.


The blog post also offers a ton of detail on how Facebook dealt with petabytes of user data stored in legacy MySQL systems on slow disks that could make Timeline less responsive. The company implemented a new architecture that moves older data out to slower disk storage and keeps more recent, more frequently accessed data on flash drives, cached using memcached. From the blog:

Before we began Timeline, our existing data was highly normalized, which required many round trips to the databases. Because of this, we relied on caching to keep everything fast. When data wasn’t found in cache, it was unlikely to be clustered together on disk, which led to lots of potentially slow, random disk IO. …A massive denormalization process was required to ensure all the data necessary for ranking was available in a small number of IO-efficient database requests.
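
To make the denormalization concrete, here is a minimal runnable sketch. It uses sqlite3 as a stand-in for MySQL, and the table and field names are invented, since Facebook hasn't published its schema; it only illustrates the shape of the change, not the actual system:

import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL so the sketch runs.
db = sqlite3.connect(":memory:")
db.executescript("""
  CREATE TABLE stories  (id INTEGER PRIMARY KEY, author INTEGER, text TEXT, ts INTEGER);
  CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
  CREATE TABLE likes    (story INTEGER, liker INTEGER);
  CREATE TABLE timeline (user INTEGER, ts INTEGER, payload TEXT);
""")
db.execute("INSERT INTO users VALUES (1, 'ryan')")
db.execute("INSERT INTO stories VALUES (10, 1, 'hello timeline', 1325980800)")
db.execute("INSERT INTO likes VALUES (10, 2)")

def load_story_normalized(story_id):
    # Normalized layout: three round trips per story, and on a cache miss
    # the rows are scattered on disk, which means slow random IO.
    text, author, ts = db.execute(
        "SELECT text, author, ts FROM stories WHERE id=?", (story_id,)).fetchone()
    (name,) = db.execute("SELECT name FROM users WHERE id=?", (author,)).fetchone()
    (likes,) = db.execute(
        "SELECT COUNT(*) FROM likes WHERE story=?", (story_id,)).fetchone()
    return {"text": text, "author": name, "likes": likes, "ts": ts}

# One-time denormalization pass: copy everything needed to rank and render
# into a single row keyed by (user, time), clustered for sequential reads.
story = load_story_normalized(10)
db.execute("INSERT INTO timeline VALUES (?, ?, ?)", (1, story["ts"], repr(story)))

def load_timeline_denormalized(user_id):
    # One IO-efficient range scan returns a whole stretch of timeline.
    return db.execute(
        "SELECT payload FROM timeline WHERE user=? ORDER BY ts DESC",
        (user_id,)).fetchall()

print(load_timeline_denormalized(1))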

Here’s a visual of the system: [architecture diagram from the original post]

Mack spent a lot of time detailing the challenges of denormalizing the data, which entailed having an intern define a custom language to tell a compiler how to convert old data into the new format, and having three “data archeologists” write the conversion rules. It also required Facebook to move older activity data to slower network storage while maintaining acceptable performance. To do this, Facebook “hacked a read-only build of MySQL and deployed hundreds of servers to exert maximum IO pressure and copy this data out in weeks instead of months.”

To speed IO even further, engineers consolidated join tables into a tier of flash-only databases. Since PHP can perform database queries on only one server at a time, Mack said Facebook wrote a parallelizing query proxy that allowed it to query the entire join tier in parallel. Finally Facebook attempted to future-proof its data model, but we’ll see how far that takes it.
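
The proxy idea is easy to picture with a fan-out sketch. This is not Facebook's code: query_shard is a hypothetical stand-in for a per-server database call, and Python threads substitute for the real proxy, just to show why querying the whole join tier at once costs roughly the latency of the slowest shard rather than the sum of all of them:

from concurrent.futures import ThreadPoolExecutor

SHARDS = range(8)  # stand-ins for the flash-only join-tier databases

def query_shard(shard_id, sql):
    # Hypothetical per-server call; a real proxy would hold a connection
    # to shard `shard_id` and run the query there.
    return [(shard_id, sql)]

def parallel_query(sql):
    # PHP runs queries one server at a time, paying the sum of latencies;
    # the proxy fans the same query out to every shard concurrently,
    # paying roughly the max latency, then merges the rows.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: query_shard(s, sql), SHARDS)
    merged = []
    for rows in partials:
        merged.extend(rows)
    return merged

print(parallel_query("SELECT * FROM timeline WHERE user = 42"))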

In response to a question I sent, Mack said through a spokesman that it took Facebook two weeks to dump everyone’s old profile data from the existing system into an archive using “some off-the-shelf network attached storage.” That data was then stored on “the new servers that now power Timeline, which, at the time, were fresh and didn’t have anything hitting them from production. We did incremental updates weekly until later, when we had writes going to both systems in real time.” That’s a lot of fresh servers, although Facebook remained mum about the exact volume of data transferred.

Facebook also built the Timeline aggregator to run locally on each database, so that information doesn’t travel over the network unless it will be displayed on the page. That’s another timesaver. It also used its HipHop compiler for speeding up PHP in that aggregator.

Building in parallel.


The other element of the blog post that I found amazing was how Facebook apparently denormalized its data, had a product team visualizing the Timeline UI, and built a scalable back-end system for everything to run on, all at the same time. I’m going to call that crazy, but Mack notes that’s what kept the development time frame to a mere six months.

I asked Facebook about what it learned during that process, because I imagine a lot of other sites would love to be able to develop their front and back end systems in parallel and save the cost of replicating their entire production environment’s data to test it. The response a spokesman emailed me from Mack was as follows:

Layering is a good way of describing the process of building Timeline. To make sure we could work in parallel, we always made sure that there was at least a temporary technology that each part of the team could build on top of. Some of these were pre-existing, but many of them were quick prototypes built in a couple of days as needed. We also had frequent scrums to identify places where people were blocked on each other, and everyone on the team did an excellent job of knowing what others needed or would soon need.

When the production solutions were ready, there was always some friction in migrating from the temporary solution, but usually it was just a few days of work. It was a definite win compared with the weeks or months of delay we’d have faced if we had waited for the production solution up front. Given that we wanted to build the entire system in six months, it was critical that we avoided wasting any time.

From nothing to Timeline in six months is pretty impressive, considering we’re talking about creating something to support more than 800 million users. Now you know a bit more about how this Timeline sausage was made.






Sunday, 26 June 2011

Google BigTable, MapReduce, MegaStore vs. Hadoop, MongoDB

Dhanji R. Prasanna leaving Google:


Here is something you may have heard but never quite believed before: Google’s vaunted scalable software infrastructure is obsolete. Don’t get me wrong, their hardware and datacenters are the best in the world, and as far as I know, nobody is close to matching it. But the software stack on top of it is 10 years old, aging and designed for building search engines and crawlers. And it is well and truly obsolete.

Protocol Buffers, BigTable and MapReduce are ancient, creaking dinosaurs compared to MessagePack, JSON, and Hadoop. And new projects like GWT, Closure and MegaStore are sluggish, overengineered Leviathans compared to fast, elegant tools like jQuery and mongoDB. Designed by engineers in a vacuum, rather than by developers who have need of tools.



Maybe it is just the disappointment of someone whose main project was killed. Or maybe it is true. Or maybe it is just another magic triangle:

[Diagram: the Agility / Scalability / Coolness-factor triangle]

Edward Ribeiro mentioned a post from another ex-Googler which points out similar issues with Google’s philosophy.


Original title and link: Google BigTable, MapReduce, MegaStore vs. Hadoop, MongoDB (NoSQL databases © myNoSQL)

"

Thursday, 23 June 2011

High Scalability - 35+ Use Cases for Choosing Your Next NoSQL Database



Now we get to the point of considering use cases and which systems might be appropriate for those use cases.

What Are Your Options?

First, let's cover the various data models. These have been adapted from Emil Eifrem and NoSQL databases.
Document Databases
  • Lineage: Inspired by Lotus Notes.
  • Data model: Collections of documents, which contain key-value collections.
  • Example: CouchDB, MongoDB
  • Good at: Natural data modeling. Programmer friendly. Rapid development. Web friendly, CRUD.
Graph Databases
  • Lineage: Euler and graph theory.
  • Data model: Nodes & relationships, both of which can hold key-value pairs
  • Example: AllegroGraph, InfoGrid, Neo4j
  • Good at: Rock complicated graph problems. Fast.
Relational Databases
  • Lineage: E. F. Codd in A Relational Model of Data for Large Shared Data Banks
  • Data Model: a set of relations
  • Example: VoltDB, Clustrix, MySQL
  • Good at: High performing, scalable OLTP. SQL access. Materialized views. Transactions matter. Programmer friendly transactions.
Object Oriented Databases
  • Lineage: Graph Database Research
  • Data Model: Objects
  • Example: Objectivity, Gemstone
Key-Value Stores
  • Lineage: Amazon's Dynamo paper and Distributed HashTables.
  • Data model: A global collection of KV pairs.
  • Example: Membase, Riak
  • Good at: Handles size well. Processing a constant stream of small reads and writes. Fast. Programmer friendly.
BigTable Clones
  • Lineage: Google's BigTable paper.
  • Data model: Column family, i.e. a tabular model where each row at least in theory can have an individual configuration of columns.
  • Example: HBase, Hypertable, Cassandra
  • Good at: Handles size well. Streaming massive write loads. High availability. Multiple data centers. MapReduce.
Data Structure Servers
  • Lineage: ?
  • Example: Redis
  • Data model: Operations over dictionaries, lists, sets and string values.
  • Good at: Quirky stuff you never thought of using a database for before.
Grid Databases
  • Lineage: Data Grid and Tuple Space research.
  • Data Model: Space Based Architecture
  • Example: GigaSpaces, Coherence
  • Good at: High performance and scalable transaction processing.
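
One way to internalize these differences is to sketch the same user in several of the models above. These are plain Python literals rather than any particular product's API, and the keys and field names are invented:

# Document model (CouchDB/MongoDB style): one self-contained nested record.
doc = {"_id": "user:42", "name": "Ada",
       "posts": [{"title": "hi", "tags": ["intro"]}]}

# Key-value model (Dynamo style): opaque values behind flat keys; the
# application does any joining itself.
kv = {
    "user:42":       '{"name": "Ada"}',
    "user:42:posts": '["post:7"]',
    "post:7":        '{"title": "hi"}',
}

# BigTable-clone model: a row key plus column families, where each row may
# have its own set of columns within a family.
wide_row = {
    "row_key": "user:42",
    "info":  {"name": "Ada"},                # column family "info"
    "posts": {"2011-06-23#post:7": "hi"},    # column family "posts"
}

# Graph model: nodes and relationships, both holding key-value pairs.
nodes = {1: {"name": "Ada"}, 2: {"title": "hi"}}
edges = [(1, "WROTE", 2, {"on": "2011-06-23"})]

print(doc["name"], kv["post:7"], wide_row["posts"], edges[0][1])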

What Should Your Application Use?

  • Key point is to rethink how your application could work differently in terms of the different data models and the different products. Right data model for the right problem. Right product for the right problem.
  • To see what models might help your application take a look at What The Heck Are You Actually Using NoSQL For? In this article I tried to pull together a lot of unconventional use cases of the different qualities and features developers have used in building systems.
  • Match what you need to do with these use cases. From there you can backtrack to the products you may want to include in your architecture. NoSQL, SQL, it doesn't matter.
  • Look at Data Model + Product Features + Your Situation. Products have such different feature sets it's almost impossible to recommend by pure data model alone.
  • Which option is best is determined by your priorities.

If Your Application Needs...

  • complex transactions because you can't afford to lose data or if you would like a simple transaction programming model then look at a Relational or Grid database.
    • Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they said later they were out of stock. I did not want a compensated transaction. I wanted my item!
  • to scale then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.
  • to always be able to write to a database because you need high availability then look at Bigtable Clones which feature eventual consistency.
  • to handle lots of small continuous reads and writes, which may be volatile, then look at Document or Key-value databases, or at databases offering fast in-memory access. Also consider SSDs.
  • to implement social network operations then you first may want a Graph database or, second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too (see the Redis sketch after this list).
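
Here is the Redis sketch promised above. It assumes a local Redis server and the redis-py client, and the key names are invented:

import redis  # assumes a local Redis server and the redis-py client

r = redis.Redis()

# One SADD per follow event keeps each user's followers as a set.
r.sadd("followers:alice", "bob", "carol", "dave")
r.sadd("followers:carol", "bob", "erin")

# "Who follows both?" is a server-side set intersection, no join required.
print(r.sinter("followers:alice", "followers:carol"))  # {b'bob'}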

If Your Application Needs...

  • to operate over a wide variety of access patterns and data types then look at a Document database; they are generally flexible and perform well.
  • powerful offline reporting with large datasets then look at Hadoop first, and second at products that support MapReduce. Supporting MapReduce isn't the same as being good at it.
  • to span multiple data centers then look at Bigtable Clones and other products that offer a distributed option that can handle long latencies and are partition tolerant.
  • to build CRUD apps then look at a Document database; they make it easy to access complex data without joins.
  • built-in search then look at Riak.
  • to operate on data structures like lists, sets, queues, publish-subscribe then look at Redis (sketch after this list). Useful for distributed locking, capped logs, and a lot more.
  • programmer friendliness in the form of programmer-friendly data types like JSON, HTTP, REST, and JavaScript then first look at Document databases and then Key-value databases.
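
And the data-structure-server sketch promised above, again assuming a local Redis server and redis-py, with invented key names:

import redis  # assumes a local Redis server and the redis-py client

r = redis.Redis()

# Capped log: push to the head, trim to keep only the last 1000 entries.
r.lpush("log:app", "user 42 logged in")
r.ltrim("log:app", 0, 999)

# Work queue: producers LPUSH jobs, workers block on BRPOP.
r.lpush("jobs", "resize:image:7")
print(r.brpop("jobs", timeout=1))  # (b'jobs', b'resize:image:7')

# Simple distributed lock: SET with NX succeeds for only one client.
got_lock = r.set("lock:report", "worker-1", nx=True, ex=30)
print("acquired" if got_lock else "someone else holds it")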

If Your Application Needs...

  • transactions combined with materialized views for real-time data feeds then look at VoltDB. Great for data-rollups and time windowing.
  • enterprise level support and SLAs then look for a product that makes a point of catering to that market. Membase is an example.
  • to log continuous streams of data that may have no consistency guarantees necessary at all then look at Bigtable Clones because they generally work on distributed file systems that can handle a lot of writes.
  • to be as simple as possible to operate then look for a hosted or PaaS solution because they will do all the work for you.
  • to be sold to enterprise customers then consider a Relational Database because they are used to relational technology.
  • to dynamically build relationships between objects that have dynamic properties then consider a Graph Database because often they will not require a schema and models can be built incrementally through programming.
  • to support large media then look at storage services like S3. NoSQL systems tend not to handle large BLOBs, though MongoDB has a file service.

If Your Application Needs...

  • to bulk upload lots of data quickly and efficiently then look for a product that supports that scenario. Most will not, because they don't support bulk operations.
  • an easier upgrade path then use a fluid-schema system like a Document database or a Key-value database, because they support optional fields, adding fields, and field deletions without the need to build an entire schema-migration framework (see the sketch after this list).
  • to implement integrity constraints then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.
  • a very deep join depth then use a Graph database, because they support blisteringly fast navigation between entities.
  • to move behavior close to the data so the data doesn't have to be moved over the network then look at stored procedures of one kind or another. These can be found in Relational, Grid, Document, and even Key-value databases.
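
The fluid-schema sketch promised above, with plain dicts standing in for documents in any document store:

# Two versions of the app wrote different fields; no migration ever ran.
users = [
    {"id": 1, "name": "Ada"},                       # written by app v1
    {"id": 2, "name": "Bob", "avatar": "bob.png"},  # written by app v2
]

def avatar_for(user):
    # Readers tolerate missing fields instead of requiring ALTER TABLE.
    return user.get("avatar", "default.png")

for u in users:
    print(u["name"], avatar_for(u))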

If Your Application Needs...

  • to cache or store BLOB data then look at a Key-value store. Caching can be used for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on (see the cache-aside sketch after this list).
  • a proven track record, like not corrupting data and just generally working, then pick an established product, and when you hit scaling (or other) issues use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc.).
  • fluid data types because your data isn't tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at Document, Key-value, and Bigtable Clone databases. Each has a lot of flexibility in their data types.
  • other business units to run quick relational queries so you don't have to reimplement everything then use a database that supports SQL.
  • to operate in the cloud and automatically take full advantage of cloud features then we may not be there yet.
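
The cache-aside sketch promised above. A plain dict stands in for memcached, and expensive_join is a made-up placeholder for a slow multi-table query:

import time

cache = {}  # stand-in for memcached

def expensive_join(user_id):
    time.sleep(0.05)  # pretend this is a slow multi-table join
    return {"user": user_id, "profile": "...", "friends": ["..."]}

def get_profile(user_id):
    key = "profile:%d" % user_id
    hit = cache.get(key)
    if hit is not None:
        return hit            # served from cache, no database touched
    value = expensive_join(user_id)
    cache[key] = value        # populate on miss (memcached would add a TTL)
    return value

get_profile(42)   # miss: hits the database
get_profile(42)   # hit: returned from cache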

If Your Application Needs...

  • support for secondary indexes so you can look up data by different keys then look at relational databases and Cassandra's new secondary index support.
  • to store an ever-growing set of data (really BigData) that rarely gets accessed then look at Bigtable Clones, which will spread the data over a distributed file system.
  • to integrate with other services then check if the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.
  • fault tolerance then check how durable writes are in the face of power failures, partitions, and other failure scenarios.
  • to push the technological envelope in a direction nobody seems to be going then build it yourself because that's what it takes to be great sometimes.
  • to work on a mobile platform then look at CouchDB / Mobile Couchbase.

Which Is Better?

  • Moving for a 25% improvement is probably not a reason to go NoSQL.
  • Benchmark relevancy depends on the use case. Does it match your situation(s)?
  • Are you a startup that needs to release a product as soon as possible and you are playing around with ideas? Both SQL and NoSQL can make an argument.
  • Performance may be equal on one box, but what happens when you need N?
  • Everything has problems: browse the Amazon forums and it's that EBS is slow or instances won't reply; for GAE it's that the datastore is slow, or something else. Every product people actually use will have problems. Are you OK with the problems of the system you've selected?

Saturday, 18 June 2011

LexisNexis open-sources its Hadoop killer


LexisNexis is releasing a set of open-source, data-processing tools that it says outperforms Hadoop and even handles workloads Hadoop presently can’t. The technology (and new business line) is called HPCC Systems, and was created 10 years ago within the LexisNexis Risk Solutions division that analyzes huge amounts of data for its customers in intelligence, financial services and other high-profile industries. There have been calls for a legitimate alternative to Hadoop, and this certainly looks like one.


According to Armando Escalante, CTO of LexisNexis Risk Solutions, the company decided to release HPCC now because it wanted to get the technology into the community before Hadoop became the de facto option for big data processing. Escalante told me during a phone call that he thinks of Hadoop as “a guy with a machete in front of a jungle — they made a trail,” but that he thinks HPCC is superior.


But in order to compete for mindshare and developers, he said, the company felt it had to open-source the technology. One big thing Hadoop has going for it is its open-source model, Escalante explained, which attracts a lot of developers and a lot of innovation. If his company wanted HPCC to “remain relevant” and keep improving through new use cases and ideas from a new community, the time for release was now and open source had to be the model.


Hadoop, of course, is the Apache Software Foundation project created several years ago by then-Yahoo employee Doug Cutting. It has become a critical tool for web companies — including Yahoo and Facebook — to process their ever-growing volumes of unstructured data, and is fast making its way into organizations of all types and sizes. Hadoop has spawned a number of commercial distributions and products, too, including from Cloudera, EMC  and IBM.


How HPCC works


Hadoop relies on two core components to store and process huge amounts of data: the Hadoop Distributed File System and Hadoop MapReduce. However, as Cloudant CEO Mike Miller explained in a post over the weekend, MapReduce is still a relatively complex language for writing parallel-processing workflows. HPCC seeks to remedy this with its Enterprise Control Language.


Escalante says ECL is a declarative, data-centric language that abstracts a lot of the work necessary within MapReduce. For certain tasks that take a thousand lines of code in MapReduce, he said, ECL only requires 99 lines. Furthermore, he explained, ECL doesn’t care how many nodes are in the cluster because the system automatically distributes data across however many nodes are present. Technically, though, HPCC could run on just a single virtual machine. And, says Escalante, HPCC is written in C++ — like the original Google MapReduce  on which Hadoop MapReduce is based — which he says makes it inherently faster than the Java-based Hadoop version.
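
To give a feel for the verbosity gap (without vouching for Escalante's exact 1,000-vs-99 figure), here is the canonical word count written Hadoop-streaming style in Python; in ECL the same job is reportedly a few declarative lines:

from itertools import groupby

def mapper(lines):
    # Emit (word, 1) pairs; the framework shuffles and sorts them by key.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    # After the shuffle, pairs arrive grouped by word; sum each group.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog jumps over the fox"]
    for word, count in reducer(mapper(text)):
        print(word, count)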


HPCC offers two options for processing and serving data: the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster. Escalante said Thor — so named for its hammer-like approach to solving the problem — crunches, analyzes and indexes huge amounts of data a la Hadoop. Roxie, on the other hand, is more like a traditional relational database or data warehouse that can even serve transactions to a web front end.


We didn’t go into detail on HPCC’s storage component, but Escalante noted that it does utilize a distributed file system, although it can support a variety of off-node storage architectures and/or local solid-state drives.


He added that in order to ensure LexisNexis wasn’t blinded by “eating its own dogfood,” his team hired a Hadoop expert to kick the tires on HPCC. The consultant was impressed, Escalante said, but did note some shortcomings that the team addressed as it readied the technology for release. It also built a converter for migrating Hadoop applications written in the Pig language to ECL.


Can HPCC Systems actually compete?


The million-dollar question is whether HPCC Systems can actually attract an ecosystem of contributors and users that will help it rise above the status of big data also-ran. Escalante thinks it can, in large part because HPCC already has been proven in production dealing with LexisNexis Risk Solutions’ 35,000 data sources, 5,000 transactions per second and large, paying customers. He added that the company also will provide enterprise licenses and proprietary applications in addition to the open-source code. Plus, it already has potential customers lined up.


It’s often said that competition means validation. Hadoop has moved from a niche set of tools to the core of a potentially huge business that’s growing every day, and even Microsoft has a horse in this race with its Dryad set of big data tools. Hadoop has already proven itself, but the companies and organizations relying on it for their big data strategies can’t rest on their laurels.





Wednesday, 26 January 2011

InfoQ: Asynchronous, Event-Driven Web Servers for the JVM: Deft and Loft


Asynchronous, event-driven architectures have been gaining a lot of attention lately, mostly with respect to JavaScript and Node.js. Deft and Loft are two solutions that bring "asynchronous purity" to the JVM.

InfoQ had an interview with Roger Schildmeijer, one of the two creators, about these two non-blocking web server frameworks:

InfoQ: What is Deft and Loft?

Roger: So before I start to describe what Deft and Loft are, I would like to start from the beginning. In September 2009 Facebook open sourced a piece of software called Tornado, a relatively simple, non-blocking web server framework written in Python, designed to handle thousands of simultaneous connections. Tornado gained a lot of traction pretty quickly and became quite popular because of its strength and simplistic design. At this time a lot of developers out there became aware of the "c10k problem" (from Wikipedia: The C10k problem is the numeronym given to a limitation that most web servers currently have, which limits the web server's capabilities to only handle about ten thousand simultaneous connections.)

In the late summer of 2010 Jim Petersson and I started to discuss and design an asynchronous non-blocking web server/framework running on the JVM using pure Java NIO. I would say that the main reason for this initiative was our curiosity about the potential speed improvements that could be achieved with a system similar to Tornado but written in Java. We knew that we could never create an API as clean and simple as Tornado's.

(Clean APIs have never been the advantage of Java, if you ask me)

We got something up and running within the first 48h and saw some extraordinary (very non-scientific) benchmark results. This was the feedback we aimed for and Deft was born.

Just to clarify, I would be the last person to suggest that someone should throw away their current system using Tornado (or some other asynchronous non-blocking web server) and replace it with Deft. Deft is still pretty young and has a lot to learn (there are a lot of issues to be implemented). By the time this interview is published I hope that the next stable Deft release, 0.2.0, will be ready and released.

After a couple of weeks of Deft hacking we started to discuss what Deft would have looked like if we had used another language, like Scala. It would be very gratifying to create something with nice performance but also a system that had a clean and elegant syntax. This was the seed for another project, and Loft was born. To make another important clarification: Loft is still very much in its infancy and there is as yet no stable release available.

The main features that set Loft apart from Deft are:

  1. it's written in Scala
  2. it uses a feature in Scala called continuations. The reason for this is to make asynchronous programming easier. (Will explain that in detail below)
  3. the initial version will be a pure Tornado clone

InfoQ: Would you like to explain to us the motivation behind those architectural decisions?

Roger: We wanted to create something similar to Tornado that runs on the Java Virtual Machine. There are a lot of good (multi-threaded) open source web servers and web frameworks out there already (Apache Tomcat and Jetty are two popular examples), and we didn't want to compete with those beasts. Instead we wanted to build something that was good at handling thousands of simultaneous connections. The single-threaded approach was already tested (and proved successful) by frameworks like Tornado.

InfoQ: What is a typical development workflow in Deft - from coding and debugging, up to deploying and monitoring?

Roger: This is a very good question and something that is very important to address. However, because Deft is so new and its user base is currently small, I'm afraid I can't give an example of a typical development workflow. That is something we hope a growing community will help flesh out.

The biggest difference from the coding that most Java developers do is that everything executes inside a single-threaded environment. The benefits of coding that way are that you don't have to use explicit locks and you don't have to think about deadlocks caused by inadequately synchronized code. The downside is that you are not allowed to do blocking operations. A blocking operation will stall the entire server and make it unresponsive.
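
A sketch of that contract, using Python's asyncio purely as an illustration (it postdates Deft and has nothing to do with it), shows why a single blocking call is fatal in this model:

import asyncio
import time

async def good_handler():
    await asyncio.sleep(1)   # yields to the event loop; other connections proceed
    return "200 OK"

async def main():
    # Ten concurrent "requests" finish in about one second because no
    # handler ever blocks the single thread. Replace asyncio.sleep(1)
    # with time.sleep(1) and the same work takes ten seconds, because a
    # blocking call stalls the only thread and every connection with it.
    t0 = time.time()
    await asyncio.gather(*(good_handler() for _ in range(10)))
    print("10 requests in %.1fs" % (time.time() - t0))

asyncio.run(main())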

Deft 0.2.0 will contain an experimental JMX API used for monitoring. Some examples of things that could be monitored are the number of pending timeouts/keep-alive timeouts, number of registered IO handlers in the event loop and the select timeout of the selector.

InfoQ: Are there any benchmarks that evaluate Deft's performance?

Roger: We have made some (non-scientific) benchmarks against a simple web server using a single request handler that responds with a 200 OK HTTP status code and writes "hello world" to the client (the code for the benchmarks is also available on github.com/rschildmeijer). Against a simple "hello world" server we have seen speed improvements by a factor of 8-10x compared to Tornado. The full results from the benchmarks are available at http://deftserver.org/.

InfoQ: How does Deft compare with other solutions like Tornado, Node.js or Netty?

Roger: Tornado and Node.js are the two other asynchronous web servers that we used in the benchmark. We didn't include Netty because it felt a little bit like comparing apples with oranges. But I wouldn't be surprised if Netty showed numbers equal to (or greater than) the results we have seen for Deft. Netty, the successor to Apache MINA, is a really cool socket framework written by a really smart guy (Trustin Lee).

InfoQ: What was the motivation for Loft and how does it use continuations?

Roger: So finally time to show some code! (I would like to (once again) remind you that Loft is pretty much in the starting blocks and the code snippets for Loft are the proposed design)

A simple request handler for Loft could look something like this:

class ExampleHandler {
  @Asynchronous
  def get() {
    val http = AsyncHTTPClient()
    reset {
      val id = database get("roger_schildmeijer")             // async call
      val result = http fetch("http://127.0.0.1:8080/" + id)  // async call
      write(result)
      finish
    }
  }

  val application = Application(Map("/".r -> this))

  def main(args: Array[String]) {
    val httpServer = HTTPServer(application)
    httpServer listen(8888)
    IOLoop start
  }
}

The main method contains the canonical way to start a Loft instance. The ExampleHandler.get is where things start to get interesting. Inside the method two asynchronous calls are made. Asynchronous programming is often conducted by supplying a callback as an additional parameter, and that callback will be called when the result is ready. And if you have two or more consecutive asynchronous calls, you (usually) will have to chain these calls together.

E.g.:

database get("roger_schildmeijer", (id) =>
  http fetch("http://127.0.0.1:8080/" + id, (result) =>
    write(result)))

So what is actually going on in the ExampleHandler.get method above?

You might have noticed the word "reset" in the method; this indicates that some Scala continuations are about to happen. Continuations, as a concept, are hard to grasp. But when the magic disappears, continuations are just functions representing another point in the program (they contain information such as the process's current stack). If you call a continuation, execution automatically switches to the point that function represents. (Actually, you use restricted versions of them every time you do exception handling.)
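
For readers who haven't met continuations, here is a loose Python analogy, not the Scala mechanism itself: a generator also captures "the rest of the function", so a driver can pause it at an asynchronous call and resume it when the result arrives. The toy driver below answers instantly; a real event loop would resume from I/O callbacks:

def get_handler():
    # Each yield hands the driver a request and packages up "the rest of
    # this function" to be resumed when the answer is ready.
    user_id = yield ("db_get", "roger_schildmeijer")
    page = yield ("http_fetch", "http://127.0.0.1:8080/" + user_id)
    return "write(%s)" % page

def run(gen):
    # Toy driver with canned answers; `arg` would be the request payload.
    fake = {"db_get": "42", "http_fetch": "<html>hello</html>"}
    try:
        op, arg = next(gen)
        while True:
            op, arg = gen.send(fake[op])
    except StopIteration as done:
        print(done.value)

run(get_handler())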

Just so I don't confuse anyone. The two asynchronous methods "get" and "fetch" must be implemented in a certain way in order for this example to work.

InfoQ: The asynchronous, event-driven paradigm has gained lots of attention lately. Why do you think is that and how do you see this trend evolving in the future?

Roger: One reason that systems like Tornado, Node.js and Netty have received a lot of attention in recent years is the big social networks that need to hold a huge number of idle connections.

As long as you and I use services like Facebook and Twitter, I think the need for systems like Deft, Loft and Tornado will exist.

As a final note I would like to add that we are looking for contributors that are interested in supporting Deft (current status: two committers, two contributors) and/or Loft (two committers).

Friday, 1 October 2010

Facebook and Site Failures Caused by Complex, Weakly Interacting, Layered Systems



Facebook has been so reliable that when a site outage does occur, it's a definite learning opportunity. Fortunately for us we can learn something, because in More Details on Today's Outage, Facebook's Robert Johnson gave a pretty candid explanation of what caused a rare 2.5-hour period of downtime for Facebook. It wasn't a simple problem. The root causes were feedback loops and transient spikes caused ultimately by the complexity of weakly interacting layers in modern systems. You know, the kind everyone is building these days. Problems like this are notoriously hard to fix, and finding a real solution may send Facebook back to the whiteboard. There's a technical debt that must be paid.

The outline and my interpretation (reading between the lines) of what happened is:

"

Thursday, 23 September 2010

High Scalability - Applying Scalability Patterns to Infrastructure Architecture


ABSTRACT and APPLY

So the aforementioned post is just a summary of a longer and more detailed post, but for purposes of this post I think the summary will do, with the caveat that the original, “Scalability patterns and an interesting story...” by Jesper Söderlund, is a great read that should definitely be on your “to read” list in the very near future.
For now, let’s briefly touch on the scalability patterns and sub-patterns Jesper described, with some commentary on how they fit into scalability from a network and application delivery network perspective. The original text from the High Scalability blog is in red(dish) text.


  • Load distribution - Spread the system load across multiple processing units

    This is a horizontal scaling strategy that is well-understood. It may take the form of “clustering” or “load balancing” but in both cases it is essentially an aggregation coupled with a distributed processing model. The secret sauce is almost always in the way in which the aggregation point (strategic point of control) determines how best to distribute the load across the “multiple processing units.”

    • load balancing / load sharing - Spreading the load across many components with equal properties for handling the request
      This is what most people think of when they hear “load balancing”: at the application delivery layer we think in terms of directing application requests (usually HTTP, but it can be just about any application protocol) to equal “servers” (physical or virtual) that handle the request. This is a “scaling out” approach that is most typically associated today with cloud computing and auto-scaling: launch additional clones of applications as virtual instances in order to increase the total capacity of an application. The load balancing distributes requests across all instances based on the configured load balancing algorithm.
    • Partitioning - Spreading the load across many components by routing an individual request to the component that owns the specific data it needs
      This is really where the architecture comes in and where efficiency and performance can be dramatically increased in an application delivery architecture. Rather than each instance of an application being identical to every other one, each instance (or pool of instances) is designated as the “owner” of a particular class of requests. This allows devops to tweak configurations of the underlying operating system, web and application server software for the specific type of request being handled. This is also where the difference between “application switching” and “load balancing” becomes abundantly clear, as “application switching” is used as a means to determine where to route a particular request, which is (or can be) then load balanced across a pool of resources. It’s a subtle distinction but an important one when architecting not only efficient and fast but resilient and reliable delivery networks.




          • Vertical partitioning - Spreading the load across the functional boundaries of a problem space, separate functions being handled by different processing units
            When it comes to routing application requests we really don’t separate by function unless that function is easily associated with a URI. The most common implementation of vertical partitioning at the application switching layer will be by content. Example: creating resource pools based on the Content-Type HTTP header: images in pool “image servers” and content in pool “content servers”. This allows for greater optimization of the web/application server based on the usage pattern and the content type, which can often also be related to a range of sizes. This also, in a distributed environment, allows architects to leverage, say, cloud-based storage for static content while maintaining dynamic content (and its associated data stores) on-premise. This kind of hybrid cloud strategy has been postulated as one of the most common use cases since the first wispy edges of cloud were seen on the horizon.
          • Horizontal partitioning - Spreading a single type of data element across many instances, according to some partitioning key, e.g. hashing the player id and doing a modulus operation, etc. Quite often referred to as sharding.
            This sub-pattern is inline with the way in which persistence-based load balancing is accomplished, as well as the handling of object caching. This also describes the way in which you might direct requests received from specific users to designated instances that are specifically designed to handle their unique needs or requirements, such as the separation of “gold” users from “free” users based on some partitioning key which in HTTP land is often a cookie containing the relevant data. A minimal sketch of the hash-and-modulus scheme appears after this list.
    • Queuing and batch - Achieve efficiencies of scale by processing batches of data, usually because the overhead of an operation is amortized across multiple requests
      I admit defeat in applying this sub-pattern to application delivery. I know, you’re surprised, but this really is very specific to middleware and aside from the ability to leverage queuing for Quality of Service (QoS) at the delivery layer this one is just not fitting in well. If you have an idea how this fits, feel free to let me know – I’d love to be able to apply all the scalability patterns and sub-patterns to a broader infrastructure architecture.

      • Relaxing of data constraints - Many different techniques and trade-offs with regards to the immediacy of processing / storing / access to data fall in this strategy
        This one takes us to storage virtualization and tiering and the way in which data storage and access are intelligently handled with varying properties based on usage and prioritization of the content. If one relaxes the constraints around access times for certain types of data, it is possible to achieve more efficient use of storage by subjugating some content to secondary and tertiary tiers which may not have the same performance attributes as your primary storage tier. And make no mistake, storage virtualization is a part of the application delivery network – has been since its inception – and as cloud computing and virtualization have grown so has the importance of a well-defined storage tiering strategy.

        We can bring this back up to the application layer by considering that a relaxation of data constraints with regards to immediacy of access can be applied by architecting a solution that separates data reads from writes. This implies eventual consistency, as data updated/written to one database must necessarily be replicated to the databases from which reads are, well, read, but that’s part of relaxing a data constraint. This is a technique used by many large, social sites such as Facebook and Plenty of Fish in order to scale the system to the millions upon millions of requests it handles in any given hour.
      • Parallelization - Work on the same task in parallel on multiple processing units
        I’m not going to be able to apply this one either, unless it was in conjunction with optimizing something like MapReduce and SPDY. I’ve been thinking hard about this one, and the problem is the implication that “same task” is really the “same task”, and that processing is distributed. That said, if the actual task can be performed by multiple processing units, then an application delivery controller could certainly be configured to recognize that a specific URL should be essentially sent to some other proxy/solution that performs the actual distribution, but the processing model here deviates sharply from the request-reply paradigm under which most applications today operate.
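
As promised above, a minimal sketch of the hash-and-modulus partitioning scheme; the shard hostnames are invented stand-ins:

import hashlib

SHARDS = ["db0.example.com", "db1.example.com",
          "db2.example.com", "db3.example.com"]  # invented hostnames

def shard_for(player_id):
    # Stable hash of the partitioning key, then modulus over the shard
    # count; every proxy computes the same owner for the same key.
    digest = hashlib.md5(str(player_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for pid in (17, 42, 99):
    print(pid, "->", shard_for(pid))

# Caveat: changing len(SHARDS) remaps almost every key, which is the
# problem consistent hashing was invented to avoid.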

    DEVOPS CAN MAKE THIS HAPPEN

    I hate to sound off too much on the “devops” trumpet, but one of the primary ways in which devops will be of significant value in the future is exactly in this type of practical implementation. Only by recognizing that many architectural patterns are applicable not only to application but also to infrastructure architecture can we start to apply a whole lot of “lessons that have already been learned” by developers and architects to emerging infrastructure architectural models. This abstraction and application of well-understood patterns from application design and architecture will be invaluable in designing the new network: the next iteration of network theory and implementation that will allow it to scale along with the applications it is delivering.