
Monday, 9 December 2013

PayPal Switches from Java to JavaScript



by Abel Avram on Nov 29, 2013
PayPal has decided to use JavaScript from the browser all the way to the back-end server for web applications, giving up legacy code written in JSP/Java.
Jeff Harrell, Director of Engineering at PayPal, has explained in a couple of blog posts (Set My UI Free Part 1: Dust JavaScript Templating, Open Source and More; Node.js at PayPal) why they decided to switch, and some of the conclusions drawn from moving their web application development from Java/JSP to a complete JavaScript/Node.js stack.
According to Harrell, PayPal's websites had accumulated a good deal of technical debt, and they wanted a "technology stack free of this which would enable greater product agility and innovation." Initially, there was a significant divide between front-end engineers working in web technologies and back-end ones coding in Java. When a UX person wanted to sketch up some pages, they had to ask Java programmers to do some back-end wiring to make it work. This did not fit with their Lean UX development model:
At the time, our UI applications were based on Java and JSP using a proprietary solution that was rigid, tightly coupled and hard to move fast in. Our teams didn't find it complementary to our Lean UX development model and couldn't move fast in it, so they would build their prototypes in a scripting language, test them with users, and then later port the code over to our production stack.
They wanted a "templating [solution that] must be decoupled from the underlying server technology and allow us to evolve our UIs independent of the application language" and that would work with multiple environments. They decided to go with Dust.js, a templating framework backed by LinkedIn, plus Twitter's Bootstrap and Bower, a package manager for the web. Additional pieces added later were LESS, RequireJS, Backbone.js, Grunt, and Mocha.
Some of PayPal's pages had been redesigned, but they still carried some of the legacy stack:
… we have legacy C++/XSL and Java/JSP stacks, and we didn't want to leave these UIs behind as we continued to move forward. JavaScript templates are ideal for this. On the C++ stack, we built a library that used V8 to perform Dust renders natively – this was amazingly fast! On the Java side, we integrated Dust using a Spring ViewResolver coupled with Rhino to render the views.
At that time, they also started using Node.js for prototyping new pages, concluding that it was "extremely proficient" and deciding to try it in production. For that they also built Kraken.js, a "convention layer" placed on top of Express, a Node.js-based web framework. (PayPal has recently open sourced Kraken.js.) The first application to be done in Node.js was the account overview page, one of the most accessed PayPal pages, according to Harrell. But because they were afraid the app might not scale well, they created an equivalent Java application to fall back on in case the Node.js one didn't work out. Following are some conclusions regarding the development effort required for both apps:
                 Java/Spring    JavaScript/Node.js
Set-up time      0              2 months
Development      ~5 months      ~3 months
Engineers        5              2
Lines of code    unspecified    66% of unspecified
The JavaScript team needed 2 months for the initial setup of the infrastructure, but with fewer people it created an application with the same functionality in less time. Running the test suite on production hardware, they concluded that the Node.js app was performing better than the Java one, serving:
Double the requests per second vs. the Java application. This is even more interesting because our initial performance results were using a single core for the node.js application compared to five cores in Java. We expect to increase this divide further.
and having
35% decrease in the average response time for the same page. This resulted in the pages being served 200ms faster— something users will definitely notice.
As a result, PayPal began using the Node.js application in beta in production, and has decided that "all of our consumer facing web applications going forward will be built on Node.js," while some of the existing applications are being ported to Node.js.
One of the benefits of using JavaScript from browser to server is, according to Harrell, the elimination of the divide between front-end and back-end development by having one team "which allows us to understand and react to our users' needs at any level in the technology stack."


Tuesday, 5 April 2011

Introduction to Architecting Systems for Scale

Introduction to Architecting Systems for Scale: "
Few computer science or software development programs
attempt to teach the building blocks of scalable systems.
Instead, system architecture is usually picked up on the job by
working through the pain of a growing product
or by working with engineers who have already learned
through that suffering process.

In this post I'll attempt to document some of the
scalability architecture lessons I've learned while working on
systems at Yahoo! and Digg.


I've attempted to maintain a color convention for diagrams in this
post:

  • green represents an external request from an external
    client (an HTTP request from a browser, etc),
  • blue represents your code running in some container
    (a Django app running on mod_wsgi,
    a Python script listening to RabbitMQ, etc), and
  • red represents a piece of infrastructure (MySQL, Redis, RabbitMQ, etc).

Load Balancing: Scalability & Redundancy


The ideal system increases capacity linearly with adding hardware.
In such a system, if you have one machine and add another, your capacity would double.
If you had three and you add another, your capacity would increase by 33%.
Let's call this horizontal scalability.

On the failure side, an ideal system isn't disrupted by the loss of a server.
Losing a server should simply decrease system capacity by the same amount it increased
overall capacity when it was added. Let's call this redundancy.

Both horizontal scalability and redundancy are usually achieved via load balancing.

(This article won't address vertical scalability,
as it is usually an undesirable property for a large system: there is inevitably a point
where it becomes cheaper to add capacity in the form of additional machines rather than
additional resources for one machine, and redundancy and vertical scaling can be
at odds with one another.)

Load Balancing

Load balancing is the process of spreading requests across multiple resources according
to some metric (random, round-robin, random with weighting for machine capacity, etc)
and their current status (available for requests, not responding, elevated error rate, etc).

Load needs to be balanced between user requests and your web servers,
but must also be balanced at every stage to achieve full scalability and redundancy
for your system. A moderately large system may balance load at three layers:

  • from the user to your web servers,
  • from your web servers to an internal platform layer, and
  • from your internal platform layer to your database.

There are a number of ways to implement load balancing in your setup.

Smart Clients


Adding load-balancing functionality into your database (cache, service, etc) client
is usually an attractive solution for the developer. Is it attractive because it is
the simplest solution? Usually, no. Is it seductive because it is the most robust? Sadly, no.
Is it alluring because it'll be easy to reuse? Tragically, no.

Developers lean towards
smart clients because they are developers, and so they are used to writing software to
solve their problems, and smart clients are software.


With that caveat in mind, what is a smart client? It is a client which takes a pool
of service hosts and balances load across them, detects downed hosts and avoids
sending requests their way (they also have to detect recovered hosts, deal with adding
new hosts, etc, making them fun to get working decently and a terror to get working correctly).
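A smart client can be sketched in a few lines. The Python below is a hypothetical toy, not any particular library: it round-robins across a host pool and skips hosts that failed recently, reducing the hard parts mentioned above (recovery detection, adding hosts) to a simple cooldown.

```python
import itertools
import time

class SmartClient:
    """Round-robins across a host pool and skips hosts that failed
    recently. A toy sketch: real smart clients also need active health
    checks, host re-resolution, and so on."""

    def __init__(self, hosts, retry_after=30.0):
        self.hosts = list(hosts)
        self.failed = {}                      # host -> time of last failure
        self.retry_after = retry_after
        self._cycle = itertools.cycle(self.hosts)

    def pick(self):
        now = time.time()
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            # A failed host is retried only after its cooldown expires.
            if now - self.failed.get(host, 0.0) > self.retry_after:
                return host
        raise RuntimeError("no healthy hosts")

    def mark_failed(self, host):
        self.failed[host] = time.time()

client = SmartClient(["db1:9000", "db2:9000", "db3:9000"])
client.mark_failed("db2:9000")
picks = {client.pick() for _ in range(6)}     # db2 is skipped for now
```

Even this toy shows why such clients are "a terror to get working correctly": the cooldown alone says nothing about whether the host actually recovered.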

Hardware Load Balancers


The most expensive--but very high performance--solution to load balancing is to buy a dedicated hardware
load balancer (something like a Citrix NetScaler). While they can solve a remarkable range of problems,
hardware solutions are remarkably expensive, and they are also 'non-trivial' to configure.

As such, even large companies with substantial budgets will often avoid using dedicated hardware for
all their load-balancing needs; instead they use them only as the first point of contact from user
requests to their infrastructure, and use other mechanisms (smart clients or the hybrid approach discussed
in the next section) for load-balancing traffic within their network.

Software Load Balancers


If you want to avoid the pain of creating a smart client,
and purchasing dedicated hardware is excessive,
then the universe has been kind enough to provide a hybrid approach: software load-balancers.

HAProxy is a great example of this approach.
It runs locally on each of your boxes, and each service you want to load-balance
has a locally bound port. For example, you might have your platform machines accessible
via localhost:9000, your database read-pool at localhost:9001 and your database
write-pool at localhost:9002. HAProxy manages health checks and will remove and return
machines to those pools according to your configuration, as well as balancing across all
the machines in those pools.

For most systems, I'd recommend starting with a software load balancer and moving to
smart clients or hardware load balancing only with deliberate need.

Caching


Load balancing helps you scale horizontally across an ever-increasing
number of servers, but caching will enable you to make vastly better
use of the resources you already have, as well as making otherwise
unattainable product requirements feasible.

Caching consists of: precalculating results (e.g. the number of visits from each referring domain for the previous day),
pre-generating expensive indexes (e.g. suggested stories based on a user's click history), and
storing copies of frequently accessed data in a faster backend (e.g. Memcache instead of PostgreSQL).

In practice, caching is important earlier in the development process than load-balancing,
and starting with a consistent caching strategy will save you time later on.
It also ensures you don't optimize
access patterns which can't be replicated with your caching mechanism or access patterns where performance becomes
unimportant after the addition of caching
(I've found that many heavily optimized Cassandra applications
are a challenge to cleanly add caching to if/when the database's caching strategy can't
be applied to your access patterns, as the data model is generally inconsistent between Cassandra
and your cache).

Application Versus Database Caching


There are two primary approaches to caching: application caching and
database caching (most systems rely heavily on both).

Application Cache

Application caching requires explicit integration in the application code itself.
Usually it will check if a value is in the cache; if not, retrieve the value from
the database and then write that value into the cache (this pattern is especially
common if you are using a cache which observes the least recently used caching algorithm).
The code typically looks like this (specifically, this is a read-through cache, as it
reads the value from the database into the cache if it is missing from the cache):

key = "user.%s" % user_id
user_blob = memcache.get(key)
if user_blob is None:
    user = mysql.query("SELECT * FROM users WHERE user_id=\"%s\"", user_id)
    if user:
        memcache.set(key, json.dumps(user))
    return user
else:
    return json.loads(user_blob)



The other side of the coin is database caching.

Database Cache

When you flip your database on, you're going to get some level
of default configuration which will provide some degree of caching and performance.
Those initial settings will be optimized for a generic use case,
and by tweaking them to your system's access patterns you can generally
squeeze out a great deal of performance improvement.

The beauty of database caching is that your application code gets faster 'for
free', and a talented DBA or operational engineer can uncover
quite a bit of performance without your code changing a whit
(my colleague Rob Coli spent some time recently optimizing our configuration for Cassandra
row caches, and was successful to the extent that he spent a week harassing us with graphs
showing the I/O load dropping dramatically and request latencies
improving substantially as well).

In Memory Caches


The most potent--in terms of raw performance--caches you'll encounter are those which store
their entire set of data in memory. Memcached and
Redis are both examples of in-memory caches (caveat: Redis can be configured
to store some data to disk).
This is because accesses to RAM are orders of magnitude
faster than those to disk.

On the other hand, you'll generally have far less RAM available than disk space, so you'll
need a strategy for only keeping the hot subset of your data in your memory cache. The most
straightforward strategy is least recently used (LRU), and is employed by Memcached (and Redis, as of 2.2, can
be configured to employ it as well). LRU works by evicting less recently used data in favor of
more frequently used data, and is almost always an appropriate caching strategy.
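The LRU policy itself is simple enough to sketch. Here is a toy Python version using the standard library's OrderedDict; real systems would just use Memcached or Redis, and this only shows the eviction mechanics:

```python
from collections import OrderedDict

class LRUCache:
    """Toy least-recently-used cache illustrating the eviction policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")          # touch "a" so it becomes the most recent
cache.set("c", 3)       # evicts "b", the least recently used entry
```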

Content Distribution Networks


A particular kind of cache (some might argue with this usage of the
term, but I find it fitting) which comes into play for sites serving large amounts
of static media is the content distribution network.

Content Distribution Network

CDNs take the burden of serving static media off of your application servers (which are
typically optimized for serving dynamic pages rather than static media), and
provide geographic distribution. Overall, your static assets will load more quickly
and with less strain on your servers (but a new strain of business expense).

In a typical CDN setup, a request will first ask your CDN for a piece of static media; the
CDN will serve that content if it has it locally available (HTTP headers are used to configure
how the CDN caches a given piece of content). If it isn't available, the CDN will query your
servers for the file, then cache it locally and serve it to the requesting user (in this
configuration they are acting as a read-through cache).

If your site isn't yet large enough to merit its own CDN, you can ease a future transition
by serving your static media off a separate subdomain (e.g. static.example.com) using
a lightweight HTTP server like Nginx, and cut over the DNS from
your servers to a CDN at a later date.

Cache Invalidation


While caching is fantastic, it does require you to maintain consistency between your
caches and the source of truth (i.e. your database), at the risk of truly bizarre application behavior.

Solving this problem is known as cache invalidation.

If you're dealing with a single datacenter, it tends to be a straightforward
problem, but it's easy to introduce errors if you have multiple codepaths writing to your
database and cache (which is almost always going to happen if you don't go into writing the application
with a caching strategy already in mind). At a high level, the solution is: each time a value changes,
write the new value into the cache (this is called a write-through cache)
or simply delete the current value from the cache and allow a read-through cache to populate it later
(choosing between read and write through caches depends on your application's details,
but generally I prefer write-through caches as they reduce likelihood of a stampede
on your backend database).
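The two options can be sketched side by side. In this minimal Python sketch, plain dicts stand in for the cache and database:

```python
# Plain dicts stand in for the cache and the database (the source of truth).
db, cache = {}, {}

def read_through(key):
    if key not in cache:
        cache[key] = db[key]          # populate the cache on a miss
    return cache[key]

def write(key, value, write_through=True):
    db[key] = value                   # the source of truth changes
    if write_through:
        cache[key] = value            # option 1: write the new value into the cache
    else:
        cache.pop(key, None)          # option 2: delete; a later read repopulates

db["user:1"] = "old"
read_through("user:1")                        # cache now holds "old"
write("user:1", "new")                        # write-through keeps cache fresh
write("user:1", "newer", write_through=False) # delete-on-write empties the cache entry
```

After the delete-on-write, the next read-through repopulates the entry from the database, which is exactly the stampede risk mentioned above when many readers miss at once.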

Invalidation becomes meaningfully more challenging for scenarios involving fuzzy queries (e.g. if you
are trying to add application-level caching in front of a full-text search engine like
SOLR), or modifications to an unknown number of elements
(e.g. deleting all objects created more than a week ago).

In those scenarios you have to consider relying fully on database caching, adding aggressive
expirations to the cached data, or reworking your application's logic to avoid the issue
(e.g. instead of DELETE FROM a WHERE..., retrieve all the items which match the criteria,
invalidate the corresponding cache rows and then delete the rows by their primary key explicitly).

Off-Line Processing


As a system grows more complex, it is almost always necessary to perform processing
which can't be performed in-line with a client's request, either because it creates
unacceptable latency (e.g. you want to propagate a user's action across a
social graph) or because it needs to occur periodically (e.g. you want to create daily rollups
of analytics).

Message Queues


For processing you'd like to perform inline with a request but which is too slow,
the easiest solution is to create a message queue (for example, RabbitMQ).
Message queues allow your web applications to quickly publish messages to the queue,
and have other consumer processes perform the processing outside the scope and timeline
of the client request.

Dividing work between off-line work handled by a consumer and in-line work done by
the web application depends entirely on the interface you are exposing to your users.
Generally you'll either:

  1. perform almost no work in the consumer (merely scheduling a task)
    and inform your user that the task will occur offline, usually with a polling mechanism
    to update the interface once the task is complete
    (for example, provisioning a new VM on Slicehost follows this pattern), or
  2. perform enough work in-line to make it appear to the user that the task has completed,
    and tie up hanging ends afterwards (posting a message on Twitter or Facebook likely
    follows this pattern: the tweet/message is updated in your timeline immediately, but
    your followers' timelines are updated out of band; it simply isn't feasible to update
    all the followers for a Scobleizer in real-time).

Message Queue

Message queues have another benefit, which is that they allow you
to create a separate machine pool for performing off-line processing
rather than burdening your web application servers. This allows you
to target increases in resources to your current performance or throughput bottleneck
rather than uniformly increasing resources across the bottleneck and non-bottleneck
systems.

Scheduling Periodic Tasks


Almost all large systems require daily or hourly tasks,
but unfortunately this seems to still be a problem waiting
for a widely accepted solution which easily supports redundancy.
In the meantime you're probably still stuck with cron,
but you could use the cronjobs to publish messages to a consumer,
which would mean that the cron machine is only responsible for scheduling
rather than needing to perform all the processing.

Does anyone know of recognized tools which solve this problem?
I've seen many homebrew systems, but nothing clean and reusable.
Sure, you can store the cronjobs in a Puppet
config for a machine, which makes recovering from losing that machine
easy, but it would still require a manual recovery, which is probably
acceptable but not quite perfect.

Map-Reduce


If your large scale application is dealing with a large quantity of data,
at some point you're likely to add support for map-reduce,
probably using Hadoop, and maybe
Hive or HBase.

Map Reduce

Adding a map-reduce layer makes it possible to perform data and/or processing
intensive operations in a reasonable amount of time. You might use it for calculating
suggested users in a social graph, or for generating analytics reports.
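The map/shuffle/reduce phases can be sketched in a few lines of single-process Python, here counting words; Hadoop's contribution is distributing exactly these phases across machines:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Emit one (key, value) pair per word.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

docs = ["scale all the things", "scale the web"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))
```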

For sufficiently small systems you can often get away with adhoc queries
on a SQL database, but that approach may not scale up trivially once the
quantity of data stored or write-load requires sharding your database,
and will usually require dedicated slaves for the purpose of performing
these queries (at which point, maybe you'd rather use a system designed
for analyzing large quantities of data, rather than fighting your database).

Platform Layer


Most applications start out with a web application communicating
directly with a database. This approach tends to be sufficient
for most applications, but there are some compelling reasons
for adding a platform layer, such that your web applications communicate
with a platform layer which in turn communicates with your databases.

Platform Layer

First, separating the platform and web application allows you to scale
the pieces independently. If you add a new API, you can add platform servers
without adding unnecessary capacity for your web application tier.
(Generally, specializing your servers' role opens up an additional level
of configuration optimization which isn't available for general purpose
machines; your database machine will usually have a high I/O load and will
benefit from a solid-state drive, but your well-configured application server probably
isn't reading from disk at all during normal operation, but might benefit from
more CPU.)

Second, adding a platform layer can be a way to reuse your infrastructure for multiple
products or interfaces (a web application, an API, an iPhone app, etc) without writing
too much redundant boilerplate code for dealing with caches, databases, etc.

Third, a sometimes underappreciated aspect of platform layers is that they
make it easier to scale an organization. At its best, a platform exposes a crisp, product-agnostic
interface which masks implementation details.
If done well, this allows multiple independent teams to develop utilizing the platform's
capabilities, as well as another team to implement and optimize the platform itself.



I had intended to go into moderate detail on handling multiple data-centers, but
that topic truly deserves its own post, so I'll only mention that cache
invalidation and data replication/consistency become rather interesting problems
at that stage.

I'm sure I've made some controversial statements in this post,
which I hope the dear reader will argue with such that we can
both learn a bit. Thanks for reading!
"

Wednesday, 26 January 2011

InfoQ: Asynchronous, Event-Driven Web Servers for the JVM: Deft and Loft

InfoQ: Asynchronous, Event-Driven Web Servers for the JVM: Deft and Loft

Asynchronous, event-driven architectures have been gaining a lot of attention lately, mostly with respect to JavaScript and Node.js. Deft and Loft are two solutions that bring "asynchronous purity" to the JVM.

InfoQ had an interview with Roger Schildmeijer, one of the two creators, about these two non-blocking web server frameworks:

InfoQ: What is Deft and Loft?

Roger: So before I start to describe what Deft and Loft are, I would like to start from the beginning. In September 2009 Facebook open sourced a piece of software called Tornado, which was a relatively simple, non-blocking web server framework written in Python, designed to handle thousands of simultaneous connections. Tornado gained a lot of traction pretty quickly and became quite popular because of its strength and simplistic design. At this time a lot of developers out there became aware of the "c10k problem" (from Wikipedia: The C10k problem is the numeronym given to a limitation that most web servers currently have which limits the web server's capabilities to only handle about ten thousand simultaneous connections.)

In the late summer of 2010 Jim Petersson and I started to discuss and design an asynchronous non-blocking web server/framework running on the JVM using pure Java NIO. I would say that the main reason for this initiative was our curiosity about the potential speed improvements that could be achieved with a system similar to Tornado but written in Java. We knew that we could never create an API as clean and simple as Tornado's.

(Clean APIs have never been the advantage of Java, if you ask me)

We got something up and running within the first 48h and saw some extraordinary (very non-scientific) benchmark results. This was the feedback we aimed for and Deft was born.

Just to clarify, I would be the last person to suggest that someone should throw away their current system using Tornado (or some other asynchronous non-blocking web server) and replace it with Deft. Deft is still pretty young and has a lot to learn (there are a lot of issues to be implemented). By the time this interview is published I hope that the next stable Deft release, 0.2.0, will be ready and released.

After a couple of weeks of Deft hacking we started to discuss what Deft would look like if we had used another language, like Scala. It would be very gratifying to create something with nice performance but also a system that had a clean and elegant syntax. This was the seed for another project, and Loft was born. To make another important clarification: Loft is still very much in its infancy and there is as yet no stable release available.

The main features that set Loft apart from Deft are:

  1. it's written in Scala
  2. it uses a feature in Scala called continuations. The reason for this is to make asynchronous programming easier. (Will explain that in detail below)
  3. the initial version will be a pure Tornado clone

InfoQ: Would you like to explain to us the motivation behind those architectural decisions?

Roger: We wanted to create something similar to Tornado that runs on the Java Virtual Machine. There are a lot of good (multi-threaded) open source web servers and web frameworks (apache tomcat and jetty are two popular examples) out there already, and we didn't want to compete with those beasts. Instead we wanted to build something that was good at handling thousands of simultaneous connections. The single threaded approach was already tested (and proved successful) by frameworks like Tornado.

InfoQ: What is a typical development workflow in Deft - from coding and debugging, up to deploying and monitoring?

Roger: This is a very good question and something that is very important to address. However, because Deft is so new and its user base is currently small, I'm afraid I can't give an example of a typical development workflow. That is something we hope a growing community will help flesh out.

The biggest difference from the coding that most Java developers do is that everything executes inside a single threaded environment. The benefits of coding that way are that you don't have to use explicit locks and you don't have to think about deadlocks because of inadequate synchronized code. The downside is that you are not allowed to do blocking operations. A blocking operation will stall the entire server and make it unresponsive.

Deft 0.2.0 will contain an experimental JMX API used for monitoring. Some examples of things that could be monitored are the number of pending timeouts/keep-alive timeouts, number of registered IO handlers in the event loop and the select timeout of the selector.

InfoQ: Are there any benchmarks that evaluate Deft's performance?

Roger: We have made some (non-scientific) benchmarks against a simple web server using a single request handler that responds with a 200 OK HTTP status code and writes "hello world" to the client (The code for the benchmarks is also available on github.com/rschildmeijer). Against a simple "hello-world server" we have seen speed improvements by a factor of 8-10x (compared to Tornado). The entire results from the benchmarks are available on http://deftserver.org/.

InfoQ: How does Deft compare with other solutions like Tornado, Node.js or Netty?

Roger: Tornado and Node.js are the two other asynchronous web servers that we used in the benchmark. We didn't include Netty because it felt a little bit like comparing apples with oranges. But I wouldn't doubt if Netty showed numbers equal to (or greater?) the results we have seen for Deft. Netty, the successor to apache mina, is a really cool socket framework written by a really smart guy (Trustin Lee).

InfoQ: What was the motivation for Loft and how does it use continuations?

Roger: So finally time to show some code! (I would like to (once again) remind you that Loft is pretty much in the starting blocks and the code snippets for Loft are the proposed design)

A simple request handler for Loft could look something like this:

class ExampleHandler {

  @Asynchronous
  def get() {
    val http = AsyncHTTPClient()
    reset {
      val id = database get("roger_schildmeijer");   // async call
      val result = http fetch("http://127.0.0.1:8080/" + id); // async call
      write(result)
      finish
    }
  }

  val application = Application(Map("/".r -> this))

  def main(args: Array[String]) {
    val httpServer = HTTPServer(application)
    httpServer listen(8888)
    IOLoop start
  }
}

The main method contains the canonical way to start a Loft instance. The ExampleHandler.get is where things start to get interesting. Inside the method two asynchronous calls are made. Asynchronous programming is often conducted by supplying a callback as an additional parameter, and that callback will be called when the result is ready. And if you have two or more consecutive asynchronous calls, you (usually) will have to chain these calls together.

E.g.:

database get("roger_schildmeijer", (id) =>
  http fetch("http://127.0.0.1:8080/" + id, (result) =>
    write(result)));

So what is actually going on in the ExampleHandler.get method above?

You might have noticed the "reset" word in the method; this indicates that some Scala continuations are about to happen. Continuations, as a concept, are hard to grasp. But when the magic disappears, continuations are just functions representing another point in the program (they contain information such as the process's current stack). If you call the continuation, it will cause execution to automatically switch to the point that function represents. (Actually, you use restricted versions of them every time you do exception handling.)

Just so I don't confuse anyone. The two asynchronous methods "get" and "fetch" must be implemented in a certain way in order for this example to work.
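A rough Python analogy may make the continuation idea concrete. Generators give a similar "write it sequentially, suspend at each async call" effect: the handler below reads top to bottom like the reset block, while a driver loop supplies each "async" result. The database and HTTP fakes here are hypothetical stand-ins, not Loft's API.

```python
# Hypothetical stand-ins for the async database and HTTP client.
def fake_database_get(key):
    return "id-42"

def fake_http_fetch(url):
    return "body of " + url

def get_handler():
    # Reads sequentially like the reset block, but each yield suspends
    # the function until the driver supplies the "async" result.
    user_id = yield ("db", "roger_schildmeijer")
    result = yield ("http", "http://127.0.0.1:8080/" + user_id)
    return result

def run(handler):
    gen = handler()
    value = None
    try:
        while True:
            # Resume the handler with the previous result, get the next request.
            kind, arg = gen.send(value) if value is not None else next(gen)
            value = fake_database_get(arg) if kind == "db" else fake_http_fetch(arg)
    except StopIteration as done:
        return done.value

response = run(get_handler)
```

Compare this with the explicit callback chaining above: the logic is identical, but the suspension points are hidden in the yields rather than nested in callbacks.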

InfoQ: The asynchronous, event-driven paradigm has gained lots of attention lately. Why do you think is that and how do you see this trend evolving in the future?

Roger: One reason that systems like Tornado, Node.js and Netty have received a lot of attention in recent years is that big social networks need a huge number of idle connections.

As long as you and I use services like Facebook and twitter I think the need for systems like Deft, Loft and Tornado will exist.

As a final note I would like to add that we are looking for contributors that are interested in supporting Deft (current status: two committers, two contributors) and/or Loft (two committers).

Friday, 3 December 2010

The Full Stack, Part I

The Full Stack, Part I: "

One of my most vivid memories from school was the day our chemistry teacher let us in on the Big Secret: every chemical reaction is a joining or separating of links between atoms. Which links form or break is completely governed by the energy involved and the number of electrons each atom has. The principle stuck with me long after I'd forgotten the details. There existed a simple reason for all of the strange rules of chemistry, and that reason lived at a lower level of reality. Maybe other things in the world were like that too.

 

xkcd.com/435

 

A "full-stack programmer" is a generalist, someone who can create a non-trivial application by themselves. People who develop broad skills also tend to develop a good mental model of how different layers of a system behave. This turns out to be especially valuable for performance & optimization work. No one can know everything about everything, but you should be able to visualize what happens up and down the stack as an application does its thing. An application is shaped by the requirements of its data, and performance is shaped by how quickly hardware can throw data around.

 

Consider this harmless-looking SQL query:

 

DELETE FROM some_table WHERE id = 1234;

 

If the id column is not indexed, this code will usually result in a table scan: all of the records in some_table will be examined one-by-one to see if id equals 1234. Let's assume id is the indexed primary key. That's as good as it gets, right? Well, if the table is in InnoDB format it will result in one disk-seek, because the data is stored next to the primary key and can be deleted in one operation. If the table is MyISAM it will result in at least two seeks, because indexes and data are stored in different files. A hard drive can only do one seek at a time, so this detail can make the difference between 1X and 2X transactions per second. Digging deeper into how these storage engines work, you can find ways to trade safety for even more speed.
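The effect of that index shows up directly in the query planner. Here's a sketch using SQLite as a stand-in for MySQL (the plan text varies by engine and version, but the scan-vs-seek distinction is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER, payload TEXT)")

# No index on id: the planner has to fall back to a full table scan.
scan = conn.execute(
    "EXPLAIN QUERY PLAN DELETE FROM some_table WHERE id = 1234"
).fetchall()[0][3]
print(scan)  # plan detail mentions a SCAN of some_table

conn.execute("CREATE INDEX idx_some_table_id ON some_table (id)")

# With the index, the same DELETE becomes an index search:
# roughly one seek's worth of work instead of touching every row.
search = conn.execute(
    "EXPLAIN QUERY PLAN DELETE FROM some_table WHERE id = 1234"
).fetchall()[0][3]
print(search)  # plan detail mentions a SEARCH using the index
```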

 

The shape of the data

One way to visualize a system is how its data is shaped and how it flows. Here are some useful factors to think about:

  • Working data size: This is the amount of data a system has to deal with during normal operation. Often it is identical to the total data size minus things like old logs, backups, inactive accounts, etc. In time-based applications such as email or a news feed the working set can be much smaller than the total set. People rarely access messages more than a few weeks old.

  • Average request size: How much data does one user transaction have to send over the network? How much data does the system have to touch in order to serve that request? A site with 1 million small pictures will behave differently from a site with 1,000 huge files, even if they have the same data size and number of users. Downloading a photo and running a web search involve similar-sized answers, but the amounts of data touched are very different.

  • Request rate: How many transactions are expected per user per minute? How many concurrent users are there at peak (your busiest period)? In a search engine you may have 5 to 10 queries per user session. An online ebook reader might see constant but low volumes of traffic. A game may require multiple transactions per second per user.

  • Mutation rate: This is a measure of how often data is added, deleted, and edited. A webmail system has a high add rate, a lower deletion rate, and an almost-zero edit rate. An auction system has ridiculously high rates for all three.

  • Consistency: How quickly does a mutation have to spread through the system? For a keyword advertising bid, a few minutes might be acceptable. Trading systems have to reconcile in milliseconds. A comments system is generally expected to show new comments within a second or two.

  • Locality: This has to do with the probability that a user will read item B if they read item A. Or to put it another way, what portion of the working set does one user session need access to? On one extreme you have search engines. A user might want to query bits from anywhere in the data set. In an email application, the user is guaranteed to only access their inbox. Knowing that a user session is restricted to a well-defined subset of the data allows you to shard it: users from India can be directed to servers in India.

  • Computation: What kinds of math do you need to run on the data before it goes out? Can it be precomputed and cached? Are you doing intersections of large arrays? The classic flight search problem requires lots of computation over lots of data. A blog does not.

  • Latency: How quickly are transactions supposed to return success or failure? Users seem to be ok with a flight search or a credit card transaction taking their time. A web search has to return within a few hundred milliseconds. A widget or API that outside systems depend on should return in 100 milliseconds or less. More important is to maintain application latency within a narrow band: it is worse to answer 90% of queries in 0.1 seconds and the rest in 2 seconds than to answer all requests in 0.2 seconds.

  • Contention: What are the fundamental bottlenecks? A pizza shop's fundamental bottleneck is the size of its oven. An application that serves random numbers will be limited by how many random-number generators it can employ. An application with strict consistency requirements and a high mutation rate might be limited by lock contention. Needless to say, the more parallelizability and the less contention, the better.

 

This model can be applied to a system as a whole or to a particular feature like a search page or home page. It's rare that all of the factors stand out for a particular application; usually it's 2 or 3. A good example is ReCAPTCHA. It generates a random pair of images, presents them to the user, and verifies whether the user spelled the words in the images correctly. The working set of data is small enough to fit in RAM, there is minimal computation, a low mutation rate, low per-user request rate, great locality, but very strict latency requirements. I'm told that ReCAPTCHA's request latency (minus network latency) is less than a millisecond.
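The checklist can be captured as a simple record. The ReCAPTCHA values below are rough characterizations taken from the paragraph above, not measurements:

```python
from dataclasses import dataclass

# One field per factor in the checklist above; values are free-form
# characterizations, which is usually all a back-of-envelope analysis needs.
@dataclass
class WorkloadProfile:
    working_set: str
    request_size: str
    request_rate: str
    mutation_rate: str
    consistency: str
    locality: str
    computation: str
    latency: str
    contention: str

recaptcha = WorkloadProfile(
    working_set="small enough to fit in RAM",
    request_size="a pair of small images",
    request_rate="low per user",
    mutation_rate="low",
    consistency="relaxed",
    locality="great",
    computation="minimal",
    latency="very strict (sub-millisecond, minus network)",
    contention="low",
)
print(recaptcha.latency)
```

Writing the profile down makes the two or three factors that actually stand out easy to spot.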

 

A horribly oversimplified model of computation

aturingmachine.com
How an application is implemented depends on how real computers handle data. A computer really does only two things: read data and write data. Now that CPU cycles are so fast and cheap, performance is a function of how fast it can read or write, and how much data it must move around to accomplish a given task. For historical reasons we draw a line at operations over data on the CPU or in memory and call that 'CPU time'. Operations that deal with storage or network are lumped under 'I/O wait'. This is terrible because it doesn't distinguish between a CPU that's doing a lot of work, and a CPU that's waiting for data to be fetched into its cache.[0] A modern server works with five kinds of input/output, each one slower but with more capacity than the last:

  • Registers & CPU cache (1 nanosecond): These are small, expensive and very fast memory slots. Memory controllers try mightily to keep this space populated with the data the CPU needs. A cache miss means a 100X speed penalty. Even with a 95% hit rate, CPU cache misses waste half the time.

  • Main memory (10^2 nanoseconds): If your computer were an office, RAM would be the desk scattered with manuals and scraps of paper. The kernel is there, reserving Papal land-grant-sized chunks of memory for its own mysterious purposes. So are the programs that are either running or waiting to run, network packets you are receiving, data the kernel thinks it's going to need, and (if you want your program to run fast) your working set. RAM is hundreds of times slower than a register but still orders of magnitude faster than anything else. That's why server people go to such lengths to jam more and more RAM in.

  • Solid-state drive (10^5 nanoseconds): SSDs can greatly improve the performance of systems with working sets too large to fit into main memory. Being 'only' one thousand times slower than RAM, solid-state devices can be used as ersatz memory. It will take a few more years for SSDs to replace magnetic disks. And then we'll have to rewrite software tuned for the RAM / magnetic gap and not for the new reality.

  • Magnetic disk (10^7 nanoseconds): Magnetic storage can handle large, contiguous streams of data very well. Random disk access is what kills performance. The latency gap between RAM and magnetic disks is so great that it's hard to overstate its importance. It's like the difference between having a dollar in your wallet and having your mom send you a dollar in the mail. The other important fact is that access time varies wildly. You can get at any part of RAM or SSD in about the same time, but a hard disk has a physical metal arm that swings around to reach the right part of the magnetic platter.

  • Network (10^6 to 10^9 nanoseconds): Other computers. Unless you control that computer too, and it's less than a hundred feet away, network calls should be a last resort.
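The cache-miss arithmetic is worth a quick check. Using the round numbers above (a 1 ns hit and a 100X miss penalty, approximations for mental arithmetic only), average access time is dominated by the miss rate:

```python
# Average memory access time, assuming a 1 ns hit and a 100x miss penalty
# (the same round numbers as the list above; real hardware varies).
def avg_access_ns(hit_rate, hit_ns=1.0, miss_penalty_ns=100.0):
    return hit_rate * hit_ns + (1 - hit_rate) * miss_penalty_ns

for rate in (0.95, 0.99, 0.999):
    avg = avg_access_ns(rate)
    miss_share = (1 - rate) * 100.0 / avg   # fraction of time spent on misses
    print(f"{rate:.1%} hit rate: {avg:.2f} ns average, "
          f"{miss_share:.0%} of time spent on misses")
```

Even at a 99% hit rate, misses still account for about half of total memory time, which is why cache behavior dominates so many optimization efforts.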

Trust, but verify

The software stack your application runs on is well aware of the memory/disk speed gap, and does its best to juggle things around such that the most-used data stays in RAM. Unfortunately, different layers of the stack can disagree about how best to do that, and often fight each other pointlessly. My advice is to trust the kernel and keep things simple. If you must trust something else, trust the database and tell the kernel to get out of the way.

 

Thumbs and envelopes

I'm using approximate powers-of-ten here to make the mental arithmetic easier. The actual numbers are less neat. When dealing with very large or very small numbers it's important to get the number of zeros right quickly, and only then sweat the details. Precise, unwieldy numbers usually don't help in the early stages of analysis. [1]

 

Suppose you have ten million (10^7) users, each with 10MB (10^7 bytes) of data, and your network uplink can handle 100 megabits (~10^7 bytes) per second. How long will it take to copy that data to another location over the internet? Hmm, that would be 10^7 seconds, or about 4 months: not great, but close to reasonable. You could use compression and multiple uplinks to bring the transfer time down to, say, a week. If the approximate answer had been not 4 but 400 months, you'd quickly drop the copy-over-the-internet idea and look for another answer.
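Checking that arithmetic in code, with the same round powers of ten as the text:

```python
users = 10**7
bytes_per_user = 10**7          # 10 MB each
uplink_bytes_per_sec = 10**7    # ~100 megabits per second

total_bytes = users * bytes_per_user            # 10^14 bytes
seconds = total_bytes // uplink_bytes_per_sec   # 10^7 seconds
months = seconds / (60 * 60 * 24 * 30)
print(f"{seconds:.0e} seconds, roughly {months:.1f} months")
```

The point of the exercise is the exponent: 10^7 seconds is in "months" territory, and that single digit decides whether the copy-over-the-internet plan survives.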

 

movies.example.com

So can we use this model to identify the performance gotchas of an application? Let's say we want to build a movies-on-demand service like Netflix or Hulu. Videos are professionally produced and between 20 and 200 minutes long. You want to support a library of 100,000 (10^5) films and 10^5 concurrent users. For simplicity's sake we'll consider only the actual watching of movies and disregard browsing the website, video encoding, user comments & ratings, log analysis, etc.

  • Working data size: The average video is 40 minutes long, and the bitrate is 300kbps. 40 * 60 * 300,000 / 8 is about 10^8 bytes. Times 10^5 videos means that your total working set is 10^13 bytes, or 10TB.

  • Average request size: A video stream session will transfer somewhere between 10^7 and 10^9 bytes. In Part One we won't be discussing networking issues, but if we were this would be cause for alarm.

  • Request rate: Fairly low, though the number of concurrent requests will be high. Users should have short bursts of browsing and long periods of streaming.

  • Mutation rate: Nearly nil.

  • Consistency: Unimportant except for user data. It would be nice to keep track of what place a user was at in a movie and zip back to that, but that can be handled lazily (e.g. in a client-side cookie).

  • Locality: Any user can view any movie. You will have the opposite problem of many users accessing the same movie.

  • Computation: If you do it right, computation should be minimal. DRM or on-the-fly encoding might eat up cycles.

  • Latency: This is an interesting one. The worst case is channel surfing. In real-world movie services you may have noticed that switching streams or skipping around within one video takes a second or two in the average case. That's at the edge of user acceptability.

  • Contention: How many CPU threads do you need to serve 100,000 video streams? How much data can one server push out? Why do real-world services seem to have this large skipping delay? When multiple highly successful implementations seem to have the same limitation, that's a strong sign of a fundamental bottleneck.

 

It's possible to build a single server that holds 10TB of data, but what about throughput? A hundred thousand streams at 300kbps (10^5 * 3 * 10^5) is 30 gigabits per second (3 * 10^10). Let's say that one server can push out 500mbps in the happy case. You'll need at least 60 servers to support 30gbps. That implies about 2,000 concurrent streams per server, which sounds almost reasonable. These guesses may be off by a factor of 2 or 4, but we're in the ballpark.
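The same powers-of-ten treatment, in code (500 Mbps per server is the happy-case guess from the text):

```python
streams = 10**5
bitrate_bps = 3 * 10**5            # 300 kbps per stream
total_bps = streams * bitrate_bps  # 3 * 10^10 bits/s = 30 Gbps aggregate
per_server_bps = 5 * 10**8         # 500 Mbps per server, happy case

servers = total_bps // per_server_bps
streams_per_server = streams // servers
print(servers, "servers,", streams_per_server, "streams each")
```

Sixty servers at roughly 1,700 streams apiece; round up to 2,000 and you have the same ballpark figure as the text.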

 

You could store a copy of the entire 10TB library on each server, but that's kind of expensive. You probably want either:

  • A set of origin servers and a set of streaming servers. The origins are loaded with disks. The streamers are loaded with RAM. When a request comes in for a video, the streamer first checks its local cache. If the video isn't there, it contacts the origins and reads it from there.

  • A system where each video is copied to only a few servers and requests are routed to them. This might have problems with unbalanced traffic.
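The first option is the classic cache-aside pattern. A toy sketch (all names illustrative):

```python
# Toy cache-aside lookup: a streamer serves from its in-RAM cache and
# falls back to the disk-heavy origin tier on a miss (names illustrative).
origin = {f"video{i}": f"<bytes of video{i}>" for i in range(3)}  # stand-in for origins
cache = {}   # the streamer's local RAM cache

def serve(video_id):
    if video_id not in cache:          # miss: one slow trip to the origins
        cache[video_id] = origin[video_id]
    return cache[video_id]             # hit: served straight from RAM

serve("video1")
serve("video1")   # the second request never touches the origin tier
print(sorted(cache))
```

A real streamer would also bound the cache and evict cold entries, but the read path is exactly this shape.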

 

An important detail is the distribution of popularity of your video data. If everyone watches the same 2GB video, you could just load the whole file into the RAM of each video server. On the other extreme, if 100,000 users each view 100,000 different videos, you'd need a lot of independent spindles or SSDs to keep up with the concurrent reads. In practice, your traffic will probably follow some kind of power-law distribution in which the most popular video has X users, the second-most has 0.5X users, the third-most 0.33X users, and so on. On one hand that's good; the bulk of your throughput will be served hot from RAM. On the other hand that's bad, because the rest of the requests will be served from cold storage.
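The power-law claim is easy to put rough numbers on. A sketch using 1/k (Zipf-like) weights, an assumed distribution chosen purely for illustration:

```python
# Zipf-like popularity: the k-th most popular of 100,000 videos gets
# weight 1/k (an assumption for illustration, not measured traffic).
n = 100_000
weights = [1 / k for k in range(1, n + 1)]
total = sum(weights)

hot = 1_000  # the hottest 1% of the library
share = sum(weights[:hot]) / total
print(f"top {hot} videos draw {share:.0%} of requests")
```

Under this assumption the hottest 1% of videos draws roughly 60% of the traffic, which is the good news; the long cold tail of the remaining 40% is what hits the spindles.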

 

Whatever architecture you use, it looks as though the performance of movies.example.com will depend almost completely on the random seek time of your storage devices. If I were building this today I would give both SSDs and non-standard data prefetching strategies a serious look.

 

It's been fun

This subject is way too large for a short writeup to do it justice. But absurd simplifications can be useful as long as you have an understanding of the big picture: an application's requirements are shaped by the data, and implementations are shaped by the hardware's ability to move data. Underneath every simple abstraction is a world of details and cleverness. The purpose of the big fuzzy picture is to point you where to start digging.

 

Carlos Bueno, an engineer at Facebook, thinks it's turtles all the way down.

 

 

Notes

[*] This article is part of Perf Planet's 2010 Performance Calendar.

 

[0] Fortunately there is a newish tool for Linux called 'perf counters'.

 

[1] Jeff Dean of Google deserves a lot of credit for popularizing the 'numbers you should know' approach to performance and systems work. As my colleague Keith Adams put it, 'The ability to quickly discard bad solutions, without actually building them, is a lot of what good systems programming is. Some of that is instinct, some experience, but a lot of it is algebra.'


"

Tuesday, 16 November 2010

Strategy: Biggest Performance Impact is to Reduce the Number of HTTP Requests

Strategy: Biggest Performance Impact is to Reduce the Number of HTTP Requests: "


In Low Cost, High Performance, Strong Security: Pick Any Three, Chris Palmer gives a funny and informative presentation whose main message is: reduce the size and frequency of network communications. This will make your pages load faster, which will improve performance enough that you can use HTTPS all the time, which will make you safe and secure on-line, which is a good thing.

The benefits of HTTPS for security are overwhelming, but people are afraid of the performance hit. The argument is successfully made that the overhead of HTTPS is low enough that you can afford the cost if you do some basic optimization. Reducing the number of HTTP requests is a good source of low-hanging fruit.
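The oldest of that low-hanging fruit is bundling: serve one concatenated file instead of many small ones, turning N requests (and, over HTTPS, potentially N handshakes) into one. A minimal sketch, with made-up filenames:

```python
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# Three small stand-in stylesheets (contents are placeholders).
names = ["reset.css", "layout.css", "theme.css"]
for name in names:
    (workdir / name).write_text(f"/* rules from {name} */\n")

# One bundled response replaces three round-trips. Build tools do the
# same thing with minification and cache-busting hashes layered on top.
bundle = "".join((workdir / n).read_text() for n in names)
(workdir / "bundle.css").write_text(bundle)
print(bundle.count("/*"), "sources, 1 request")
```

The same idea underlies image sprites and JavaScript bundlers.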

From the Yahoo UI Blog:


Reducing the number of HTTP requests has the biggest impact on reducing response time and is often the easiest performance improvement to make.


"

Thursday, 26 August 2010

Designing a Web Application with Scalability in Mind

Designing a Web Application with Scalability in Mind: "Max Indelicato, a Software Development Director and former Chief Software Architect, has written a post on how to design a web application for scalability. He suggests choosing the right deployment and storage solutions, a scalable data store and schema, and using abstraction layers. By Abel Avram"