Friday 30 September 2011

Presentation: HBase @ Facebook

Presentation: HBase @ Facebook: Kannan Muthukkaruppan gives an overview of HBase, explaining what Facebook Messages is, why Facebook chose HBase to implement it, their contributions to HBase, and what they plan to use it for in the future. By Kannan Muthukkaruppan

Thursday 22 September 2011

Presentation: Building Scalable Systems: an Asynchronous Approach

Presentation: Building Scalable Systems: an Asynchronous Approach: Theo Schlossnagle shares his views on Big Data, NoSQL, cloud, and system architecture and design, then discusses the benefits of using asynchronous queues for building scalable systems. By Theo Schlossnagle
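
The core idea behind the asynchronous approach is straightforward to sketch: instead of doing slow work (writes, third-party calls) inline in the request path, a handler enqueues a message and returns immediately, while a pool of workers drains the queue at whatever rate the backend can sustain, so front-end latency stays flat even when the backend falls behind. Below is a minimal Python sketch of that pattern using only the standard library; the task names, timings, and worker count are illustrative assumptions on my part, not code from the talk.

    # Minimal producer/consumer sketch of an asynchronous work queue.
    # Illustrative only: names, timings, and worker count are invented,
    # not taken from Theo Schlossnagle's presentation.
    import queue
    import threading
    import time

    work_queue = queue.Queue()

    def handle_request(request_id):
        """Fast path: enqueue the work and return without blocking."""
        work_queue.put(request_id)
        return "accepted %d" % request_id

    def worker():
        """Slow path: drain the queue at the rate the backend can sustain."""
        while True:
            request_id = work_queue.get()
            if request_id is None:          # sentinel value shuts the worker down
                work_queue.task_done()
                break
            time.sleep(0.1)                 # stand-in for a slow write or API call
            print("processed %d" % request_id)
            work_queue.task_done()

    workers = [threading.Thread(target=worker) for _ in range(4)]
    for w in workers:
        w.start()

    for i in range(10):                     # a burst of incoming requests
        handle_request(i)                   # each call returns immediately

    work_queue.join()                       # wait for the backlog to drain
    for _ in workers:
        work_queue.put(None)                # stop the workers
    for w in workers:
        w.join()

The queue also gives you a natural place to add retries, batching, or back-pressure later, which is much harder to bolt onto a synchronous request path.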

Thursday 15 September 2011

Programming != Computer Science

Programming != Computer Science: I recently read this very interesting article on ways to "level up" as a software developer. Reading it brought home something that has been nagging me for a while since joining Google: that there is a huge skill and cultural gap between "developers" and "Computer Scientists." Jason's advice for leveling up in the aforementioned article is very practical: write code in assembly, write a mobile app, complete the exercises in SICP, that sort of thing. This is good advice, but certainly not all that I would want people on my team to spend their time on in order to become true technical leaders. Whether you can sling JavaScript all day or know the ins and outs of C++ templates often has little bearing on whether you're able to grasp the bigger, more abstract, less well-defined problems and make headway on them.

For that you need a very different set of skills, which is where I start to draw the line between a Computer Scientist and a developer. Personally, I consider myself a Computer Scientist first and a software engineer second. I am probably not the right guy to crank out thousands of lines of Java on a tight deadline, and I'll be damned if I fully grok C++'s inheritance rules. But this isn't what Google hired me to do (I hope!) and I lean heavily on some amazing programmers who do understand these things better than I do.

Note that I am not defining a Computer Scientist as someone with a PhD -- although it helps. Doing a PhD trains you to think critically, to study the literature, to make effective use of experimental design, and to identify unsolved problems. By no means do you need a PhD to do these things (and not everyone with a PhD can do them, either).

A few observations on the difference between Computer Scientists and Programmers...

Think Big vs. Get 'er Done

One thing that drove me a little nuts when I first started at Google was how quickly things move, and how often solutions are put into place that are necessary to move ahead, even if they aren't fully general or completely thought through. Coming from an academic background, I am used to spending years pounding away at a single problem until I have a single, beautiful, general solution that can stand up to a tremendous amount of scrutiny (mostly in the peer review process). Not so in industry -- we gotta move fast, so often it's necessary to solve a problem well enough to get onto the next thing. Some of my colleagues at Google have no doubt been driven batty by my insistence on getting something "right" when they would rather just (and in fact need to) plow ahead.

Another aspect of this is that programmers are often satisfied with something that solves a concrete, well-defined problem and passes the unit tests. What they sometimes don't ask is "what can my approach not do?" They don't always do a thorough job at measurement and analysis: they test something, it seems to work on a few cases, they're terribly busy, so they go ahead and check it in and get onto the next thing. In academia we can spend months doing performance evaluation just to get some pretty graphs that show that a given technical approach works well in a broad range of cases.

Throwaway prototype vs. robust solution

On the other hand, one thing that Computer Scientists are not often good at is developing production-quality code. I know I am still working at it. The joke is that most academics write code so flimsy that it collapses into a pile of bits as soon as the paper deadline passes. Developing code that is truly robust, scales well, is easy to maintain, well-documented, well-tested, and uses all of the accepted best practices is not something academics are trained to do. I enjoy working with hardcore software engineers at Google who have no problem pointing out the totally obvious mistakes in my own code, or suggesting a cleaner, more elegant approach to some ass-backwards page of code I submitted for review. So there is a lot that Computer Scientists can learn about writing "real" software rather than prototypes.

My team at Google has a good mix of folks from both development and research backgrounds, and I think that's essential to striking the right balance between rapid, efficient software development and pushing the envelope of what is possible.

Tuesday 6 September 2011

OpenStack turns 1. What’s next? — Cloud Computing News

OpenStack turns 1. What’s next? — Cloud Computing News:

OpenStack, the open-source, cloud-computing software project founded by Rackspace and NASA, celebrates its first birthday tomorrow. It has been a busy year for the project, which appears to have grown much faster than even its founders expected it would. A year in, OpenStack is still picking up steam and looks not only like an open source alternative to Amazon Web Services and VMware vCloud in the public Infrastructure as a Service space, but also like a democratizing force in the private-cloud software space.
Let’s take a look at what happened in the past year — at least what we covered — and what to expect in the year to come.

OpenStack year one

February
  • Feb. 3: OpenStack releases the "Bexar" code and adds new corporate contributors, including Cisco.
  • Feb. 10: Rackspace buys Anso Labs, a software contractor that wrote Nova, the foundation of OpenStack Compute, for NASA's Nebula cloud project.
Although the new code, contributors and ecosystem players came fast and furious, OpenStack wasn't without some controversy regarding the open-source practices it employs. Some contributors were concerned with the amount of control that Rackspace maintains over the project, which led to changes in the voting and board-selection process. Still, momentum was overwhelmingly positive, with even the federal government supposedly looking seriously at OpenStack as a means of achieving one of its primary goals: cloud interoperability.

What’s next

According to OpenStack project leader Jonathan Bryce, the next year for OpenStack likely will be defined by the creation of a large ecosystem. This means more software vendors selling OpenStack-based products — he said Piston is only the first-announced startup to get funding — as well as implementations. Aside from public clouds built on OpenStack, Bryce also thinks there will be dozens of publicly announced private clouds built atop the OpenStack code. Ultimately, it's a self-sustaining cycle: more users means more software and services, which means more users.
There’s going to be competition, he said, but that’s a good thing for the market because everyone will be pushing to make OpenStack itself better. The more appealing the OpenStack source code looks, the more potential business for Rackspace, Citrix, Piston, Dell, Internap and whoever else emerges as a commercial OpenStack entity.
If this comes to fruition, it'll be a fun year to cover cloud computing and to watch whether OpenStack can actually deliver on its lofty goal of upending what has been, up until now, a very proprietary marketplace.

High Scalability - The Three Ages of Google - Batch, Warehouse, Instant

High Scalability - The Three Ages of Google - Batch, Warehouse, Instant:

The world has changed. And some things that should not have been forgotten, were lost. I found these words from the Lord of the Rings echoing in my head as I listened to a fascinating presentation by Luiz André Barroso, Distinguished Engineer at Google, concerning Google's legendary past, golden present, and apocryphal future. His talk, Warehouse-Scale Computing: Entering the Teenage Decade, was given at the Federated Computing Research Conference. Luiz clearly knows his stuff and was early at Google, so he has a deep and penetrating perspective on the technology. There's much to learn from, think about, and build.
Lord of the Rings applies at two levels. At the change level, Middle Earth went through three ages; listening to Luiz talk, it seems Google has too: Batch (indexes calculated every month), Warehouse (the datacenter is the computer), and Instant (make it all real-time). At the "what was forgotten" level, in the Instant Age section of the talk, a common theme was the challenge of building low-latency systems on top of commodity systems. These are issues very common in the real-time world, and it struck me that these were the things that should not have been forgotten.
What is completely new, however, is the combining of Warehouse + Instant, and that's where the opportunities and the future are to be found: the Fourth Age.

The First Age - The Age Of Batch

The time is 2003. The web is still young and HTML is still page oriented. Ajax has been invented, but it is still awaiting early killer apps like Google Maps and a killer marketing strategy: a catchy brand name like Ajax.
Google is batch oriented. They crawled the web every month (every month!), built a search index, and answered queries. Google was largely read-only, which is pretty easy to scale. This is still probably the model most people have in their mind's eye about how Google works.
Google was still unsophisticated in their hardware. They built racks in colo spaces, bought fans from Walmart and cable trays from Home Depot.
It's quaint to think that all of Google's hardware and software architecture could be described in seven pages: Web Search for a Planet: The Google Cluster Architecture by Luiz Barroso, Jeffrey Dean, and Urs Hoelzle. That would quickly change.

The Second Age - The Age Of The Warehouse

The time is 2005. Things move fast on the Internet. The Internet has happened: it has become pervasive, higher speed, and interactive. Google is building their own datacenters and becoming more sophisticated at every level. Iconic systems like BigTable are in production.
About this time Google realized they were building something qualitatively different from anything that had come before, something we now think of, more or less, as cloud computing. Amazon's EC2 launched in 2006. Another paper, this one really a book, summarizes what they were trying to do: The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines by Luiz André Barroso and Urs Hölzle. Note the jump from 7 pages to book size, and note that it was published in 2009, 4 years after they began implementing the vision. To learn what Google is really up to now will probably take an encyclopedia and come out in a few years, after they are on to the next thing.
The fundamental insight in this age is that the datacenter is the computer. You may recall that in the late 1980s Sun's John Gage hailed "the network is the computer." The differences are interesting to ponder. When the network was the computer, we created client-server architectures that appeared to the outside world as a single application, but in reality they were made of individual nodes connected by a network. Warehouse-Scale Computing (WSC) moves up the stack: it treats computing resources, as much as possible, as fungible, that is, interchangeable and location independent. Individual computers lose identity and become just a part of a service. Sun later had their own grid network, but I don't think they ever had this full-on WSC vision.
Warehouse-scale machines are different. They are not made up of separate computers. Applications are not designed to run on single machines, but to run as Internet services on a datacenter full of machines. What matters is the aggregate performance of the entire system.
The WSC club is not a big one. Luiz says you might have a warehouse-scale computer if you get paged in the middle of the night because you only have petabytes of storage left. With cloud computing

The Third Age - The Age Of Instant

The time is now. There's no encyclopedia yet on how the Age of Instant works because it is still being developed. But because Google is quite open, we do get clues: Google's Colossus Makes Search Real-Time By Dumping MapReduce; Large-Scale Incremental Processing Using Distributed Transactions And Notifications; Tree Distribution Of Requests And Responses; Google Megastore - 3 Billion Writes and 20 Billion Read Transactions Daily; and so much more I didn't cover or only referenced.
Google's Instant Search Results is, Luiz says, a crude example of what the future will hold. This is the feature where, as you type each letter into the search box, you instantly get back query results. That means 5 or 6 queries are executed for every query you type. You can imagine the infrastructure this must take.
The flip side of search is content indexing. The month-long indexing runs are long gone. The Internet is now a giant event monster feeding Google with new content to index continuously and immediately. It is astonishing how quickly content is indexed now. That's a revolution in architecture.
Luiz thinks that in the next few years the level of interactivity, insight, and background information the system will have to help you will dwarf what there is in Instant Search. If you want to know why Google is so insistent on using Real Names in Google+, this is why.
Luiz explains this change as having 4 drivers:
  • Applications - instantaneous, personalized, contextual
  • Scale - increased attention to latency tail
  • Efficiency - driving utilization up, and energy/water usage down
  • Hardware Trends - non-volatile storage, multi-cores, fast networks
Instant in the context of warehouse computing is a massive engineering challenge. It's a hard thing to treat a datacenter as a computer, and it's a hard thing to provide instant indexing and instant results; to provide instant in a warehouse-scale computer is an entirely new level of challenge. This challenge is what the second half of his talk covers.
The problem is we aren't meeting this challenge. Our infrastructure is broken. Datacenters have a network diameter measured in microseconds, yet we are still using entire software stacks designed for WANs. Real-time requires low and bounded latencies, and our stacks can't provide low latency at scale. We need to fix this problem, and towards this end Luiz sets out a research agenda, targeting problems that need to be solved:
  • Rethink the IO software stack. An OS that makes scheduling decisions every 10s of milliseconds is incompatible with IO devices that respond in microseconds (see the back-of-the-envelope sketch after this list).
  • Revisit operating systems scheduling.
  • Rethink threading models.
  • Re-read 1990's fast messaging papers.
  • Make IO design a higher priority. Not just NICs and RDMA, consider CPU design and memory systems.
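A back-of-the-envelope comparison makes the scheduling mismatch concrete. The figures below are rough, commonly cited orders of magnitude that I chose for illustration; they are not numbers from Luiz's talk.

    # Rough, commonly cited latency figures (my illustrative assumptions,
    # not numbers from the talk). The point: a ~10 ms OS scheduling quantum
    # is orders of magnitude coarser than the devices and networks inside
    # a warehouse-scale computer.
    SCHEDULING_QUANTUM_US = 10_000.0        # ~10 ms, a typical OS time slice

    device_latency_us = {
        "main memory reference": 0.1,       # ~100 ns
        "intra-datacenter round trip": 50,  # tens of microseconds
        "flash read": 100,                  # ~100 microseconds
    }

    for name, us in device_latency_us.items():
        ratio = SCHEDULING_QUANTUM_US / us
        print("%-30s %8.1f us  (scheduler quantum is about %.0fx coarser)" % (name, us, ratio))

Any stack that pays even one scheduling quantum per IO can never expose microsecond-class devices to applications, which is exactly what the research agenda above is aimed at.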
"The fun starts now" Luiz says, these are still very early days, predicting this will be the:
  • Decade of resource efficiency
  • Decade of IO
  • Decade of low latency (and low tail latency)
  • Decade of Warehouse-scale disaggregation, making resources available beyond just one machine or even a single rack, across all machines.
This is a great talk, very informative, and very inspiring. Well worth watching. We'll talk more about specific technical points in later articles, but this sets the stage not just for Google, but for the rest of the industry as well.
