Thursday 28 October 2010

Running the Show: Configuration Management with Chef: RailsConf 2009 - O'Reilly Conferences, May 04 - 07, 2009, Las Vegas, NV!

RUNNING THE SHOW: CONFIGURATION MANAGEMENT WITH CHEF

Edd Dumbill (O'Reilly Media, Inc.)
Tutorial
Location: Pavilion 2 - 3

Few completed Rails apps are architecturally simple. As soon as you grow, you find yourself using multiple subsystems and machines to scale. Cloud-based environments such as EC2 make this an attractive and cost-efficient option, but create new headaches in configuration management.

Chef is the latest development in open source systems integration, a powerful Ruby-based framework for managing servers in a way that integrates tightly with your applications and infrastructure. As developers become increasingly responsible for operations, Chef lets you manage your servers by writing code, not running commands.

In this tutorial we cover:

  • Your first Chef cookbook
  • Chef concepts such as nodes, cookbooks and recipes
  • Anatomy of a cookbook
  • Storing and versioning your cookbooks
  • What happens when you run Chef
  • Using Chef’s Web UI
  • Configuring per-instance data using JSON
  • Lightweight configuration with Chef Solo
  • What comes for free: managing Apache, Ubuntu, MySQL and friends
  • Chef for Rails apps
  • Setting up your Rails environment
  • Deploying your application: Chef vs Capistrano
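
To give a flavour of the "manage your servers by writing code, not running commands" idea, here is a minimal sketch of what a Chef recipe can look like. The cookbook name, template and node attributes are made up for illustration; they are not taken from the tutorial itself:

# cookbooks/webserver/recipes/default.rb -- hypothetical cookbook and attribute names
package "apache2"

service "apache2" do
  action [:enable, :start]
end

template "/etc/apache2/sites-available/myapp.conf" do
  source "myapp.conf.erb"
  mode "0644"
  # per-instance data such as node[:myapp][:server_name] can be fed in as JSON
  variables(:server_name => node[:myapp][:server_name])
end

Running Chef (or Chef Solo) against a node then converges it to this declared state: the package is installed, the service enabled and started, and the config file rendered from the template.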

Wednesday 27 October 2010

From Zero to the Cloud with Todd Deshane and Patrick F. Wilbur • USENIX Update

From Zero to the Cloud with Todd Deshane and Patrick F. Wilbur • USENIX Update
Todd Deshane: Sure. I’m a recent graduate of Clarkson University, where I got a Ph.D. in Engineering Science. I’ve been a member of the xen.org community for quite some time, basically ever since it came out.
Patrick Wilbur: I’m a graduate student working on a Ph.D. in Computer Science at Clarkson University. I’ve been involved with Xen for a couple of years, and this is related to the research that I’ve been doing at the university. Todd and I are both co-authors of the book “Running Xen: A Hands-on Guide to the Art of Virtualization”.

MD: You mentioned your book; I’m curious how did you guys end up writing a book? Can you talk about your experience in doing that?
TD: Since we were involved with Xen in our research and published a paper right after Xen came out, the publishing company thought that we were the best ones, outside the group at Cambridge that developed Xen, to write this book. So they contacted our advisor Jeanna Matthews, and a group of people from our lab were interested in working on this and writing the book. This is how it all started.

MD: There are seven co-authors of your book; how did you collaborate with so many people working on this?
TD: It is a lot of work to write a book. We spent a lot of time working on this, and split the chapters based on the expertise each one of us had in different areas. Of course, later we had to get together and read each other’s chapters to make it a uniform book. This was challenging, as there was quite a large group of people involved.
PW: As Todd mentioned, it was definitely not easy, but it was really a good experience, working with a great group of people.

MD: You have a training class this year at LISA, “Introduction to the Open Source Xen Hypervisor”; can you briefly tell us what it will cover, and how it is different from your previous trainings at LISA?
TD: Patrick and I first presented this material at LISA ’08, and at that time it was heavily based on the book, because the book itself is a hands-on guide for new users, for people who don’t have experience with Xen; we would basically take you from zero to running virtualization in your datacenter. This year we have updated the content and extended it to cloud technologies, where people are using virtualization to run their public and private clouds, and often a hybrid mode between the two. We’ve also added the Xen Cloud Platform, which is new; we’re still going from zero to virtualization, but this time we are going from zero to the cloud.
PW: I would also like to add that one of the neat things about this version of our class is that it brings in new technologies like cloud computing. This is relevant even for system administrators who have only a few machines (even if they don’t administer a cloud, it is very valuable to understand it as consumers), but also for enterprise administrators who use virtualization a lot even if they don’t use the term cloud, and who will find platforms like the Xen Cloud Platform very interesting.


MD: You guys also have a BOF, "Open Source and Open Standards-based Cloud Computing". Can you talk about that a little?
TD: What is great about the BOF is that it will be an extension of our tutorial. This means that if you were in our session, you will be in the right spot; if you haven’t been, you’re still in a great spot, because we are going to cover the basics of the Xen Cloud Platform again to make sure everyone is on the same page. We are also going to have Ben Pfaff from Nicira Networks, who is going to talk about Open vSwitch and what they are doing to take network virtualization to the next level; Jason Faulkner from Rackspace will talk about OpenStack and their efforts to create open standards for cloud computing.

MD: You mentioned OpenStack; another similar project would be Eucalyptus. Such abstractions are built on top of the virtualization layer and hide many things from the administrator; is it still relevant to have basic Xen understanding and knowledge?
TD: The main reason we started teaching about Xen was that it was difficult to set up and use; it wasn’t polished, it wasn’t quite user friendly; it was not the product it is today. Nowadays people spend less time getting it to work, and the focus is on the different levels of the stack and on using them, and we’ve updated our tutorial to take this into consideration. We still teach you all the things you need in case you have to debug it.

MD: We’ve seen a lot of traction related to KVM; many distros have switched to KVM instead of Xen. Why do you think that happened?
TD: The basic reason KVM got popular quickly is that it is a simple Linux virtualization system that is integrated into the kernel itself. It relies on Linux and QEMU and is not a stand-alone hypervisor. The reason many Linux distributions currently have KVM support and not necessarily Xen support is that KVM is fully integrated into the mainline kernel (it comes with the Linux kernel). Xen domain0 support (Linux as the Xen management domain) is in progress, but Xen (the hypervisor) is not intended to be included in Linux; it is a stand-alone hypervisor, which allows various management domains to run on top of it (for example, Linux, Solaris, BSD). Linux distributions are starting (and many already have started) to add Xen support back into their standard distributions.


MD: You guys will be busy with a class and a BOF; I’m curious if you come to LISA just for teaching those, or if you stay around for other talks?
PW: I will be there for the whole week and will attend several different sessions. I will actually be sticking around after LISA for CHIMIT 2010, which will be held in the same area, so it is a good opportunity to attend both conferences.
TD: Unfortunately I will not be able to stay the whole week myself. I would have liked to attend some of the many great talks, but I won’t be able to this year.

MD: What other interesting things are you working on these days?
TD: I’m doing some interesting work these days on a consulting project, where we are taking their Xen deployment more into a dynamic, cloud type of environment: Hadoop clustering, the Xen Cloud Platform, OpenStack and Puppet are things we are working on, setting up hybrid cloud deployments with public clouds like Amazon and Rackspace alongside private clouds. I’m also working on a very interesting research project where we are trying to combine an organization’s mission space with cyber space. Finally, I’ve been spending more time helping the xen.org community, and I will be taking on the role of Technology Evangelist for Citrix/Xen.org. We are also looking to hire a new community manager for xen.org.
PW: I’m looking into the usable security space, where human-computer interaction meets security. I’m working with virtualization to segregate users’ applications into different privilege spaces and isolate various applications from one another. Related to this, I’m also hosting another BOF at the end of the week, on Thursday: “Human-Computer Interaction: Experiences and Difficulties in IT Management, Security, and Privacy”. It should be a good opportunity to meet and exchange ideas with different people in the usable security space.


Tuesday 26 October 2010

Google search index splits with MapReduce • The Register

Google search index splits with MapReduce

Welds BigTable to file system 'Colossus'
Exclusive Google Caffeine — the remodeled search infrastructure rolled out across Google's worldwide data center network earlier this year — is not based on MapReduce, the distributed number-crunching platform that famously underpinned the company's previous indexing system. As the likes of Yahoo!, Facebook, and Microsoft work to duplicate MapReduce through the open source Hadoop project, Google is moving on.
According to Eisar Lipkovitz, a senior director of engineering at Google, Caffeine moves Google's back-end indexing system away from MapReduce and onto BigTable, the company's distributed database platform.
As Google's Matt Cutts told us last year, the new search infrastructure also uses an overhaul of the company's GFS distributed file system. This has been referred to as Google File System 2 or GFS2, but inside Google, Lipkovitz says, it's known as Colossus.
Caffeine expands on BigTable to create a kind of database programming model that lets the company make changes to its web index without rebuilding the entire index from scratch. "[Caffeine] is a database-driven, Big Table–variety indexing system," Lipkovitz tells The Reg, saying that Google will soon publish a paper discussing the system. The paper, he says, will be delivered next month at the USENIX Symposium on Operating Systems Design and Implementation (OSDI).
Before the arrival of Caffeine, Google built its search index — an index of the entire web — using MapReduce. In essence, the index was created by a sequence of batch operations. The MapReduce platform "maps" data-crunching tasks across a collection of distributed machines, splitting them into tiny sub-tasks, before "reducing" the results into one master calculation.
MapReduce would receive the epic amounts of webpage data collected by Google's crawlers, and it would crunch this down to the links and metadata needed to actually search these pages. Among so many other things, it would determine each site's PageRank, the site's relationship to all the other sites on the web.
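To make the map/reduce shape of that pipeline concrete, here is a toy, single-machine Ruby sketch in the same spirit, counting inbound links per URL from crawled pages. It only illustrates the programming model described above, not Google's actual implementation:

# Map: emit a (target_url, 1) pair for every outbound link on each crawled page.
def map_phase(pages)
  pages.flat_map do |page|
    page[:links].map { |target| [target, 1] }
  end
end

# Reduce: group the emitted pairs by key and sum the counts.
def reduce_phase(pairs)
  pairs.group_by(&:first).map { |url, hits| [url, hits.map(&:last).sum] }.to_h
end

pages = [
  { url: "a.example", links: ["b.example", "c.example"] },
  { url: "b.example", links: ["c.example"] },
]
p reduce_phase(map_phase(pages))   # => {"b.example"=>1, "c.example"=>2}

A real MapReduce run has the same two phases, but with the work split into sub-tasks across thousands of machines and the whole batch re-run whenever the index needs rebuilding.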
"We would start with a very large [collection of data] and we would process it," Lipkovitz tells us. "Then, eight hours or so later, we'd get the output of this whole process and we'd copy it to our serving systems. And we did this continuously."
But MapReduce didn't allow Google to update its index as quickly as it would like. In the age of the "real-time" web, the company is determined to refresh its index within seconds. Over the last few years, MapReduce has received ample criticism from the likes of MIT database guru Mike Stonebraker, and according to Lipkovitz, Google long ago made "similar observations." MapReduce, he says, isn't suited to calculations that need to occur in near real-time.
MapReduce is a sequence of batch operations, and generally, Lipkovitz explains, you can't start your next phase of operations until you finish the first. It suffers from "stragglers," he says. If you want to build a system that's based on a series of map-reduces, there's a certain probability that something will go wrong, and this gets larger as you increase the number of operations. "You can't do anything that takes a relatively short amount of time," Lipkovitz says, "so we got rid of it."
With Caffeine, Google can update its index by making direct changes to the web map already stored in BigTable. This includes a kind of framework that sits atop BigTable, and Lipkovitz compares it to old-school database programming and the use of "database triggers."
"It's completely incremental," he says. When a new page is crawled, Google can update its index with the necessarily changes rather than rebuilding the whole thing.
Previously, Google's index was separated into layers, and some layers updated faster than others. The main layer wouldn't be updated for about two weeks. "To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you," Google said in a blog post when Caffeine was rolled out.
"With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before — no matter when or where it was published."
There's also a "compute framework," Lipkovitz says, that lets engineers execute code atop BigTable. And the system is underpinned by Colossus, the distributed storage platform also known as GFS2. The original Google File System, he says, didn't scale as well as the company would like.
Colossus is specifically designed for BigTable, and for this reason it's not as suited to "general use" as GFS was. In other words, it was built specifically for use with the new Caffeine search-indexing system, and though it may be used in some form with other Google services, it isn't the sort of thing that's designed to span the entire Google infrastructure.
At Wednesday's launch of Google Instant, a new incarnation of Google's search engine that updates search results as you type, Google distinguished engineer Ben Gomes told The Reg that Caffeine was not built with Instant in mind, but that it helped deal with the added load placed on the system by the "streaming" search service.
Lipkovitz makes a point of saying that MapReduce is by no means dead. There are still cases where Caffeine uses batch processing, and MapReduce is still the basis for myriad other Google services. But prior to the arrival of Caffeine, the indexing system was Google's largest MapReduce application, so use of the platform has been significantly, well, reduced.
In 2004, Google published research papers on GFS and MapReduce that became the basis for the open source Hadoop platform now used by Yahoo!, Facebook, and — yes — Microsoft. But as Google moves beyond GFS and MapReduce, Lipkovitz stresses that he is "not claiming that the rest of the world is behind us."
"We're in business of making searches useful," he says. "We're not in the business of selling infrastructure." ®

Dawn of a New Day « Ray Ozzie

Dawn of a New Day « Ray Ozzie

Dawn of a New Day

To:           Executive Staff and direct reports
Date:         October 28, 2010
From:         Ray Ozzie
Subject:      Dawn of a New Day
Five years ago, having only recently arrived at the company, I wrote The Internet Services Disruption in order to kick off a major change management process across the company. In the opening section of that memo, I noted that about every five years our industry experiences what appears to be an inflection point that results in great turbulence and change.
In the wake of that memo, the last five years has been a time of great transformation for Microsoft. At this point we’re truly all in with regard to services. I’m incredibly proud of the people and the work that has been done across the company, and of the way that we’ve turned this services transformation into opportunities that will pay off for years to come.
In the realm of the service-centric ‘seamless OS’ we’re well on the path to having Windows Live serve as an optional yet natural services complement to the Windows and Office software. In the realm of ‘seamless productivity’, Office 365 and our 2010 Office, SharePoint and Live deliverables have shifted Office from being PC-centric toward now also robustly spanning the web and mobile. In ‘seamless entertainment’, Xbox Live has transformed Xbox into a real-time, social, media-rich TV experience.
And in the realm of what I referred to as our ‘services platform’, I couldn’t be more proud of what’s emerged as Windows Azure & SQL Azure. Inspired by little more than a memo, a few decks and discussions, intrapreneurial leaders stepped up to build and deliver an innovative service that, while still nascent, will over time prove to be transformational for the company and the industry.
Our products are now more relevant than ever. Bing has blossomed and its advertising, social, metadata & real-time analytics capabilities are growing to power every one of our myriad services offerings. Over the years the Windows client expanded its relevance even with the rise of low-cost netbooks. Office expanded its relevance even with a shift toward open data formats & web-based productivity. Our server assets have had greater relevance even with a marked shift toward virtualization & cloud computing.
Quite important to me, I’m also quite proud of the degree to which we’ve continued to grow and mature in the area of responsible competition, and the breadth and depth of our cultural shift toward genuine openness, interoperability and privacy which are now such key cornerstones of everything we do.
Yet, for all our great progress, some of the opportunities I laid out in my memo five years ago remain elusive and are yet to be realized.
Certain of our competitors’ products and their rapid advancement & refinement of new usage scenarios have been quite noteworthy. Our early and clear vision notwithstanding, their execution has surpassed our own in mobile experiences, in the seamless fusion of hardware & software & services, and in social networking & myriad new forms of internet-centric social interaction.
We’ve seen agile innovation playing out before a backdrop in which many dramatic changes have occurred across all aspects of our industry’s core infrastructure. These myriad evolutions of our infrastructure have been predicted for years, but in the past five years so much has happened that we’ve grown already to take many of these changes for granted: Ubiquitous internet access over wired, WiFi and 3G/4G networks; many now even take for granted that LTE and ‘whitespace’ will be broadly delivered. We’ve seen our boxy devices based on ‘system boards’ morph into sleek elegantly-designed devices based on transformational ‘systems on a chip’. We’ve seen bulky CRT monitors replaced by impossibly thin touch screens. We’ve seen business processes and entire organizations transformed by the zero-friction nature of the internet; the walls between producer and consumer having now vanished. Substantial business ecosystems have collapsed as many classic aggregation & distribution mechanisms no longer make sense.
Organizations worldwide, in every industry, are now stepping back and re-thinking the basics; questioning their most fundamental structural tenets. Doing so is necessary for their long-term growth and survival. And our own industry is no exception, where we must question our most fundamental assumptions about infrastructure & apps.
The past five years have been breathtaking. But the next five years will bring about yet another inflection point – a transformation that will once again yield unprecedented opportunities for our company and our industry catalyzed by the huge & inevitable shift in apps & infrastructure that’s truly now just begun.
Imagining A “Post-PC” World
One particular day next month, November 20th 2010, represents a significant milestone. Those of us in the PC industry who placed an early bet on a then-nascent PC graphical UI will toast that day as being the 25th anniversary of the launch of Windows 1.0.
Our journey began in support of audacious concepts that were originally just imagined and dreamed: A computer that’s ‘personal’. Or, a PC on every desktop and in every home, running Microsoft software.
Windows may not have been the first graphical UI on a personal computer, but over time the product unquestionably democratized computing & communications for more than a billion people worldwide. Windows and Office truly grew to define the PC; establishing the core concepts and usage scenarios that for so many of us, over time, have become etched in stone.
For the most part, we’ve grown to perceive of ‘computing’ as being equated with specific familiar ‘artifacts’ such as the ‘computer’, the ‘program’ that’s installed on a computer, and the ‘files’ that are stored on that computer’s ‘desktop’. For the majority of users, the PC is largely indistinguishable even from the ‘browser’ or ‘internet’.
As such, it’s difficult for many of us to even imagine that this could ever change.
But as the PC client and PC-based server have grown from their simple roots over the past 25 years, the PC-centric / server-centric model has accreted simply immense complexity. This is a direct by-product of the PC’s success: how broad and diverse the PC’s ecosystem has become; how complex it’s become to manage the acquisition & lifecycle of our hardware, software, and data artifacts. It’s undeniable that some form of this complexity is readily apparent to most all our customers: your neighbors; any small business owner; the ‘tech’ head of household; enterprise IT.
Success begets product requirements. And even when superhuman engineering and design talent is applied, there are limits to how much you can apply beautiful veneers before inherent complexity is destined to bleed through.
Complexity kills. Complexity sucks the life out of users, developers and IT. Complexity makes products difficult to plan, build, test and use. Complexity introduces security challenges. Complexity causes administrator frustration.
And as time goes on and as software products mature – even with the best of intent – complexity is inescapable.
Indeed, many have pointed out that there’s a flip side to complexity: in our industry, complexity of a successful product also tends to provide some assurance of its longevity. Complex interdependencies and any product’s inherent ‘quirks’ will virtually guarantee that broadly adopted systems won’t simply vanish overnight. And so long as a system is well-supported and continues to provide unique and material value to a customer, even many of the most complex and broadly maligned assets will hold their ground. And why not? They’re valuable. They work.
But so long as customer or competitive requirements drive teams to build layers of new function on top of a complex core, ultimately a limit will be reached. Fragility can grow to constrain agility. Some deep architectural strengths can become irrelevant – or worse, can become hindrances.
Our PC software has driven the creation of an amazing ecosystem, and is incredibly valuable to a world of customers and partners. And the PC and its ecosystem is going to keep growing, and growing, for a long time to come. But today, as I wrote five years ago, “Just as in the past, we must reflect upon what’s going on around us, and reflect upon our strengths, weaknesses and industry leadership responsibilities, and respond. As much as ever, it’s clear that if we fail to do so, our business as we know it is at risk.”
And so at this juncture, given all that has transpired in computing and communications, it’s important that all of us do precisely what our competitors and customers will ultimately do: close our eyes and form a realistic picture of what a post-PC world might actually look like, if it were to ever truly occur. How would customers accomplish the kinds of things they do today? In what ways would it be better? In what ways would it be worse, or just different?
Those who can envision a plausible future that’s brighter than today will earn the opportunity to lead.
In our industry, if you can imagine something, you can build it. We at Microsoft know from our common past – even the past five years – that if we know what needs to be done, and if we act decisively, any challenge can be transformed into a significant opportunity. And so, the first step for each of us is to imagine fearlessly; to dream.
Continuous Services | Connected Devices
What’s happened in every aspect of computing & communications over the course of the past five years has given us much to dream about. Certainly the ‘net-connected PC, and PC-based servers, have driven the creation of an incredible industry and have laid the groundwork for mass-market understanding of so much of what’s possible with ‘computers’. But slowly but surely, our lives, businesses and society are in the process of a wholesale reconfiguration in the way we perceive and apply technology.
As we’ve begun to embrace today’s incredibly powerful app-capable phones and pads into our daily lives, and as we’ve embraced myriad innovative services & websites, the early adopters among us have decidedly begun to move away from mentally associating our computing activities with the hardware/software artifacts of our past such as PC’s, CD-installed programs, desktops, folders & files.
Instead, to cope with the inherent complexity of a world of devices, a world of websites, and a world of apps & personal data that is spread across myriad devices & websites, a simple conceptual model is taking shape that brings it all together. We’re moving toward a world of 1) cloud-based continuous services that connect us all and do our bidding, and 2) appliance-like connected devices enabling us to interact with those cloud-based services.
Continuous services are websites and cloud-based agents that we can rely on for more and more of what we do. On the back end, they possess attributes enabled by our newfound world of cloud computing: They’re always-available and are capable of unbounded scale. They’re constantly assimilating & analyzing data from both our real and online worlds. They’re constantly being refined & improved based on what works, and what doesn’t. By bringing us all together in new ways, they constantly reshape the social fabric underlying our society, organizations and lives. From news & entertainment, to transportation, to commerce, to customer service, we and our businesses and governments are being transformed by this new world of services that we rely on to operate flawlessly, 7×24, behind the scenes.
Our personal and corporate data now sits within these services – and as a result we’re more and more concerned with issues of trust & privacy. We most commonly engage and interact with these internet-based sites & services through the browser. But increasingly, we also interact with these continuous services through apps that are loaded onto a broad variety of service-connected devices – on our desks, or in our pockets & pocketbooks.
Connected devices beyond the PC will increasingly come in a breathtaking number of shapes and sizes, tuned for a broad variety of communications, creation & consumption tasks. Each individual will interact with a fairly good number of these connected devices on a daily basis – their phone / internet companion; their car; a shared public display in the conference room, living room, or hallway wall. Indeed some of these connected devices may even grow to bear a resemblance to today’s desktop PC or clamshell laptop. But there’s one key difference in tomorrow’s devices: they’re relatively simple and fundamentally appliance-like by design, from birth. They’re instantly usable, interchangeable, and trivially replaceable without loss. But being appliance-like doesn’t mean that they’re not also quite capable in terms of storage; rather, it just means that storage has shifted to being more cloud-centric than device-centric. A world of content – both personal and published – is streamed, cached or synchronized with a world of cloud-based continuous services.
Moving forward, these ‘connected devices’ will also frequently take the form of embedded devices of varying purpose including telemetry & control. Our world increasingly will be filled with these devices – from the remotely diagnosed elevator, to the sensors on our highways and throughout our environment. These embedded devices will share a key attribute with non-embedded UI-centric devices: they’re appliance-like, easily configured, interchangeable and replaceable without loss.
At first blush, this world of continuous services and connected devices doesn’t seem very different than today. But those who build, deploy and manage today’s websites understand viscerally that fielding a truly continuous service is incredibly difficult and is only achieved by the most sophisticated high-scale consumer websites. And those who build and deploy application fabrics targeting connected devices understand how challenging it can be to simply & reliably just ‘sync’ or ‘stream’. To achieve these seemingly simple objectives will require dramatic innovation in human interface, hardware, software and services.
How It Might Happen
From the perspective of living so deeply within the world of the device-centric software & hardware that we’ve collectively created over the past 25 years, it’s understandably difficult to imagine how a dramatic, wholesale shift toward this new continuous services + connected devices model would ever plausibly gain traction relative to what’s so broadly in use today. But in the technology world, these industry-scoped transformations have indeed happened before. Complexity accrues; dramatically new and improved capabilities arise.
Many years ago when the PC first emerged as an alternative to the mini and mainframe, the key facets of simplicity and broad approachability were key to its amazing success. If there’s to be a next wave of industry reconfiguration – toward a world of internet-connected continuous services and appliance-like connected devices – it would likely arise again from those very same facets.
It may take quite a while to happen, but I believe that in some form or another, without doubt, it will.
For each of us who can clearly envision the end-game, the opportunity is to recognize both the inevitability and value inherent in the big shift ahead, and to do what it takes to lead our customers into this new world.
In the short term, this means imagining the ‘killer apps & services’ and ‘killer devices’ that match up to a broad range of customer needs as they’ll evolve in this new era. Whether in the realm of communications, productivity, entertainment or business, tomorrow’s experiences & solutions are likely to differ significantly even from today’s most successful apps. Tomorrow’s experiences will be inherently transmedia & trans-device. They’ll be centered on your own social & organizational networks. For both individuals and businesses, new consumption & interaction models will change the game. It’s inevitable.
To deliver what seems to be required – e.g. an amazing level of coherence across apps, services and devices – will require innovation in user experience, interaction model, authentication model, user data & privacy model, policy & management model, programming & application model, and so on. These platform innovations will happen in small, progressive steps, providing significant opportunity to lead. In adapting our strategies, tactics, plans & processes to deliver what’s required by this new world, the opportunity is simply huge.
The one irrefutable truth is that in any large organization, any transformation that is to ‘stick’ must emerge from within. Those on the outside can strongly influence, particularly with their wallets. Those above are responsible for developing and articulating a compelling vision, eliminating obstacles, prioritizing resources, and generally setting the stage with a principled approach.
But the power and responsibility to truly effect transformation exists in no small part at the edge. Within those who, led or inspired, feel personally and collectively motivated to make; to act; to do.
In taking the time to read this, most likely it’s you.
Realizing a Dream
In 1939, in New York City, there was an amazing World’s Fair. It was called ‘the greatest show of all time’.
In that year Americans were exhausted, having lived through a decade of depression. Unemployment still hovered above 17%. In Europe, the next world war was brewing. It was an undeniably dark juncture for us all.
And yet, this 1939 World’s Fair opened in a way that evoked broad and acute hope: the promise of a glorious future. There were pavilions from industry & countries all across the world showing vision; showing progress: The Futurama; The World of Tomorrow. Icons conjuring up images of the future: The Trylon; The Perisphere.
The fair’s theme: Dawn of a New Day.
Surrounding the event, stories were written and vividly told to help everyone envision and dream of a future of modern conveniences; superhighways & spacious suburbs; technological wonders to alleviate hardship and improve everyday life.
The fair’s exhibits and stories laid a broad-based imprint across society of what needed to be done. To plausibly leap from such a dark time to such a potentially wonderful future meant having an attitude, individually and collectively, that we could achieve whatever we set our minds to. That anything was possible.
In the following years – fueled both by what was necessary for survival and by our hope for the future – manufacturing jumped 50%. Technological breakthroughs abounded. What had been so hopefully and optimistically imagined by many, was achieved by all.
And, as their children, now we’re living their dreams.
Today, in my own dreams, I see a great, expansive future for our industry and for our company – a future of amazing, pervasive cloud-centric experiences delivered through a world of innovative devices that surround us.
Without a doubt, as in 1939 there are conditions in our society today that breed uncertainty: jobs, housing, health, education, security, the environment. And yes, there are also challenging conditions for our company: it’s a tough, fast-moving, and highly competitive environment.
And yet, even in the presence of so much uncertainty, I feel an acute sense of hope and optimism.
When I look forward, I can’t help but see the potential for a much brighter future: Even beyond the first billion, so many more people using technology to improve their lives, businesses and societies, in so many ways. New apps, services & scenarios in communications, collaboration & productivity, commerce, education, health care, emergency management, human services, transportation, the environment, security – the list goes on, and on, and on.
We’ve got so far to go before we even scratch the surface of what’s now possible. All these new services will be cloud-centric ‘continuous services’ built in a way that we can all rely upon. As such, cloud computing will become pervasive for developers and IT – a shift that’ll catalyze the transformation of infrastructure, systems & business processes across all major organizations worldwide. And all these new services will work hand-in-hand with an unimaginably fascinating world of devices-to-come. Today’s PC’s, phones & pads are just the very beginning; we’ll see decades to come of incredible innovation from which will emerge all sorts of ‘connected companions’ that we’ll wear, we’ll carry, we’ll use on our desks & walls and the environment all around us. Service-connected devices going far beyond just the ‘screen, keyboard and mouse’: humanly-natural ‘conscious’ devices that’ll see, recognize, hear & listen to you and what’s around you, that’ll feel your touch and gestures and movement, that’ll detect your proximity to others; that’ll sense your location, direction, altitude, temperature, heartbeat & health.
Let there be no doubt that the big shifts occurring over the next five years ensure that this will absolutely be a time of great opportunity for those who put past technologies & successes into perspective, and envision all the transformational value that can be offered moving forward to individuals, businesses, governments and society. It’s the dawn of a new day – the sun having now arisen on a world of continuous services and connected devices.
And so, as Microsoft has done so successfully over the course of the company’s history, let’s mark this five-year milestone by once again fearlessly embracing that which is technologically inevitable – clearing a path to the extraordinary opportunity that lies ahead for us, for the industry, and for our customers.
Ray

Saturday 23 October 2010

Writing Parsers in Ruby using Treetop

Writing Parsers in Ruby using Treetop: "
Treetop is one of the most underrated, yet powerful, Ruby libraries out there. If you want to write a parser, it kicks ass. The only problem is that unless you're into reading up about and playing with parsers, it's not always obvious how to get going with them, or Treetop in particular. Luckily Aaron Gough, a Toronto-based Ruby developer, comes to our rescue with some great blog posts.

Aaron, who has a passion for messing around with parsers and language implementations, recently released Koi - a pure Ruby implementation of a language parser, compiler, and virtual machine. If you're ready to dive in at the deep end, the code for Koi makes for good reading.

Starting more simply, though, is Aaron's latest blog post: A quick intro to writing a parser with Treetop. In the post, he covers building a 'parsing expression grammar' (PEG) for a basic Lisp-like language from start to finish - from installing the gem, through to building up a desired set of results. It's a great walkthrough and unless you're already au fait with parsers, you'll pick something up.

If thinking of 'grammars' and Treetop is enough to make your ears itch, though, check out Aaron's sister article: Writing an S-Expression parser in Ruby. On the surface, this sounds like the same thing as the other one, except that this is written in pure Ruby with no Treetop involvement. But while pure Ruby is always nice to see, it's a stark reminder of how much a library like Treetop offers us.
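
To give a taste of the pure-Ruby route, here is a tiny S-expression parser sketch (a minimal version of my own, not Aaron's code):

# Split the source into "(", ")" and atom tokens.
def tokenize(src)
  src.gsub("(", " ( ").gsub(")", " ) ").split
end

# Recursively consume tokens, building nested arrays for lists.
def parse(tokens)
  token = tokens.shift
  raise "unexpected end of input" if token.nil?
  if token == "("
    list = []
    list << parse(tokens) until tokens.first == ")"
    tokens.shift                       # consume the closing ")"
    list
  elsif token == ")"
    raise "unexpected )"
  else
    token =~ /\A-?\d+\z/ ? token.to_i : token.to_sym   # integer or symbol atom
  end
end

p parse(tokenize("(define (square x) (* x x))"))
# => [:define, [:square, :x], [:*, :x, :x]]

A Treetop grammar expresses the same structure declaratively and generates the parser for you, which is exactly the convenience the articles highlight.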



"

The CAP Theorem... Again

The CAP Theorem... Again: "
Today looks to be (again) the day of the CAP theorem[1][2], so let’s do a quick summary:


  1. We had Coda Hale’s ☞ You can’t sacrifice partition tolerance:


    Of the CAP theorem’s Consistency, Availability, and Partition Tolerance, Partition Tolerance is mandatory in distributed systems. You cannot not choose it. Instead of CAP, you should think about your availability in terms of yield (percent of requests answered successfully) and harvest (percent of required data actually included in the responses) and which of these two your system will sacrifice when failures happen.



  2. Jeff Darcy followed up with ☞ Another CAP article:


    It seems to me that there is a consensus emerging. Even if Gilbert and Lynch only formally proved a narrower version of Brewer’s original conjecture, that conjecture and the tradeoffs it implies are still alive and well and highly relevant to the design of real working systems that serve real business needs.


    and ☞ Reactions to Coda’s CAP post:


    The last point is whether CAP really boils down to “two out of three” or not. Of course not, even though I’ve probably said that myself a couple of times. The reason is merely pedagogical. It’s a pretty good approximation, much like teaching Newtonian physics or ideal gases in chemistry. You have to get people to understand the basic shape of things before you start talking about the exceptions and special cases, and “two out of three” is a good approximation. Sure, you can trade off just a little of one for a little of another instead of purely either/or, but only after you thoroughly understand and appreciate why the simpler form doesn’t suffice. The last thing we need is people with learner’s permits trying to build exotic race cars. They just give the doubters and trolls more ammunition with which to suppress innovation.



  3. Henry Robinson’s ☞ CAP Confusion: Problems with ‘partition tolerance’ popped up too:


    Not ‘choosing’ P is analogous to building a network that will never experience multiple correlated failures. This is unreasonable for a distributed system – precisely for all the valid reasons that are laid out in the CACM post about correlated failures, OS bugs and cluster disasters – so what a designer has to do is to decide between maintaining consistency and availability. Dr. Stonebraker tells us to choose consistency, in fact, because availability will unavoidably be impacted by large failure incidents. This is a legitimate design choice, and one that the traditional RDBMS lineage of systems has explored to its fullest, but it implicitly protects us neither from availability problems stemming from smaller failure incidents, nor from the high cost of maintaining sequential consistency.



  4. Many of the above articles were referring to Michael Stonebraker’s ☞ Errors in Database Systems, Eventual Consistency, and the CAP Theorem:


    In summary, one should not throw out the C so quickly, since there are real error scenarios where CAP does not apply and it seems like a bad tradeoff in many of the other situations.


So, we pretty much went full circle. I just hope that Eric Brewer will do a ☞ follow up:


I really need to write an updated CAP theorem paper




  1. Michael Stonebraker’s clarifications on the CAP theorem and the older but related Daniel Abadi’s Problems with CAP
  2. Nati Shalom’s ☞ NoCAP and my own NoCAP… is wrong (nb make sure you also read the comments)


Original title and link: The CAP Theorem… Again (NoSQL databases © myNoSQL)

"

What is Network-based Application Virtualization and Why Do You Need It?

What is Network-based Application Virtualization and Why Do You Need It?: "
With all the attention being paid these days to VDI (virtual desktop infrastructure) and application virtualization and server virtualization and <insert type> virtualization it’s easy to forget about network-based application virtualization. But it’s the one virtualization technique you shouldn’t forget because it is a foundational technology upon which myriad other solutions will be enabled.

WHAT IS NETWORK-BASED APPLICATION VIRTUALIZATION?


This term may not be familiar to you but that’s because since its inception oh, more than a decade ago, it’s always just been called “server virtualization”. After the turn of the century (I love saying that, by the way) it was always referred to as service virtualization in SOA and XML circles. With the rise of the likes of VMware and Citrix and Microsoft server virtualization solutions, it’s become impossible to just use the term “server virtualization” and “service virtualization” is just as ambiguous so it seems appropriate to give it a few more modifiers to make it clear that we’re talking about the network-based virtualization (aggregation) of applications.
"

Thursday 21 October 2010

RESTful Cassandra

RESTful Cassandra: "RESTful Cassandra:
Gary Dusbabek:


A lot of people, when first learning about Cassandra, wonder why there isn’t any easier (say, RESTful) way to perform operations. I did. It didn’t take someone very long to point out that it mainly has to do with performance. Cassandra spends a significant amount of resources marshaling data and Thrift currently does that very efficiently.

[…]

I eventually arrived at the decision that an HTTP Cassandra Daemon/Server pair similar to the existing Avro and Thrift versions would do the trick. It would basically be a straight port, with a few minor caveats. One big thing is that HTTP uses a new connection for each request, so storing Cassandra session information in threadlocals goes out the window. This means that authentication needs to be abandoned, done with every request, or we need to use HTTP sessions. Punt.


I sound like a broken record: better protocols always help with adoption


Original title and link: RESTful Cassandra (NoSQL databases © myNoSQL)

"

Tuesday 19 October 2010

Objectively speaking: the future of objects

Objectively speaking: the future of objects: "
One infrastructure to rule them all discussed the emerging enterprise need for a single, scalable file storage infrastructure. But what infrastructure?

Some background to this is last year’s Cloud Quadrant and this year’s Why private clouds are part of the future.



Block and file

For decades direct-attached block-based storage was the only option. The ’80s introduced file-based storage. Much of storage systems growth in the last 15 years has been in file servers.

New systems, be they video, sensor or social, are producing massive collections of files at an accelerating rate. The rapid development of lower-cost mobile computing devices – smartphones, iPads, netbooks and Android tablets – means that content consumption and production will be a major source of file growth. The long tail of content demand means that the variety of online content will grow – especially as the cost of storage declines.

Private cloud

The larger issue is the need to keep this fast-growing information online for years, despite rapid change in the underlying storage, network and computing infrastructures. File data must become independent of our storage and server choices.

As stores grow data migration becomes less feasible. Rip ‘n replace gives way to in-place upgrades.

Achieving that means moving to an object storage paradigm. How do we know this will happen? Because it already has.

Object stores at Google and Amazon Web Services are already among the largest storage infrastructures in the world. AWS alone stores over 100 billion objects today. Hundreds of millions of people use object storage every day – and don’t even know it.

What is object storage?

Object storage instantiations vary in detail and supported features. However, all object storage has two key characteristics:

–Individual objects are accessed by a global handle. The handle may, for example, be a hash, a key or something like a URL.

–Extended metadata. The extended metadata content goes beyond that of traditional file systems and may include additional security and content validation as well as presentation, decompression or other information relating to the content, production or value of the enclosed file.

Like files, objects contain data. But they lack key features that would make them files. They don’t have:

-Hierarchy. Not only are all objects created equal, they all remain at the same level. You can’t put one object inside another.

-Names. At least, not human-type names like Claudia_Schiffer or 2006_Taxes.

A user-facing component provides those missing elements. You decide which files belong in which folders. You give the files names. You decide which users have access to which files and what those users can do with those files.

Those choices are embedded in the object metadata so they can be presented as you have organized them. But if you have the object’s handle you can access it directly.

All objects look alike. Some are bigger and some are smaller, but until we get them dressed and named, they aren’t files. Yet they are a lot closer to files than blocks are. Which means that if you choose to manage objects you no longer have to worry about blocks.

Essentially then, objects are files with an address – instead of a pathname – and extra metadata. This is unlike distributed file systems, where the metadata is stored in a metadata server that keeps track of the location of the data on the storage servers.
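
As a rough sketch of that model – objects retrieved by a global handle, with names and folders layered on top via metadata – here is a toy in-memory object store in Ruby. It is only an illustration; real systems add durability, replication and access control:

require "digest"

# A toy object store: blobs are addressed by a content-hash handle,
# and human-facing names live only in the extended metadata.
class ToyObjectStore
  def initialize
    @objects = {}                       # handle => { data:, metadata: }
  end

  def put(data, metadata = {})
    handle = Digest::SHA256.hexdigest(data)
    @objects[handle] = { data: data, metadata: metadata }
    handle                              # the global handle, e.g. usable in a URL
  end

  def get(handle)
    @objects.fetch(handle)              # access is by handle, not by pathname
  end
end

store = ToyObjectStore.new
handle = store.put("1040 form contents",
                   "display_name" => "2006_Taxes", "owner" => "alice")
p store.get(handle)[:metadata]["display_name"]   # => "2006_Taxes"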

Some file storage systems are built on object storage repositories. Legacy APIs make it a requirement for many applications, but URL-style access through HTTP is more flexible in the long run.

Crossing the implementation chasm

While the economics of objects are obvious at scale, they are less compelling at the beginning of a typical enterprise project. It is easier to buy another file server than to worry about long-term architecture.

Here’s a rough diagram of the relative scalability of storage options:



When under-12-month paybacks are expected, who will buy an object storage infrastructure? The simple answer is that as object stores become better known and startup costs are reduced, more companies will buy them. Archives will be the first market. The longer answer is that as public cloud projects are brought inside, object stores will receive them.

The StorageMojo take

As organizations amass large file collections, the economies of scale and management for object storage will become apparent. Savvy architects will add commodity-based scale-out object storage to their tool kit.

HDS, NetApp and HP have recently added modern object stores to their product lines. And rumor has it EMC will too, either by getting Atmos to work or by buying Isilon.

Courteous comments welcome, of course. Still don’t like the name object, but I’ll get over it.


"

Thursday 14 October 2010

Everyday GIT With 20 Commands Or So

Everyday GIT With 20 Commands Or So
[Individual Developer (Standalone)] commands are essential for anybody who makes a commit, even for somebody who works alone.
If you work with other people, you will need commands listed in the [Individual Developer (Participant)] section as well.
People who play the [Integrator] role need to learn some more commands in addition to the above.
[Repository Administration] commands are for system administrators who are responsible for the care and feeding of git repositories.

Individual Developer (Standalone)

A standalone individual developer does not exchange patches with other people, and works alone in a single repository, using the following commands.

Examples


Use a tarball as a starting point for a new repository.
$ tar zxf frotz.tar.gz
$ cd frotz
$ git init
$ git add . (1)
$ git commit -m "import of frotz source tree."
$ git tag v2.43 (2)
  1. add everything under the current directory.
  2. make a lightweight, unannotated tag.
Create a topic branch and develop.
$ git checkout -b alsa-audio (1)
$ edit/compile/test
$ git checkout -- curses/ux_audio_oss.c (2)
$ git add curses/ux_audio_alsa.c (3)
$ edit/compile/test
$ git diff HEAD (4)
$ git commit -a -s (5)
$ edit/compile/test
$ git reset --soft HEAD^ (6)
$ edit/compile/test
$ git diff ORIG_HEAD (7)
$ git commit -a -c ORIG_HEAD (8)
$ git checkout master (9)
$ git merge alsa-audio (10)
$ git log --since='3 days ago' (11)
$ git log v2.43.. curses/ (12)
  1. create a new topic branch.
  2. revert your botched changes in curses/ux_audio_oss.c.
  3. you need to tell git if you added a new file; removal and modification will be caught if you do git commit -a later.
  4. to see what changes you are committing.
  5. commit everything as you have tested, with your sign-off.
  6. take the last commit back, keeping what is in the working tree.
  7. look at the changes since the premature commit we took back.
  8. redo the commit undone in the previous step, using the message you originally wrote.
  9. switch to the master branch.
  10. merge a topic branch into your master branch.
  11. review commit logs; other forms to limit output can be combined and include --max-count=10 (show 10 commits), --until=2005-12-10, etc.
  12. view only the changes that touch what's in curses/ directory, since v2.43 tag.

Individual Developer (Participant)

A developer working as a participant in a group project needs to learn how to communicate with others, and uses these commands in addition to the ones needed by a standalone developer.
  • git-clone(1) from the upstream to prime your local repository.
  • git-pull(1) and git-fetch(1) from "origin" to keep up-to-date with the upstream.
  • git-push(1) to shared repository, if you adopt CVS style shared repository workflow.
  • git-format-patch(1) to prepare e-mail submission, if you adopt Linux kernel-style public forum workflow.

Examples


Clone the upstream and work on it. Feed changes to upstream.
$ git clone git://git.kernel.org/pub/scm/.../torvalds/linux-2.6 my2.6
$ cd my2.6
$ edit/compile/test; git commit -a -s (1)
$ git format-patch origin (2)
$ git pull (3)
$ git log -p ORIG_HEAD.. arch/i386 include/asm-i386 (4)
$ git pull git://git.kernel.org/pub/.../jgarzik/libata-dev.git ALL (5)
$ git reset --hard ORIG_HEAD (6)
$ git gc (7)
$ git fetch --tags (8)
  1. repeat as needed.
  2. extract patches from your branch for e-mail submission.
  3. git pull fetches from origin by default and merges into the current branch.
  4. immediately after pulling, look at the changes done upstream since last time we checked, only in the area we are interested in.
  5. fetch from a specific branch from a specific repository and merge.
  6. revert the pull.
  7. garbage collect leftover objects from reverted pull.
  8. from time to time, obtain official tags from the origin and store them under .git/refs/tags/.
Push into another repository.
satellite$ git clone mothership:frotz frotz (1)
satellite$ cd frotz
satellite$ git config --get-regexp '^(remote|branch)\.' (2)
remote.origin.url mothership:frotz
remote.origin.fetch refs/heads/*:refs/remotes/origin/*
branch.master.remote origin
branch.master.merge refs/heads/master
satellite$ git config remote.origin.push \
           master:refs/remotes/satellite/master (3)
satellite$ edit/compile/test/commit
satellite$ git push origin (4)

mothership$ cd frotz
mothership$ git checkout master
mothership$ git merge satellite/master (5)
  1. mothership machine has a frotz repository under your home directory; clone from it to start a repository on the satellite machine.
  2. clone sets these configuration variables by default. It arranges git pull to fetch and store the branches of mothership machine to local remotes/origin/* tracking branches.
  3. arrange git push to push local master branch to remotes/satellite/master branch of the mothership machine.
  4. push will stash our work away on remotes/satellite/master tracking branch on the mothership machine. You could use this as a back-up method.
  5. on mothership machine, merge the work done on the satellite machine into the master branch.
Branch off of a specific tag.
$ git checkout -b private2.6.14 v2.6.14 (1)
$ edit/compile/test; git commit -a
$ git checkout master
$ git format-patch -k -m --stdout v2.6.14..private2.6.14 |
  git am -3 -k (2)
  1. create a private branch based on a well known (but somewhat behind) tag.
  2. forward port all changes in private2.6.14 branch to master branch without a formal "merging".

Integrator

A fairly central person acting as the integrator in a group project receives changes made by others, reviews and integrates them and publishes the result for others to use, using these commands in addition to the ones needed by participants.

Examples


My typical GIT day.
$ git status (1)
$ git show-branch (2)
$ mailx (3)
& s 2 3 4 5 ./+to-apply
& s 7 8 ./+hold-linus
& q
$ git checkout -b topic/one master
$ git am -3 -i -s -u ./+to-apply (4)
$ compile/test
$ git checkout -b hold/linus && git am -3 -i -s -u ./+hold-linus (5)
$ git checkout topic/one && git rebase master (6)
$ git checkout pu && git reset --hard next (7)
$ git merge topic/one topic/two && git merge hold/linus (8)
$ git checkout maint
$ git cherry-pick master~4 (9)
$ compile/test
$ git tag -s -m "GIT 0.99.9x" v0.99.9x (10)
$ git fetch ko && git show-branch master maint 'tags/ko-*' (11)
$ git push ko (12)
$ git push ko v0.99.9x (13)
  1. see what I was in the middle of doing, if any.
  2. see what topic branches I have and think about how ready they are.
  3. read mails, save ones that are applicable, and save others that are not quite ready.
  4. apply them, interactively, with my sign-offs.
  5. create topic branch as needed and apply, again with my sign-offs.
  6. rebase internal topic branch that has not been merged to the master, nor exposed as a part of a stable branch.
  7. restart pu every time from the next.
  8. and bundle topic branches still cooking.
  9. backport a critical fix.
  10. create a signed tag.
  11. make sure I did not accidentally rewind master beyond what I already pushed out. ko shorthand points at the repository I have at kernel.org, and looks like this:
    $ cat .git/remotes/ko
    URL: kernel.org:/pub/scm/git/git.git
    Pull: master:refs/tags/ko-master
    Pull: next:refs/tags/ko-next
    Pull: maint:refs/tags/ko-maint
    Push: master
    Push: next
    Push: +pu
    Push: maint
    In the output from git show-branch, master should have everything ko-master has, and next should have everything ko-next has.
  12. push out the bleeding edge.
  13. push the tag out, too.

Repository Administration

A repository administrator uses the following tools to set up and maintain access to the repository by developers.
  • git-daemon(1) to allow anonymous download from repository.
  • git-shell(1) can be used as a restricted login shell for shared central repository users.
update hook howto has a good example of managing a shared central repository.

Examples


We assume the following in /etc/services
$ grep 9418 /etc/services
git             9418/tcp                # Git Version Control System
Run git-daemon to serve /pub/scm from inetd.
$ grep git /etc/inetd.conf
git     stream  tcp     nowait  nobody \
  /usr/bin/git-daemon git-daemon --inetd --export-all /pub/scm
The actual configuration line should be on one line.
Run git-daemon to serve /pub/scm from xinetd.
$ cat /etc/xinetd.d/git-daemon
# default: off
# description: The git server offers access to git repositories
service git
{
        disable = no
        type            = UNLISTED
        port            = 9418
        socket_type     = stream
        wait            = no
        user            = nobody
        server          = /usr/bin/git-daemon
        server_args     = --inetd --export-all --base-path=/pub/scm
        log_on_failure  += USERID
}
Check your xinetd(8) documentation and setup, this is from a Fedora system. Others might be different.
Give push/pull only access to developers.
$ grep git /etc/passwd (1)
alice:x:1000:1000::/home/alice:/usr/bin/git-shell
bob:x:1001:1001::/home/bob:/usr/bin/git-shell
cindy:x:1002:1002::/home/cindy:/usr/bin/git-shell
david:x:1003:1003::/home/david:/usr/bin/git-shell
$ grep git /etc/shells (2)
/usr/bin/git-shell
  1. log-in shell is set to /usr/bin/git-shell, which does not allow anything but git push and git pull. The users should get an ssh access to the machine.
  2. in many distributions /etc/shells needs to list what is used as the login shell.
CVS-style shared repository.
$ grep git /etc/group (1)
git:x:9418:alice,bob,cindy,david
$ cd /home/devo.git
$ ls -l (2)
  lrwxrwxrwx   1 david git    17 Dec  4 22:40 HEAD -> refs/heads/master
  drwxrwsr-x   2 david git  4096 Dec  4 22:40 branches
  -rw-rw-r--   1 david git    84 Dec  4 22:40 config
  -rw-rw-r--   1 david git    58 Dec  4 22:40 description
  drwxrwsr-x   2 david git  4096 Dec  4 22:40 hooks
  -rw-rw-r--   1 david git 37504 Dec  4 22:40 index
  drwxrwsr-x   2 david git  4096 Dec  4 22:40 info
  drwxrwsr-x   4 david git  4096 Dec  4 22:40 objects
  drwxrwsr-x   4 david git  4096 Nov  7 14:58 refs
  drwxrwsr-x   2 david git  4096 Dec  4 22:40 remotes
$ ls -l hooks/update (3)
  -r-xr-xr-x   1 david git  3536 Dec  4 22:40 update
$ cat info/allowed-users (4)
refs/heads/master       alice\|cindy
refs/heads/doc-update   bob
refs/tags/v[0-9]*       david
  1. place the developers into the same git group.
  2. and make the shared repository writable by the group.
  3. use update-hook example by Carl from Documentation/howto/ for branch policy control.
  4. alice and cindy can push into master, only bob can push into doc-update. david is the release manager and is the only person who can create and push version tags.
HTTP server to support dumb protocol transfer.
dev$ git update-server-info (1)
dev$ ftp user@isp.example.com (2)
ftp> cp -r .git /home/user/myproject.git
  1. make sure your info/refs and objects/info/packs are up-to-date
  2. upload to public HTTP server hosted by your ISP.