Friday 1 October 2010

Facebook and Site Failures Caused by Complex, Weakly Interacting, Layered Systems


Facebook has been so reliable that when a site outage does occur, it's a definite learning opportunity. Fortunately for us, we can learn something, because in More Details on Today's Outage, Facebook's Robert Johnson gave a pretty candid explanation of what caused a rare 2.5-hour period of downtime for Facebook. It wasn't a simple problem. The root causes were feedback loops and transient spikes, caused ultimately by the complexity of weakly interacting layers in modern systems. You know, the kind everyone is building these days. Problems like this are notoriously hard to fix, and finding a real solution may send Facebook back to the whiteboard. There's a technical debt that must be paid.

The outline and my interpretation (reading between the lines) of what happened are:
