
Tuesday, 24 May 2011

Updated AWS Security White Paper; New Risk and Compliance White Paper


We have updated the AWS Security White Paper and we've created a new Risk and Compliance White Paper.  Both are available now.


The AWS Security White Paper describes our physical and operational security principles and practices.


It includes a description of the shared responsibility model, a summary of our control environment, a review of secure design principles, and detailed information about the security and backup considerations for each part of AWS, including the Virtual Private Cloud, EC2, and the Simple Storage Service.


 

The new AWS Risk and Compliance White Paper covers a number of important topics including (again) the shared responsibility model, additional information about our control environment and how to evaluate it, and detailed information about our certifications and third-party attestations. A section on key compliance issues addresses a number of topics that we are asked about on a regular basis.


 

The AWS Security team and the AWS Compliance team are complementary organizations and are responsible for the security infrastructure, practices, and compliance programs described in these white papers. The AWS Security team is headed by our Chief Information Security Officer and is based outside of Washington, DC. Like most parts of AWS, this team is growing and has a number of open positions:



We also have a number of security-related positions open in Seattle:



-- Jeff;




    "

    Monday, 2 May 2011

    The Updated Big List of Articles on the Amazon Outage


    Source: Highscalability.com

Since The Big List Of Articles On The Amazon Outage was published we've had a few updates that people might not have seen. Amazon of course released their Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region. Netflix shared their Lessons Learned from the AWS Outage, as did Heroku (How Heroku Survived the Amazon Outage), SmugMug (How SmugMug survived the Amazonpocalypse), and SimpleGeo (How SimpleGeo Stayed Up During the AWS Downtime).


The curious thing from my perspective is the general lack of response to Amazon's explanation. I expected more discussion; there has been almost none that I've seen. My guess is that very few people understand what Amazon was talking about well enough to comment, whereas almost everyone feels qualified to talk about the event itself.


Lesson for crisis handlers: deep-dive post-mortems that are timely, long, honest-ish, and highly technical are the most effective means of staunching the downward spiral of media attention.


    Amazon's Explanation of What Happened


    Experiences From Specific Companies, Both Good And Bad

    Amazon Web Services Discussion Forum

    A fascinating peek into the experiences of people who were dealing with the outage while they were experiencing it. Great real-time social archeology in action.

There were also many, many instances of support and help in the log.

    In Summary

    Taking Sides: It's The Customer's Fault

    Taking Sides: It's Amazon's Fault

    Lessons Learned And Other Insight Articles

    Vendor's Vent

        Tuesday, 14 December 2010

        Big Just Got Bigger - 5 Terabyte Object Support in Amazon S3



        Today, Amazon S3 announced a new breakthrough in supporting customers with large files by increasing the maximum supported object size from 5 gigabytes to 5 terabytes. This allows customers to store and reference a large file as a single object instead of smaller 'chunks'. When combined with the Amazon S3 Multipart Upload release, this dramatically improves how customers upload, store and share large files on Amazon S3.


        Who has files larger than 5GB?



Amazon S3 has always been a scalable, durable, and available data repository for almost any customer workload. However, as use of the cloud has grown, so have the file sizes customers want to store in Amazon S3 as objects. This is especially true for customers managing HD video or data-intensive instruments such as genomic sequencers. For example, a 2-hour movie on Blu-ray can be 50 gigabytes. The same movie stored in an uncompressed 1080p HD format is around 1.5 terabytes.


By supporting such large object sizes, Amazon S3 better enables a variety of interesting big data use cases. For example, a movie studio can now store and manage its entire catalog of high definition origin files on Amazon S3 as individual objects. Any movie or collection of content can be easily pulled into Amazon EC2 for transcoding on demand and moved back into Amazon S3 for distribution through edge locations throughout the world with Amazon CloudFront. Or, BioPharma researchers and scientists can stream genomic sequencer data directly into Amazon S3, which frees up local resources and allows scientists to store, aggregate, and share human genomes as single objects in Amazon S3. Any researcher anywhere in the world then has access to a vast genomic data set, with the on-demand compute power for analysis, such as Amazon EC2 Cluster GPU Instances, that was previously only available to the largest research institutions and companies.


        Multipart Upload and moving large objects into Amazon S3



To make uploading large objects easier, Amazon S3 also recently announced Multipart Upload, which allows you to upload an object in parts. You can create parallel uploads to better utilize your available bandwidth and even stream data into Amazon S3 as it's being created. Also, if a given upload runs into a networking issue, you only have to restart that part, not the entire object, allowing you to recover quickly from intermittent network errors.


Multipart Upload isn't just for customers with files larger than 5 gigabytes. With Multipart Upload, you can upload any object larger than 5 megabytes in parts. So, we expect customers with objects larger than 100 megabytes to make extensive use of Multipart Upload when moving their data into Amazon S3, for a faster, more flexible upload experience.
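
To make the flow concrete, here is a minimal sketch of a parallel multipart upload. It uses the modern boto3 SDK purely for illustration (the SDK postdates this post), and the bucket, key, and file names are hypothetical:

    import os
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    s3 = boto3.client("s3")
    BUCKET, KEY, FILENAME = "my-bucket", "big/video.mov", "video.mov"  # hypothetical
    PART_SIZE = 100 * 1024 * 1024  # 100 MB parts (S3's minimum is 5 MB, except the last part)

    mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)

    def upload_part(part_number, offset):
        # If this part hits a network error, only this part needs re-sending,
        # not the whole object.
        with open(FILENAME, "rb") as f:
            f.seek(offset)
            body = f.read(PART_SIZE)
        resp = s3.upload_part(Bucket=BUCKET, Key=KEY, PartNumber=part_number,
                              UploadId=mpu["UploadId"], Body=body)
        return {"PartNumber": part_number, "ETag": resp["ETag"]}

    offsets = list(range(0, os.path.getsize(FILENAME), PART_SIZE))
    with ThreadPoolExecutor(max_workers=4) as pool:  # parallel parts use more of your bandwidth
        parts = list(pool.map(upload_part, range(1, len(offsets) + 1), offsets))

    # S3 stitches the parts into one object, in ascending part-number order.
    s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=mpu["UploadId"],
                                 MultipartUpload={"Parts": parts})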


        More information



        For more information on Multipart Upload and managing large objects in Amazon S3, see Jeff Barr's blog posts on Amazon S3 Multipart Upload and Large Object Support as well as the Amazon S3 Developer Guide.

        "

        Amazon Route 53 DNS Service



Even working in Amazon Web Services, I'm finding the frequency of new product announcements and updates a bit dizzying. It's amazing how fast the cloud is taking shape and the feature set is filling out. Utility computing has really been on fire over the last 9 months. I've never seen an entire new industry created and come fully to life this fast. Fun times.





Before joining AWS, I used to say that I had an inside line on what AWS was working on and what new features were coming in the near future. My trick? I went to AWS customer meetings and just listened. AWS delivers what customers are asking for with such regularity that it's really not all that hard to predict new product features soon to be delivered. This trend continues with today's announcement. Customers have been asking for a Domain Name Service with consistency and, today, AWS is announcing the availability of Route 53, a scalable, highly redundant, and reliable global DNS service.




         






The Domain Name System is essentially a global, distributed database that allows various pieces of information to be associated with a domain name. In the most common case, DNS is used to look up the numeric IP address for a domain name. So, for example, I just looked up Amazon.com and found that one of the addresses being used to host Amazon.com is 207.171.166.252. And, when your browser accessed this blog (assuming you came here directly rather than using RSS), it would have looked up perspectives.mvdirona.com to get an IP address. This mapping is stored in a DNS "A" (address) record. Other popular DNS records are CNAME (canonical name), MX (mail exchange), and SPF (Sender Policy Framework). A full list of DNS record types is at http://en.wikipedia.org/wiki/List_of_DNS_record_types. Route 53 currently supports the following record types (a quick lookup sketch in code follows the list):



A (address record)
AAAA (IPv6 address record)
CNAME (canonical name record)
MX (mail exchange record)
NS (name server record)
PTR (pointer record)
SOA (start of authority record)
SPF (sender policy framework)
SRV (service locator)
TXT (text record)
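
For instance, the common A-record case described above can be reproduced in a couple of lines of Python using only the standard library (the host name is just the example from this post):

    import socket

    # Asks the local resolver chain, which on a cache miss ultimately
    # queries an authoritative DNS server such as Route 53.
    print(socket.gethostbyname("amazon.com"))  # e.g. 207.171.166.252 at the time of this post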




         



DNS, on the surface, is fairly simple and easy to understand. What is difficult with DNS is providing absolute rock-solid stability at scales ranging from a request per day on some domains to billions on others. Running DNS rock-solid, low-latency, and highly reliable is hard. And it's just the kind of problem that loves scale. Scale allows more investment in the underlying service and supports a wide, many-datacenter footprint.




         



The AWS Route 53 Service is hosted in a global network of edge locations that includes the following 16 facilities:

United States: Ashburn, VA; Dallas/Fort Worth, TX; Los Angeles, CA; Miami, FL; New York, NY; Newark, NJ; Palo Alto, CA; Seattle, WA; St. Louis, MO

Europe: Amsterdam, Dublin, Frankfurt, London

Asia: Hong Kong, Tokyo, Singapore




         



Many DNS lookups are resolved in local caches but, when there is a cache miss, the request needs to be routed back to the authoritative name server. The right approach to answering these requests with low latency is to route to the nearest datacenter hosting an appropriate DNS server. In Route 53 this is done using anycast. Anycast is a cool routing trick where the same IP address range is advertised to be at many different locations. Using this technique, the same IP address range is advertised as being in each of the world-wide fleet of datacenters. This results in the request being routed to the nearest facility from a network perspective.




         



Route 53 routes to the nearest datacenter to deliver low-latency, reliable results. This is good, but Route 53 is not the only DNS service that is well implemented over a globally distributed fleet of datacenters. What makes Route 53 unique is that it's a cloud service. Cloud means the price is advertised rather than negotiated. Cloud means you make an API call rather than talking to a sales representative. Cloud means it's a simple API and you don't need professional services or a customer support contact. And cloud means it's running NOW rather than tomorrow morning when the administration team comes in. Offering a rock-solid service is half the battle, but it's the cloud aspects of Route 53 that are most interesting.





         



Route 53 pricing is advertised and available to all:

Hosted Zones: $1 per hosted zone per month

Requests: $0.50 per million queries for the first billion queries in a month, and $0.25 per million queries beyond that
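
As a rough worked example (my own arithmetic against the prices above, not an official quote), a single hosted zone answering 1.5 billion queries in a month would cost about $626:

    # Hypothetical monthly bill: one hosted zone, 1.5B queries.
    queries = 1_500_000_000
    first_tier = min(queries, 1_000_000_000) / 1_000_000 * 0.50     # $0.50 per million, first 1B
    overflow = max(queries - 1_000_000_000, 0) / 1_000_000 * 0.25   # $0.25 per million after that
    print(1.00 + first_tier + overflow)  # 626.0 -> $1 zone fee + $500 + $125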




         



You can have it running in less time than it took to read this posting. Go to Route 53 Details. You don't need to talk to anyone, negotiate a volume discount, hire a professional service team, call the customer support group, or wait until tomorrow. Make the API calls to set it up and, on average, 60 seconds later you are fully operating.
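
As a sketch of what "make the API calls" looks like today (shown with the modern boto3 SDK, which postdates this post; the domain name is hypothetical):

    import uuid

    import boto3

    r53 = boto3.client("route53")
    # CallerReference must be unique per request; a fresh UUID works well.
    zone = r53.create_hosted_zone(Name="example.com", CallerReference=str(uuid.uuid4()))
    print(zone["DelegationSet"]["NameServers"])  # hand these NS records to your registrar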




         



                                                                       
--jrh

James Hamilton
e: jrh@mvdirona.com
w: http://www.mvdirona.com
b: http://blog.mvdirona.com / http://perspectives.mvdirona.com






         







From Perspectives.

        Thursday, 9 December 2010

        Expanding the Cloud with DNS - Introducing Amazon Route 53

I am very excited that today we have launched Amazon Route 53, a high-performance and highly available Domain Name System (DNS) service. DNS is one of the fundamental building blocks of internet applications and has been high on our customers' wish list for some time. Route 53 has the business properties that you have come to expect from an AWS service: fully self-service and programmable, with transparent pay-as-you-go pricing and no minimum usage commitments.


        Some fundamentals on Naming



Naming is one of the fundamental concepts in Distributed Systems. Entities in a system are identified by their name, which is separate from the way you would choose to access that entity, from the address where that access point resides, and from the route to take to get to that address.


A simple example is the situation with Persons and Telephones: a person has a name, a person can have one or more telephones, and each phone can have one or more telephone numbers. To reach an individual, you look him or her up in your address book, select a phone (home, work, mobile), and then a number to dial. The number is used to route the call through the myriad of switches to its destination. The person is the entity with its name, the phones are access points, and the phone numbers are addresses.
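
The same separation is easy to see in code. This is purely my own illustration of the analogy, with made-up names and numbers:

    from dataclasses import dataclass, field

    @dataclass
    class Phone:  # an access point...
        numbers: list = field(default_factory=list)  # ...each with one or more addresses

    @dataclass
    class Person:  # the named entity
        name: str
        phones: dict = field(default_factory=dict)  # keyed by "home", "work", "mobile"

    werner = Person("Werner Vogels", {"mobile": Phone(["+1-206-555-0101"])})
    # Resolving: name -> access point -> address; each layer can change independently.
    print(werner.phones["mobile"].numbers[0])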


Names do not necessarily need to be unique, but it makes life a lot easier if that is the case. There is more than one Werner Vogels in this world and, although I never get emails, snail mail, or phone calls intended for any of my peers, I am sure they are somewhat frustrated when they type our name into a search engine :-).


        In distributed systems we use namespaces to ensure that we can create rich naming without having to continuously worry about whether these names are indeed globally unique. Often these namespaces are hierarchical in nature such that it becomes easier to manage them and to decentralize control, which makes the system more scalable.
        The naming system that we are all most familiar with in the internet is the Domain Name System (DNS) that manages the naming of the many different entities in our global network; its most common use is to map a name to an IP address, but it also provides facilities for aliases, finding mail servers, managing security keys, and much more. The DNS namespace is hierarchical in nature and managed by organizations called registries in different countries. Domain registrars are the commercial interface between the DNS registries and those wishing to manage their own namespace.


DNS is an absolutely critical piece of the internet infrastructure. If it is down or does not function correctly, almost everything breaks down. It would not be the first time that a customer thought an EC2 instance was down when in reality it was some name server somewhere that was not functioning correctly.


DNS looks relatively simple on the outside, but is pretty complex on the inside. To ensure that this critical component of the internet scales and is robust in the face of outages, replication is used pervasively, with epidemic-style techniques. The DNS is one of those systems that rely on Eventual Consistency to manage their globally replicated state.


While registrars manage the namespace in the DNS naming architecture, DNS servers are used to provide the mapping between names and the addresses used to identify an access point. There are two main types of DNS servers: authoritative servers and caching resolvers. Authoritative servers hold the definitive mappings. Authoritative servers are connected to each other in a top-down hierarchy, delegating responsibility to each other for different parts of the namespace. This provides the decentralized control needed to scale the DNS namespace.


But the real robustness of the DNS system comes through the way lookups are handled, which is what caching resolvers do. Resolvers operate in a completely separate hierarchy, which is bottom-up: starting with software caches in a browser or the OS, then a local resolver or a regional resolver operated by an ISP or a corporate IT service. Caching resolvers are able to find the right authoritative server to answer any question, and then use eventual consistency to cache the result. Caching techniques ensure that the DNS system doesn't get overloaded with queries.


        The Domain Name System is a wonderful practical piece of technology; it is a fundamental building block of our modern internet. As always there are many improvements possible, and many in the area of security and robustness are always in progress.


        Amazon Route 53



        Amazon Route 53 is a new service in the Amazon Web Services suite that manages DNS names and answers DNS queries. Route 53 provides Authoritative DNS functionality implemented using a world-wide network of highly-available DNS servers.
        Amazon Route 53 sets itself apart from other DNS services that are being offered in several ways:


        A familiar cloud business model: A complete self-service environment with no sales people in the loop. No upfront commitments are necessary and you only pay for what you have used. The pricing is transparent and no bundling is required and no overage fees are charged.


Very fast update propagation times: One of the difficulties with many of the existing DNS services is their very long update propagation times; sometimes it may even take up to 24 hours before updates are received at all replicas. Modern systems require much faster update propagation to, for example, deal with outages. We have designed Route 53 to propagate updates very quickly and to give the customer the tools to find out when all changes have been propagated.
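
For example (a sketch using the modern boto3 SDK, which postdates this post; the zone ID and record values are hypothetical), a change can be polled until its status flips from PENDING to INSYNC, meaning it has reached all Route 53 servers:

    import time

    import boto3

    r53 = boto3.client("route53")
    change = r53.change_resource_record_sets(
        HostedZoneId="Z0000000EXAMPLE",  # hypothetical zone ID
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {"Name": "www.example.com.", "Type": "A", "TTL": 300,
                                  "ResourceRecords": [{"Value": "192.0.2.10"}]},
        }]},
    )["ChangeInfo"]

    while change["Status"] != "INSYNC":  # stays PENDING until all replicas have the update
        time.sleep(10)
        change = r53.get_change(Id=change["Id"])["ChangeInfo"]
    print("change propagated to all Route 53 servers")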


Low-latency query resolution: The query resolution functionality of Route 53 is based on anycast, which routes each request automatically to the closest DNS server. This achieves very low latency for queries, which is crucial for the overall performance of internet applications. Anycast is also very robust in the presence of network or server failures, as requests are automatically routed to the next closest server.


No lock-in: While we have made sure that Route 53 works really well with other Amazon services such as Amazon EC2 and Amazon S3, its use is not restricted to AWS. You can use Route 53 with any of the resources and entities that you want to control, whether they are in the cloud or on-premises.


        We chose the name 'Route 53' as a play on the fact that DNS servers respond to queries on port 53. But in the future we plan for Route 53 to also give you greater control over the final aspect of distributed system naming, the route your users take to reach an endpoint. If you want to learn more about Route 53 visit http://aws.amazon.com/route53 and read the blog post at the AWS Developer weblog.

        "

        Tuesday, 17 August 2010

        Scaling an AWS infrastructure - Tools and Patterns



This is a guest post by Frédéric Faure (architect at Ysance); you can follow him on Twitter.

        How do you scale an AWS (Amazon Web Services) infrastructure? This article will give you a detailed reply in two parts: the tools you can use to make the most of Amazon’s dynamic approach, and the architectural model you should adopt for a scalable infrastructure.

I base my report on experience gained in several AWS production projects in casual gaming (Facebook), e-commerce infrastructures, and the mainstream GIS (Geographic Information System) field. It's true that my experience in gaming (IsCool, The Game) is currently the most representative in terms of scalability, due to the number of users (over 800 thousand DAU – daily active users – at peak usage and over 20 million page views every day); however, my experiences in e-commerce and GIS (currently underway) provide a different view of scalability, one that takes into account the various problems of availability and data management. I will therefore attempt to provide a detailed overview of the factors to take into account in order to optimise the dynamic nature of an infrastructure constructed in a Cloud Computing environment, in this case the AWS environment.

        "