Ina Fried Mar 2, 2017

Amazon blames human error for Tuesday's massive AWS outage

Amazon offered up an explanation for this week's major outage in its AWS service, blaming human error and promising to change its procedures to avoid a similar incident.

The outage, Amazon said, occurred while an employee was trying to remove a small number of servers. The employee entered a command incorrectly, which took more servers offline than intended and required the overall system to be fully rebooted.

"We want to apologize for the impact this event caused for our customers," Amazon said. "While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further."

Amazon said it is making a number of changes, including safeguards that should prevent too many servers from being taken offline.