The scale of the Amazon Web Services (AWS) outage was due to the 'us-east-1' region becoming a single point of failure (SPOF).



A large-scale outage occurred in the Amazon Web Services (AWS) US East (us-east-1) region at approximately 3:49 PM on Monday, October 20, 2025 (Japan time). The disruption affected many services, including Amazon itself, apps and online services around the world that host data on AWS servers, and online gaming systems such as Nintendo Switch Online. At the time of writing, the outage had already been resolved, and efforts to identify the cause were underway.

Service health - Oct 21, 2025 | AWS Health Dashboard | Global
https://health.aws.amazon.com/health/status?ts=20251020

Amazon Web Services event: Follow live updates on AWS Health Dashboard
https://www.aboutamazon.com/news/aws/aws-service-disruptions-outage-update

Amazon's us-east-1 region is located in 'Data Center Alley,' the world's largest data center cluster, in Loudoun County, Northern Virginia, USA. Data centers are concentrated there because of the area's proximity to the federal government and its tax incentives, in particular a sales tax exemption on equipment and software granted in exchange for a certain level of investment and employment, which has supported the rapid expansion of companies such as AWS.

According to AWS, error rates and latency in the us-east-1 region increased between 3:49 PM and 6:24 PM Japan time on October 20. The problem extended to Amazon.com and its subsidiaries, and also broke the case creation function in the AWS Support Center, leaving customers unable to open support tickets. Amazon's own services went down as well, including Amazon Prime Video, the security camera service 'Ring,' and the AI assistant 'Alexa.'



In addition, services that rely on AWS databases were affected by the outage and temporarily suspended, including the Apple App Store.



The official British government website was also unavailable.



Lloyds Bank, a major British financial institution, also appears to have been affected by the outage, with most of its services going down.



It was also confirmed that Amazon Pay became unavailable on Japan Post's ClickPost service.

Click Post | Japan Post
https://clickpost.jp/

2025/10/20: Regarding payments that cannot be completed with Amazon Pay
Since around 4:15 PM on Monday, October 20, we have been experiencing an issue in which payment procedures cannot be completed using Amazon Pay.
We sincerely apologize for any inconvenience caused to our users.
We are currently investigating the matter and will post updates here as soon as there is progress or the issue is resolved.



Players were reportedly unable to log in to the game 'Fortnite.'



'Palworld' experienced problems with multiplayer connections.



Nintendo suspended all of its network services.



FromSoftware's 'ELDEN RING NIGHTREIGN' was also affected.



Cloudflare Radar reported that the outage in the AWS us-east-1 region caused traffic to drop by approximately 71% compared to normal.



Meanwhile, X (formerly Twitter) avoided the outage. X's owner, Elon Musk, posted, 'X is working.'



Taking the opportunity to promote X Chat, he added, '(X Chat) messages are fully encrypted, there are no ads, no weird "AWS dependencies," and even if someone put a gun to my head, they can't read your messages.'



In addition, the financial news and data company Bloomberg operates its own data centers, so it was not affected by the outage.



At 5:26 PM on October 20, the source of the problem was identified as a DNS resolution failure affecting the regional DynamoDB endpoint. AWS engineers worked on multiple remediation paths in parallel, and by 6:24 PM the DNS issue had been resolved, at which point affected services began to recover.
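
As a minimal sketch of what such a failure looks like from the client side, the Python below simply checks whether the regional DynamoDB endpoint hostname resolves; the hostname is the publicly documented endpoint, and everything else is illustrative. If DNS cannot resolve the name, every client that needs that endpoint fails before a single request is sent.

    # Minimal sketch: check whether the regional DynamoDB endpoint resolves.
    # "dynamodb.us-east-1.amazonaws.com" is the documented public endpoint.
    import socket

    ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

    try:
        # getaddrinfo performs the same DNS lookup an SDK client needs
        # before it can open a connection to the service endpoint.
        records = socket.getaddrinfo(ENDPOINT, 443, proto=socket.IPPROTO_TCP)
        addresses = sorted({record[4][0] for record in records})
        print(f"{ENDPOINT} resolves to {addresses}")
    except socket.gaierror as error:
        # During the incident, lookups like this one failed, so clients
        # could not reach DynamoDB in us-east-1 at all.
        print(f"DNS resolution failed for {ENDPOINT}: {error}")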

However, even after the DNS issue was resolved, an internal EC2 subsystem that relies on DynamoDB remained impaired, causing new instance launches to fail. This in turn disrupted the Network Load Balancer (NLB) health check mechanism, leading to widespread network connectivity problems for Lambda, DynamoDB, CloudWatch, and other services. To ensure a smooth recovery, AWS temporarily restricted some operations, including new EC2 instance launches, Lambda event processing via SQS queues, and asynchronous Lambda invocations.
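
For client applications, the practical counterpart to such restrictions is to let throttled calls back off rather than retry aggressively against a recovering control plane. The sketch below uses boto3's documented retry configuration; the AMI ID and instance parameters are placeholders rather than values from the incident.

    # Sketch: configure boto3's built-in retries so throttled calls back off
    # instead of retrying immediately against a recovering service.
    import boto3
    from botocore.config import Config

    retry_config = Config(
        region_name="us-east-1",
        retries={
            "max_attempts": 10,   # total attempts, including the first call
            "mode": "adaptive",   # client-side rate limiting on throttling errors
        },
    )

    ec2 = boto3.client("ec2", config=retry_config)

    # If RunInstances is throttled while launches are restricted, botocore
    # retries it with exponential backoff up to the configured limit.
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
    )
    print(response["Instances"][0]["InstanceId"])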

At 5:38 PM, the Network Load Balancer health check mechanism was restored, improving network connectivity. Between 11 PM and 4 AM on the 21st, the success rate of EC2 instance launches improved and Lambda function invocation errors were gradually resolved. Between 4 AM and 6 AM on the 21st, EC2-dependent services such as Redshift, ECS, and Glue also began returning to normal, and the restrictions were gradually lifted.

All AWS services returned to normal operation at 7:01 AM on October 21. Some services, such as AWS Config, Redshift, and Connect, continued processing backlogs for several hours, but AWS as a whole was considered fully operational.

Amazon stated that 'the primary cause of this outage was a DNS resolution error for the DynamoDB service endpoint in the us-east-1 region,' and that the failure then propagated through dependencies to the internal EC2 subsystem and the Network Load Balancer health check mechanism. AWS describes the issue as a 'DNS resolution issue for regional DynamoDB service endpoints' and says it will publish a detailed post-event summary.

The outage reached this scale because, although AWS operates many data centers around the world, many companies had designed their systems to use the us-east-1 region by default, turning us-east-1 into a single point of failure (SPOF).

us-east-1 is AWS's oldest region, in operation since the company's early days, and core Internet infrastructure such as Route 53 and CloudFront also runs through it. As a result, even customers who specify other regions often depend indirectly on communications that pass through us-east-1. The AWS us-east-1 region has effectively become the 'virtual nerve center of the Internet,' and even a brief failure there can cause chaos on a global scale.
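
One common mitigation, at least for a customer's own direct dependencies, is simply not to hard-code us-east-1 as the only region. The sketch below is illustrative: the region order and the liveness probe are assumptions, and it does nothing about indirect dependencies such as Route 53 or CloudFront that still pass through us-east-1.

    # Illustrative sketch: prefer a non-default region and fall back only if
    # its DynamoDB endpoint is unreachable. The region order is an assumption.
    import boto3
    from botocore.exceptions import ClientError, EndpointConnectionError

    PREFERRED_REGIONS = ["us-west-2", "eu-west-1", "us-east-1"]  # last resort

    def dynamodb_client():
        """Return a DynamoDB client for the first region that responds."""
        for region in PREFERRED_REGIONS:
            client = boto3.client("dynamodb", region_name=region)
            try:
                client.list_tables(Limit=1)  # cheap liveness probe
                return client
            except (ClientError, EndpointConnectionError):
                continue  # try the next region in the list
        raise RuntimeError("No configured region is reachable")

    client = dynamodb_client()
    print("Using region:", client.meta.region_name)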

GIGAZINE's main server is located in our own data center, so we were not affected by the AWS outage. If you donate to help cover server costs, we can build a more robust backup system and better withstand global outages like this one, so please support us!

About GIGAZINE
https://gigazine.net/news/about/



・Continued
The cause of the AWS outage was a design flaw in the DNS management system - GIGAZINE

in Web Service, Security, Posted by log1i_yk