The Challenge
A growing e-commerce platform was struggling with on-premise infrastructure that couldn't handle peak traffic during sales events.
What was going wrong
When we first spoke to this retailer, their CTO didn’t sugarcoat it: “Last Black Friday broke us.” Their platform had gone down for just over three hours at peak trading time, costing them an estimated £1.2 million in lost revenue. Customer complaints flooded social media, and their engineering team spent the following weekend doing manual recovery work.
The root cause was simple enough. They were running bare-metal servers in a London colocation facility, provisioned for average load rather than peak. Their infrastructure had no elasticity whatsoever. When traffic surged to 15x normal levels during the sale, their servers buckled, the database connection pool saturated, and the whole platform fell over.
They had a small but capable engineering team of six. What they lacked was not talent but time and headroom. Every sprint was consumed by firefighting infrastructure issues rather than building product features. They needed to get off bare metal, but they needed a partner to make it happen without disrupting the business.
What we did
We proposed a phased migration to AWS, designed around the team’s capacity and a hard deadline: everything had to be stable well before the next Black Friday.
We chose ECS over EKS deliberately. With a team of six, introducing Kubernetes would have meant months of upskilling and a significant ongoing operational burden. ECS gave us managed container orchestration without the complexity tax. It was the pragmatic call, and the right one.
The migration ran in three phases. First, we moved static assets and product images behind CloudFront, an immediate performance win with minimal risk. Second, we containerised the application layer and deployed it on ECS, with service auto-scaling configured to handle 20x normal traffic. Third, we migrated the database to Aurora PostgreSQL with read replicas to distribute query load during peak periods.
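To give a flavour of the second phase, here is a minimal sketch of a target-tracking scaling policy for an ECS service. The cluster and service names, capacity limits, and thresholds are illustrative placeholders, not the client's real values; the commented boto3 calls show roughly how such a policy is registered.

```python
# Sketch of ECS service auto-scaling via target tracking on CPU.
# All names and numbers here are illustrative, not the client's real config.

def target_tracking_policy(target_cpu_percent, scale_out_cooldown, scale_in_cooldown):
    """Build the TargetTrackingScalingPolicyConfiguration payload that
    AWS Application Auto Scaling expects for an ECS service."""
    return {
        "TargetValue": float(target_cpu_percent),
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": scale_out_cooldown,  # react quickly to surges
        "ScaleInCooldown": scale_in_cooldown,    # drain slowly after peaks
    }

policy = target_tracking_policy(60, scale_out_cooldown=60, scale_in_cooldown=300)

# Applied with boto3, roughly:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(
#       ServiceNamespace="ecs",
#       ResourceId="service/shop-cluster/web",        # hypothetical names
#       ScalableDimension="ecs:service:DesiredCount",
#       MinCapacity=4, MaxCapacity=80)                # ~20x baseline headroom
#   client.put_scaling_policy(
#       PolicyName="cpu-target-tracking",
#       ServiceNamespace="ecs",
#       ResourceId="service/shop-cluster/web",
#       ScalableDimension="ecs:service:DesiredCount",
#       PolicyType="TargetTrackingScaling",
#       TargetTrackingScalingPolicyConfiguration=policy)
```

Target tracking suits this workload because it holds a metric at a set point rather than requiring hand-tuned step thresholds, which matters when traffic can jump 15x in an hour.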
During the database migration, we hit a surprise. Their existing application had no connection pooling at all. Every request opened a new database connection. On the old bare-metal setup with limited traffic, this had never surfaced as a problem. On AWS, with auto-scaling spinning up dozens of containers under load, it would have exhausted Aurora’s connection limit within minutes. We introduced PgBouncer as a connection pooler, which resolved the issue cleanly. Without catching this, the migration would have simply moved the same failure mode to more expensive hardware.
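The fix looks something like the following PgBouncer configuration. Hostnames, pool sizes, and connection limits are placeholders for illustration, not the client's actual settings.

```ini
; Illustrative pgbouncer.ini - hosts and sizes are placeholders.
[databases]
; The app connects to PgBouncer on 6432; PgBouncer multiplexes those
; connections onto a small pool of real Aurora connections.
shop = host=example-aurora-endpoint port=5432 dbname=shop

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
; Transaction pooling: a server connection is held only for the duration
; of a transaction, so thousands of short-lived client connections can
; share a few dozen Aurora connections.
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 40
```

Transaction-mode pooling is the key setting here: with open-connection-per-request behaviour in the app, session pooling would have helped little, while transaction pooling caps Aurora's connection count regardless of how many containers auto-scaling launches.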
The DNS cutover required careful timing. We ran the new and old environments in parallel for two weeks, using weighted routing to shift traffic gradually. The final switchover happened on a Tuesday morning in October, giving us a full month of production running time before Black Friday.
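The gradual shift relied on Route 53 weighted records, which split traffic in proportion to each record's weight. Below is a minimal sketch of building such a change batch; the domain and load-balancer names are hypothetical.

```python
# Sketch of the weighted-routing shift used during the parallel-run period.
# Domain and target names are illustrative. Route 53 sends traffic to each
# record in proportion to its Weight, so 10/90 sends ~10% to the new stack.

def weighted_records(domain, old_target, new_target, new_pct):
    """Build a Route 53 change batch splitting traffic old/new by percentage."""
    assert 0 <= new_pct <= 100

    def record(set_id, target, weight):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "CNAME",
                "SetIdentifier": set_id,
                "Weight": weight,
                "TTL": 60,  # short TTL so weight changes take effect quickly
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {"Changes": [
        record("old-env", old_target, 100 - new_pct),
        record("new-env", new_target, new_pct),
    ]}

batch = weighted_records("shop.example.com", "old-lb.example.com",
                         "new-alb.example.com", new_pct=10)

# Applied with boto3, roughly:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId=zone_id, ChangeBatch=batch)
```

Stepping `new_pct` up over the two-week parallel run (10, then 25, 50, and finally 100) lets problems surface on a small slice of traffic while the old environment remains a one-change rollback path.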
What changed
The results:
- 40% reduction in monthly infrastructure spend, even accounting for the elasticity headroom.
- Zero customer-facing downtime during Black Friday and Christmas, up from 94% availability the previous year. The only incident was a minor background-job hiccup on Cyber Monday, invisible to shoppers.
- 3x faster page load times thanks to CloudFront edge caching and optimised asset delivery.
- Deploy frequency increased from weekly to multiple times daily, with blue-green deployments via CodeDeploy giving the team confidence to ship without fear.
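For the blue-green deployments, CodeDeploy drives the traffic switch between the old and new ECS task sets from an AppSpec file. The sketch below shows the general shape; the container name and port are placeholders, and `<TASK_DEFINITION>` is the token CodeDeploy substitutes at deploy time.

```yaml
# Illustrative appspec.yml for an ECS blue-green deployment via CodeDeploy.
# Container name and port are placeholders, not the client's real values.
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: "web"
          ContainerPort: 8080
```

CodeDeploy starts the new ("green") task set behind a separate target group, shifts load-balancer traffic to it, and keeps the old ("blue") set alive for fast rollback, which is what lets a small team ship multiple times a day without fear.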
The engineering team got their weekends back. More importantly, the CTO told us that for the first time in three years, he did not open his laptop on Black Friday morning to check if the site was still up. It was.