The bill that nobody can explain
Most teams I meet have an AWS bill that climbs faster than their user count, and a story about why that is. The story is usually about scale. The reality is usually about code. If you want to reduce AWS costs without a migration, the place to start is rarely AWS at all.
I joined Cuez, a Belgian SaaS product built by Tinkerlist, while their bill was on exactly that trajectory. Their team had already done the obvious thing: bigger instances, more memory, a larger database. The bill kept rising and the API kept timing out. After about four months of focused engineering work, the API went from 3-second average response to roughly 300ms, and the AWS bill landed near 60% of where it started. The full Cuez case study has the details.
This article is the unglamorous version of how that happened. No new vendor. No expensive DevOps hire. Just reading the code and fixing what was wrong.
According to the Flexera 2024 State of the Cloud report, organizations estimate roughly 27% of their cloud spend is wasted (Flexera 2024 State of the Cloud). My experience across 250+ projects since 2009 puts that number higher in the typical SaaS, somewhere closer to 30 to 50%.
TL;DR
- Cuez was burning AWS resources because of inefficient code, not because the platform needed bigger servers.
- I audited the codebase, found unnecessary database queries, dead dependencies, and missing caching, then fixed them in order of impact.
- API response dropped from 3 seconds to 300ms. Infrastructure cost fell about 40%.
- The whole engagement took about four months. No migration, no new services, no new team members.
- If your bill keeps growing faster than your users, your problem is almost certainly application-level waste.
Table of contents
- The client and the problem
- Why your AWS bill is probably a code problem
- How I diagnosed the waste
- The four fixes that saved 40%
- The results
- How to tell if you are overpaying
- When to bring in outside help
- FAQ
- Reflecting on the bill that nobody could explain
The client and the problem
Cuez is a B2B SaaS platform for teams running television shows and live events. The product helps producers manage rundowns, coordinate crews, and stay synchronized in real time. Live events do not wait for spinners, so the platform has a hard speed requirement.
When I joined in April 2021, the product worked. Users could create shows, manage rundowns, coordinate crews. It was just slow. Pages took several seconds. The Laravel API averaged 3-second response times. The team had already tried the standard escalation: bigger EC2 instances, more memory, a beefier RDS database. The bill kept climbing. Performance barely moved.
This pattern is everywhere. When an application is slow, the instinct is to buy more hardware. The trouble is that if your code is making the database do unnecessary work on every request, a bigger database just does that unnecessary work slightly faster. You are paying more to be inefficient at higher resolution.
For background on why response time matters beyond the AWS bill, I covered the conversion math in API response time: how I made it 10x faster and the broader picture in website speed optimization.
Why your AWS bill is probably a code problem
Here is what most cloud consultants will not say plainly: for SaaS applications under roughly 10,000 requests per minute, the infrastructure is usually fine. The code is the bottleneck.
AWS charges you for compute time, data transfer, database operations, and storage. If the application runs queries it does not need, loads libraries it never uses, or processes data it later throws away, you are paying for waste. Reserved instances and savings plans clip the price tag by 20 to 30% but do nothing about the underlying inefficiency.
I have seen this across 250+ projects in 16 years of building software. The team assumes infrastructure. A vendor recommends reserved instances or a managed service migration. Sometimes that helps. But if the root cause is application-level waste, you are optimizing the wrong layer. The Cuez codebase made this painfully clear. Their database CPU sat above 80% during normal traffic. Every fix that did not touch the code left it there.
If you only remember one thing from this article, make it this. Fix the application before you migrate the infrastructure. Almost always, in that order.
How I diagnosed the waste
There was no dashboard with a glowing red button. I read the code.
Step 1: Full codebase audit
I went through the Laravel application file by file, looking for four specific patterns:
- Database queries that ran on every request without needing to. A single query at 50ms feels harmless. At 1,000 requests per minute, that is 50 seconds of wasted database time every minute, on one query.
- Dependencies that were no longer used or had been superseded by built-in framework features. Each one added cold-start time and memory.
- Custom code that reimplemented things Laravel already did natively, usually slower and with more bugs.
- Missing caching on data that rarely changed.
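The arithmetic behind the first pattern is worth writing down, because it is what turns "50ms, harmless" into a line item. A minimal sketch in Python (not the Laravel stack Cuez ran on); the function name is made up for illustration:

```python
def wasted_db_seconds_per_minute(query_ms: float, requests_per_minute: int) -> float:
    """Database time consumed each minute by one query that runs on every request."""
    return query_ms * requests_per_minute / 1000.0


# The example from the list above: a 50ms query at 1,000 requests per minute.
print(wasted_db_seconds_per_minute(50, 1000))  # 50.0
```

Run the same function over every always-on query you find and the total tells you how much of each minute the database spends on work nobody asked for.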
Step 2: Profiling the hot paths
Not every endpoint is worth optimizing. I used Laravel's query log and Telescope to identify the slowest endpoints and the ones consuming the most database time. The familiar 80/20 pattern showed up: roughly 20% of endpoints accounted for 80% of database load.
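If you want to reproduce that ranking yourself, the calculation is simple once you have per-endpoint call counts and average database time from whatever profiler you use. The sketch below is an illustration in Python, not the tooling used at Cuez; the endpoint paths and numbers are invented, and `EndpointStats` and `hot_paths` are hypothetical names.

```python
from typing import NamedTuple


class EndpointStats(NamedTuple):
    path: str
    calls: int          # requests in the sampling window
    avg_db_ms: float    # average database time per request


def hot_paths(stats: list[EndpointStats], share: float = 0.8) -> list[str]:
    """Return the fewest endpoints, heaviest first, that together account
    for at least `share` of total database time."""
    totals = sorted(((s.calls * s.avg_db_ms, s.path) for s in stats), reverse=True)
    grand_total = sum(total for total, _ in totals)
    picked: list[str] = []
    running = 0.0
    for total, path in totals:
        if running >= share * grand_total:
            break
        picked.append(path)
        running += total
    return picked


# Invented sample data: one heavy endpoint and four light ones.
sample = [
    EndpointStats("/rundowns/{id}", 12_000, 900.0),
    EndpointStats("/shows", 8_000, 40.0),
    EndpointStats("/media", 5_000, 60.0),
    EndpointStats("/users/me", 20_000, 5.0),
    EndpointStats("/health", 60_000, 1.0),
]
print(hot_paths(sample))  # ['/rundowns/{id}']
```

Ranking by calls times average time, rather than by average time alone, keeps a rarely-hit slow endpoint from outranking a moderately slow one that runs constantly.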
The main rundown endpoint, the heart of the product, was the worst offender. It loaded show data, every segment, every piece of media, every permission, every collaboration state, in over 40 separate queries. Most of those fetched columns the frontend never used.
Step 3: Mapping waste to dollars
This is the step most engineers skip. I priced each inefficiency in monthly AWS spend so I could rank fixes by financial impact, not by how technically interesting they were. A query that wasted $500 of compute per month got fixed before one that wasted $20, even if the second was a more elegant problem.
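One crude way to put a price on a finding is to convert wasted milliseconds per request into compute-hours per month and multiply by what you pay per hour for the instance doing the work. This is a rough sketch under that assumption, not a description of the exact method used at Cuez; the function names, the traffic figure, and the $0.50 hourly price are all hypothetical.

```python
def monthly_cost_usd(wasted_ms_per_request: float,
                     requests_per_month: int,
                     usd_per_compute_hour: float) -> float:
    """Approximate monthly spend attributable to one inefficiency."""
    wasted_hours = wasted_ms_per_request * requests_per_month / 1000 / 3600
    return wasted_hours * usd_per_compute_hour


def rank_by_cost(findings: dict[str, float]) -> list[tuple[str, float]]:
    """Sort findings by monthly cost, most expensive first."""
    return sorted(findings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical: 1,000 requests per minute for a 30-day month, at $0.50 per hour.
requests = 1_000 * 60 * 24 * 30
findings = {
    "unused permissions query (50ms)": monthly_cost_usd(50, requests, 0.50),
    "duplicate settings lookup (4ms)": monthly_cost_usd(4, requests, 0.50),
}
for name, cost in rank_by_cost(findings):
    print(f"{name}: ${cost:,.2f}/month")
```

The numbers will never be exact, and they do not need to be. They only need to be good enough to put the $500 problem ahead of the $20 one.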
The four fixes that saved 40%
Fix 1: Query optimization
This was the largest single contribution. I rewrote the most expensive queries to fetch only what the application actually used.
The original code over-used eager loading. It pulled entire related datasets when the frontend needed three or four fields. The library analogy works here. Asking the librarian for every book in the building when you only need the titles on one shelf is technically correct and absolutely wasteful.
The changes:
- Select only the columns the frontend reads.
- Add database indexes on the columns used in WHERE and JOIN clauses.
- Combine multiple small queries into one efficient query where the data model allowed.
- Remove queries whose results were never returned to the user.
On the main rundown endpoint, this dropped the query count from over 40 per request to about 12. That alone took average response time from roughly 3 seconds toward the 1-second range, with the rest of the gain coming from the next three fixes.
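To make the pattern concrete, here is a small self-contained illustration using Python's built-in `sqlite3` module. It is not Cuez code: the real application was Laravel on Postgres, and the table names, columns, and helper functions here are invented. It shows the same three moves at toy scale: an index on the filtered column, one JOIN instead of a query per row, and a column list instead of `SELECT *`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, notes TEXT);
    CREATE TABLE segments (
        id INTEGER PRIMARY KEY, show_id INTEGER,
        title TEXT, duration_s INTEGER, script TEXT
    );
    -- Index the column used in the WHERE and JOIN clauses below.
    CREATE INDEX idx_segments_show_id ON segments (show_id);
""")
conn.execute("INSERT INTO shows VALUES (1, 'Evening News', 'long notes')")
conn.executemany(
    "INSERT INTO segments VALUES (?, 1, ?, ?, 'long script')",
    [(i, f"Segment {i}", 60 * i) for i in range(1, 6)],
)
conn.commit()

counter = {"queries": 0}


def run(sql, params=()):
    """Execute one query and count it."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def rundown_naive(show_id):
    """One query for the segment IDs, then one SELECT * per segment."""
    ids = run("SELECT id FROM segments WHERE show_id = ? ORDER BY id", (show_id,))
    return [run("SELECT * FROM segments WHERE id = ?", (seg_id,))[0]
            for (seg_id,) in ids]


def rundown_lean(show_id):
    """One JOIN that returns only the three columns the caller reads."""
    return run(
        """SELECT s.title, g.title, g.duration_s
           FROM shows AS s JOIN segments AS g ON g.show_id = s.id
           WHERE s.id = ? ORDER BY g.id""",
        (show_id,),
    )


rundown_naive(1)
print(counter["queries"])  # 6

counter["queries"] = 0
rundown_lean(1)
print(counter["queries"])  # 1
```

With five segments the naive version issues six queries and drags the `script` column along each time. The lean version issues one and skips the columns nobody reads.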
Fix 2: Real caching
Some data changes rarely. User permissions, show configurations, media metadata. Cuez was fetching all of this from Postgres on every request. The first cache layer was a 5-minute Redis TTL on the slowest cold paths. ElastiCache Redis is cheap relative to RDS Postgres, and the swap is direct.
For an explanation of why this matters in plain language, see how database queries slow down your web app, which covers caching, indexing, and N+1 in more detail.
The result, measured against the database, was about 80% of read traffic served from Redis. RDS CPU dropped from the 80%+ range to around 30%. That is what enabled stepping down the RDS instance class without losing headroom, which contributed directly to the lower bill.
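The caching pattern itself is small. The sketch below is an in-memory stand-in written in Python so it runs anywhere; it is not the Redis setup used at Cuez, and `TTLCache`, `get_or_load`, and `load_permissions` are hypothetical names. It shows the two pieces that matter: a time-to-live (300 seconds, matching the 5-minute TTL above) and an explicit invalidation call for when the underlying data changes.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Read-through cache with a fixed time-to-live and explicit invalidation."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # fresh hit: no database work
        value = loader()                 # miss or expired: load once
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key as soon as the underlying data changes."""
        self._store.pop(key, None)


db_reads = {"count": 0}


def load_permissions():
    """Pretend database read; counts how often it is actually called."""
    db_reads["count"] += 1
    return ["edit_rundown", "view_media"]


cache = TTLCache(ttl_seconds=300)
cache.get_or_load("permissions:7", load_permissions)
cache.get_or_load("permissions:7", load_permissions)
print(db_reads["count"])  # 1

cache.invalidate("permissions:7")        # e.g. after an admin edits the role
cache.get_or_load("permissions:7", load_permissions)
print(db_reads["count"])  # 2
```

The TTL bounds how stale an entry can get if an invalidation is missed. The invalidation call is what keeps users from seeing stale data in the normal case.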
Fix 3: Removing dead code and outdated dependencies
Years of development had left the Cuez codebase with unused npm and Composer packages, abandoned experiments, and custom implementations of things Laravel already did. I removed every dependency the application did not actively need and replaced custom code with framework primitives.
The visible effect was a smaller memory footprint per worker. On AWS, that translated to fitting more workers per EC2 instance, or stepping down to smaller instance types. Either way, it reduced cost.
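The link between memory per worker and instance count is plain division. A sketch with invented numbers, since the actual per-worker figures at Cuez are not given here; the function name and the 1 GB reserved for the operating system are assumptions:

```python
def workers_per_instance(instance_memory_mb: int,
                         reserved_mb: int,
                         worker_memory_mb: int) -> int:
    """How many application workers fit in one instance's memory."""
    return max(0, (instance_memory_mb - reserved_mb) // worker_memory_mb)


# Hypothetical: an 8 GB instance with 1 GB reserved for the OS.
print(workers_per_instance(8192, 1024, 256))  # 28
print(workers_per_instance(8192, 1024, 160))  # 44
```

In this made-up case, trimming each worker from 256 MB to 160 MB lets the same instance run 44 workers instead of 28, which is the room you need to drop an instance or a size class.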
Fix 4: Framework upgrade
Cuez was running an older Laravel. I upgraded to Laravel 10, which had measurable improvements in query builder performance, request lifecycle, and connection pooling. Every request benefited.
I also moved the frontend from Vue 2 to Vue 3. Vue 3 ships a smaller bundle and renders faster. Less JavaScript over the wire means lower CloudFront and bandwidth charges, plus less work for the user's browser.
The full case write-up is at Cuez API optimization. For the broader context of what this kind of engagement looks like commercially, see my Custom Web Applications service starting at $3,499/mo and the Fractional CTO service starting at $4,500/mo.
The results
After about four months of focused work:
| Metric | Before | After | Change |
|---|---|---|---|
| Average API response time | 3,000ms | 300ms | 10x faster |
| Database queries per request, main endpoint | 40+ | ~12 | 70% fewer |
| Monthly AWS infrastructure cost | Baseline | ~60% of baseline | ~40% reduction |
| Application memory per worker | High | Reduced | Smaller instances viable |
| User-facing page load | Several seconds | Sub-second | Visibly different product |
The 40% bill reduction was a sum of smaller wins. Fewer database operations meant a smaller RDS instance was enough. Lower memory per worker meant fewer or smaller EC2 instances. Lighter API responses and a smaller frontend bundle reduced CloudFront and data-transfer charges.
The unexpected part was the business effect. The product team reported user engagement going up after the speed work. Features users had been avoiding because of slow loads started getting used. Sales calls stopped having awkward 3-second gaps. A faster product is not just cheaper to run. It is also easier to sell.
How to tell if you are overpaying
You do not need to hire me to know whether you have this problem. Five signs:
1. Your bill grows faster than your user count. If users grew 20% but the AWS bill grew 60%, something is scaling poorly. Healthy applications have roughly linear cost-to-traffic curves.
2. Bigger instances did not help. If you stepped up RDS or EC2 and response times barely moved, the bottleneck is the application.
3. Your database CPU stays above 70%. Sustained high database CPU almost always points to inefficient queries. A reasonably optimized application keeps it under 40% during normal traffic.
4. Nobody on the team can explain the cost line by line. If your engineers cannot point at a service and say "this costs X because of Y," the waste is hiding in plain sight. AWS Cost Explorer helps but only if someone reads it.
5. You have not had a performance audit in over a year. Code accumulates inefficiencies. New features ship under deadline pressure. Quick fixes become permanent. Without periodic review, waste compounds.
If three or more of these sound familiar, you likely have a real optimization opportunity. In my experience, SaaS applications that have never been audited typically have 20 to 40% of cloud spend on the table, sometimes more.
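The first sign is the easiest to quantify. If cost scaled linearly with traffic, cost per user would stay flat, so the change in cost per user is a quick health number. A small sketch; the function name is made up:

```python
def cost_per_user_change(user_growth: float, bill_growth: float) -> float:
    """Fractional change in cost per user, given growth rates as fractions."""
    return (1 + bill_growth) / (1 + user_growth) - 1


# The example from sign 1: users up 20%, bill up 60%.
print(f"{cost_per_user_change(0.20, 0.60):.1%}")  # 33.3%
```

A result near zero means the bill is tracking usage. A third more per user, as in this example, means each user is getting more expensive to serve, and that rarely happens by itself.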
For deeper symptom-spotting on the database side specifically, how database queries slow down your web app goes into the diagnostic patterns I use.
When to bring in outside help
You might be wondering whether your team should just do this. Sometimes yes. A few honest reasons it often goes faster with outside help:
Fresh eyes catch what familiarity hides. Your team wrote the code. They have context, which is useful, and blind spots, which are not. They will not question patterns they implemented six months ago.
Optimization needs uninterrupted focus. Your engineers are shipping features and putting out fires. Refactoring a hot path well requires several uninterrupted days, which is a luxury most product teams do not have.
The math is usually obvious. If your AWS bill is $10,000 per month and an audit cuts it 30%, that is $3,000 per month, $36,000 per year. A focused engagement typically pays for itself inside the first quarter.
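That math as a sketch you can rerun with your own numbers. The bill and the 30% reduction come from the example above; the $7,500 engagement cost is purely hypothetical, included only to show how a payback period falls out. Function names are made up.

```python
def monthly_savings(monthly_bill_usd: float, reduction: float) -> float:
    """Monthly savings for a given fractional reduction in the bill."""
    return monthly_bill_usd * reduction


def payback_months(engagement_cost_usd: float,
                   monthly_bill_usd: float,
                   reduction: float) -> float:
    """Months until cumulative savings cover the cost of the work."""
    return engagement_cost_usd / monthly_savings(monthly_bill_usd, reduction)


saved = monthly_savings(10_000, 0.30)
print(round(saved))                                # 3000
print(round(saved * 12))                           # 36000
print(round(payback_months(7_500, 10_000, 0.30), 1))  # 2.5
```

Plug in your real bill and a conservative reduction. If the payback period still comes out well under a year, the audit is worth scheduling.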
This is most of what I do as a Fractional CTO and through my Custom Web Applications work. I come in, audit the system, fix what is costing money, and leave the team with practices that keep the waste from coming back.
If your situation looks like the one in this article, book a free strategy call or get a quote in 60s.
FAQ
How long does a cloud cost optimization project take?
For a typical SaaS with a single API and database, the audit is 2 to 4 weeks and the fixes 4 to 8 weeks. Cuez was about four months total because we also did the framework upgrade. If your codebase is larger or has multiple services, plan for longer.
Can I reduce AWS costs without changing my code?
Partially. Reserved instances, savings plans, right-sizing, and Compute Optimizer save 20 to 30% on most workloads. The biggest savings come from making the application use fewer resources per request, which requires code changes. The two work best together.
How much can I realistically save?
For applications never optimized, 20 to 40% is common. I have seen 60% on extreme cases with years of accumulated waste. If you have already done a recent audit, the remaining gains are smaller, usually 10 to 15%.
Will optimization break my application?
It can if done carelessly. Every change at Cuez went through code review, automated tests, and staged rollouts. Query rewrites were verified against production data patterns before going live. Caching was implemented with explicit invalidation paths so users never saw stale data. The risk is real and manageable with normal engineering hygiene.
Should I optimize or migrate to a different cloud provider?
Optimize first, almost always. Inefficient code is inefficient on every cloud. Google Cloud and Azure are not meaningfully cheaper than AWS for most workloads. Fix the application, then evaluate a migration on its own merits, like specific managed services or geographic requirements.
Would moving to Lambda reduce costs?
Maybe, with caveats. Lambda bills per request plus execution time, metered by the millisecond and scaled by configured memory, rather than per instance-hour, so it can be cheaper for variable traffic. But Lambda has its own traps. Inefficient code costs more on Lambda because every invocation runs longer, and you pay for that extra duration directly. Fix the code first, then evaluate serverless honestly.
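A rough estimate shows why slow code hurts more on Lambda. The sketch below assumes the commonly published x86 list prices of $0.20 per million requests and $0.0000166667 per GB-second, and it ignores the free tier. Treat both prices as assumptions and check current pricing for your region. The traffic and memory figures are invented, and the function name is made up.

```python
def lambda_monthly_cost(invocations: int,
                        avg_duration_ms: float,
                        memory_mb: int,
                        usd_per_million_requests: float = 0.20,     # assumed list price
                        usd_per_gb_second: float = 0.0000166667):   # assumed list price
    """Estimate monthly Lambda cost: request charge plus duration charge."""
    request_cost = invocations / 1_000_000 * usd_per_million_requests
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * usd_per_gb_second


# Hypothetical: 10 million invocations per month at 1 GB of memory.
slow = lambda_monthly_cost(10_000_000, 3000, 1024)   # 3-second handler
fast = lambda_monthly_cost(10_000_000, 300, 1024)    # 300ms handler
print(f"${slow:,.2f} vs ${fast:,.2f}")  # $502.00 vs $52.00
```

The request charge is identical in both cases. Almost the entire difference is duration, which is the part only code changes can reduce.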
Do I need a DevOps engineer or a software engineer for this?
For Cuez-style work, you mostly need a software engineer who knows the application framework deeply. A DevOps engineer can right-size infrastructure and set up monitoring, but they will not refactor your queries or upgrade Laravel. Ideally one person can do both, which is part of why a Fractional CTO engagement fits this kind of project.
What stack expertise applies here?
The Cuez engagement was Laravel and Vue.js with AWS. My core stack across 250+ projects is PHP, JavaScript, TypeScript, Node.js, React, Vue.js, Next.js, NestJS, Postgres, MySQL, MongoDB, Redis, AWS, Docker, and Kubernetes. The same diagnostic process applies to any of these.
Reflecting on the bill that nobody could explain
The team at Cuez had been told they needed to scale up. The honest answer was that they needed to slow down and read the code. Four months later the API was 10x faster, the bill was 40% smaller, and the team had a set of practices that kept waste from creeping back.
If your AWS bill is the one nobody on your team can explain, the next step is not a vendor switch. It is a quiet audit, ranked by dollar impact, executed in order. If your team has the bandwidth, do it yourself. If they do not, let's talk about what an audit would look like for your stack. I take 2 to 3 clients at a time, and I respond within 24 hours.
For related reading, see how I optimized API response time, the database queries deep dive, and the case write-ups for Cuez, Imohub, and bolttech.