API refactor

P95 latency, cut.

Performance-first API refactor or rebuild with measurable latency wins. Proven pattern: Cuez from 3 seconds to 300ms, 10x faster, 40% less infra cost.

Available for new projects
See Custom Web Apps

Starting at $3,499/mo · monthly subscription

Who this is for

A CTO with a 5+ year-old monolith API struggling under load: P95 response times climbing, feature velocity falling, and the team debating refactor vs rebuild.

The pain today

  • P95 latency climbing without clear root cause
  • Every new feature slowing the API a little more
  • Monitoring showing slow queries but not which to fix first
  • Infrastructure cost growing faster than traffic
  • Team split between 'incremental refactor' and 'burn it down'

The outcome you get

  • Baseline measurement — current P95, P99, throughput, cost
  • Prioritized refactor plan with expected latency and cost wins
  • Incremental extraction patterns — no big-bang rewrites
  • Measurable wins shipped within the first month
  • Documentation of architecture decisions for long-term team reference

Refactor vs rebuild decision framework

Seven factors decide:

  1. Codebase age. More than 7 years often means rebuild is easier; under 5 usually means refactor.
  2. Test coverage. Low coverage makes refactoring dangerous; rebuild might be safer.
  3. Architecture clarity. Muddled boundaries favor rebuild; clear boundaries favor refactor.
  4. Team size and tenure. A small team with long tenure can refactor; a rotating team can't.
  5. Business velocity pressure. High pressure means incremental; you can't stop shipping for 6 months.
  6. Technology freshness. PHP 5.6 is effectively rebuild territory; PHP 8.2 is fine.
  7. Cost of being wrong. High-stakes APIs (payments, medical) favor a slow refactor over a risky rebuild.

I walk through each factor with your team; the answer usually becomes obvious once the factors are named.
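As a rough illustration only, the seven factors can be turned into a weighted score. The weights, thresholds, and function names below are hypothetical, a sketch of the idea rather than the actual framework:

```python
# Hypothetical weights for the seven factors. Each score is -1 (favors
# rebuild), 0 (neutral), or +1 (favors refactor); weights are illustrative.
FACTOR_WEIGHTS = {
    "codebase_age": 1.0,
    "test_coverage": 1.5,        # low coverage makes refactoring risky
    "architecture_clarity": 1.5,
    "team_tenure": 1.0,
    "velocity_pressure": 1.0,    # high pressure favors incremental refactor
    "tech_freshness": 1.0,
    "cost_of_being_wrong": 1.5,  # high stakes favor the slower, safer path
}

def recommend(scores: dict) -> str:
    """Sum weighted factor scores; positive leans refactor, negative rebuild."""
    total = sum(FACTOR_WEIGHTS[f] * s for f, s in scores.items())
    if total >= 2:
        return "refactor"
    if total <= -2:
        return "rebuild"
    return "discuss"  # no clear winner: walk the factors with the team

# Example: fresh stack and clear boundaries, but low test coverage.
example = {
    "codebase_age": 1, "test_coverage": -1, "architecture_clarity": 1,
    "team_tenure": 1, "velocity_pressure": 1, "tech_freshness": 1,
    "cost_of_being_wrong": 1,
}
```

The point of scoring is not the number itself but forcing every factor to be discussed explicitly before anyone says "rewrite".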

Measuring before you touch code

Optimization without measurement is optimization theater. Baseline first:

  • Current P50, P95, P99 latency per endpoint
  • Current throughput (requests per second)
  • Current infrastructure cost per transaction
  • Current error rate

Tools: APM (Datadog, New Relic, Honeycomb), the database slow query log, and cloud billing detail. A week of baseline data reveals the 3–5 endpoints that dominate latency and cost, usually an 80/20 pattern. Fix those first. I resist the instinct to 'clean up the code' until the data says clean code matters for this specific optimization. Numbers before opinions.
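A minimal sketch of what the baseline step computes, assuming raw latency samples (in milliseconds) have already been exported from your APM into a plain list; the helper name is mine:

```python
import statistics

def latency_percentiles(samples_ms):
    """P50/P95/P99 from raw latency samples (milliseconds)."""
    # quantiles(n=100) returns the 99 cut points between percentiles
    qs = statistics.quantiles(samples_ms, n=100)
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

# One endpoint's samples: mostly fast, with a slow tail worth investigating.
samples = [120] * 90 + [900] * 8 + [3000] * 2
stats = latency_percentiles(samples)
```

Note how the average of these samples hides the tail; the P95 and P99 are what users on the slow path actually feel, which is why the baseline is per-percentile, not per-mean.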

Incremental extraction patterns

Four patterns make incremental work possible without big-bang risk:

  • Strangler fig: stand up a new service alongside the old monolith, route specific endpoints to it, and retire the monolith endpoint by endpoint.
  • Read path extraction: the new service serves reads from replicated data; writes still go to the monolith until the read path is stable.
  • Shadow traffic: send production traffic to both old and new, compare responses, and catch regressions before cutover.
  • Feature-flag rollout: gate the new service behind a flag and ramp from 1% to 10% to 50% to 100%, with monitoring at each stage.

I've used these patterns across Cuez, bolttech, and Imohub; the approach is the same regardless of domain.
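The feature-flag ramp needs deterministic bucketing, so a given user stays on the same side of the flag as the percentage grows. A minimal sketch, assuming a hash-based scheme (the flag name and bucketing logic are my illustration, not any specific flag library's API):

```python
import hashlib

def in_rollout(user_id: str, percent: int, flag: str = "new-service") -> bool:
    """Deterministically map a user to a 0-99 bucket; include if below percent."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Because the bucket depends only on the flag and the user, everyone included at 10% is still included at 50%: no user flaps between old and new service mid-ramp, which keeps before/after comparisons clean.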

Case study: Cuez API (3s → 300ms)

Cuez is a SaaS platform for live broadcast production. The core API averaged 3 seconds per response, blocking user growth because the product felt slow. I took it to 300ms: 10x faster, the headline metric. The work: query optimization (most latency sat in 2–3 specific endpoints), strategic Redis caching at the right layers (not everywhere; cache-everything hides real problems), moving work off the request path (async where possible), and targeted database schema adjustments. Secondary outcome: roughly 40% infrastructure cost reduction, because the API now serves 10x the throughput on the same hardware. The patterns are textbook; the discipline to follow them is where most teams struggle.
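One of the techniques above, cache-aside with a short TTL, can be sketched as follows. The in-memory TTLCache here stands in for Redis (mirroring SETEX/GET semantics so the same call shape works against a real client), and the "rundown" entity is a hypothetical example, not Cuez's actual data model:

```python
import time

class TTLCache:
    """In-memory stand-in for Redis, mirroring SETEX/GET semantics."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        value, expires = self._store.get(key, (None, 0.0))
        return value if time.monotonic() < expires else None

    def setex(self, key, ttl_s, value):
        self._store[key] = (value, time.monotonic() + ttl_s)

def get_rundown(cache, rundown_id, load_from_db):
    """Cache-aside: serve from cache, fall back to the DB, then populate."""
    key = f"rundown:{rundown_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(rundown_id)  # slow path: the query being optimized
    cache.setex(key, 30, value)       # short TTL bounds staleness
    return value
```

The design choice is the short TTL: caching at the right layer with bounded staleness, rather than caching everything forever and hiding the slow query you should be fixing.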

Pricing

API rebuild/refactor work fits the Applications Standard tier at $3,499/mo for typical monolith API work, Pro at $4,500/mo for multi-service architecture or performance-critical systems (payment, trading, real-time). First phase (measurement + plan): 2 weeks. Implementation: typically 2–4 months depending on scope and incremental delivery rhythm. 14-day money-back, cancel anytime, Work Made for Hire. All architecture decisions documented for long-term team reference — you keep knowledge, not dependency.

What I don't do

I don't 'just rewrite it in Go'. Language choice rarely moves P95 by 10x; algorithmic and architectural choices do. I don't adopt new frameworks for their own sake. I don't skip measurement to look fast. I don't promise a single latency number — optimization depends on your specific bottlenecks, and I commit to a range after the first-week audit, not before. I don't rebuild a working system because someone on the team finds the code ugly. Honest scoping based on measured data is how API optimization actually delivers wins; vanity rewrites deliver speeches.

Recent proof

A comparable engagement, delivered and documented.

API Performance Optimization

Rescued a slow API that was blocking user growth

Refactored the backend architecture, making the system far more responsive and scalable for the growing user base.

SaaS · 10x faster API · 40% infra savings · Growth unblocked
Read the case study

Frequently asked questions

The questions prospects ask before they book.

How long before we see latency improvements?
First wins typically ship within 3–4 weeks of engagement start. The first 2 weeks are baseline measurement and root-cause analysis; week 3 onwards ships optimizations. Typical first-round wins: 30–50% P95 reduction on targeted endpoints. Larger wins (5–10x) take longer and require architectural change.
Will you refactor or rebuild?
Decided in week 1 based on the 7-factor framework. Most often: targeted refactor of 3–5 bottleneck endpoints, leaving the rest alone. Occasionally: extraction of specific services via strangler fig. Full rebuild is rare and only recommended when factors align strongly.
What if the API can't have downtime?
That's the standard expectation. I use zero-downtime patterns (blue-green deploys, feature flags, strangler fig, shadow traffic) so the refactor ships without outages. Cuez and bolttech both held 99.9% uptime through their major refactor phases.
Do you use my existing stack?
Yes, unless there's a compelling reason to change. Staying within your stack (Laravel, Django, Node, Rails) means your team can maintain the result afterwards. A stack change is recommended only when the stack itself is the bottleneck, which is rarer than teams think.
How do you measure 'measurable wins'?
P95 and P99 latency per endpoint, throughput (RPS) at target latency, infrastructure cost per transaction, error rate under load. Before/after comparison shipped weekly. Wins are real-world production numbers from APM, not staging benchmarks.
Get started in 60 seconds

Ready to start?

Tell me what you need in 60 seconds. Tailored proposal in your inbox within 6 hours.

Available for new projects