Your web app works. Revenue is coming in. But your competitors just added AI-powered search, and your support team is fielding questions about when you'll "get an AI feature too."
You do not need to rebuild your application to add LLM (Large Language Model — the technology behind ChatGPT, Claude, and similar tools) capabilities. What you need is a clear integration strategy, honest cost analysis, and an architecture that keeps your existing system stable while you bolt on intelligence.
I have integrated LLM features into production applications across SaaS platforms, e-commerce systems, and internal tools over the past two years. Some took a weekend. Others took three months. The difference was not the AI model — it was how well we planned the integration with what already existed.
TL;DR Summary
- You can add LLM features to an existing web app without rebuilding it. Treat AI as a service layer, not a core rewrite.
- Three architecture patterns cover most integrations: direct API calls, middleware proxy, or async queue.
- Real API costs for mid-market applications: $300 to $2,000 per month depending on volume and model.
- A 4-phase roadmap (Audit, Prototype, Harden, Scale) reduces risk and keeps your existing app stable.
- Use third-party APIs for standard tasks. Build custom only when your data is the competitive advantage.
Need a hand with your website or web app?
Free 30-min strategy call. I'll review your situation and give you a clear next step.
Table of Contents
- What LLM Integration Actually Means (No Jargon)
- When Adding AI Makes Sense — And When It Does Not
- Three Architecture Patterns for LLM Integration
- Real API Costs: What You Will Actually Pay
- Build vs. Buy: The Decision Framework
- The 4-Phase Integration Roadmap
- Common Mistakes That Kill LLM Projects
- FAQ
- Next Steps
What LLM Integration Actually Means (No Jargon)
An LLM is a type of AI that understands and generates human language. When people say "add AI to my app," they usually mean connecting their existing web application to one of these language models through an API (Application Programming Interface — a standardized way for two software systems to communicate).
Think of it like adding a new payment processor. Your app already works. You are not rebuilding the checkout flow. You are connecting to Stripe's API so your app can process payments. LLM integration works the same way: your app sends text to an AI provider, the provider processes it, and your app receives a response.
In practice: your user types a question into your app's search bar. Your app sends it to an LLM API along with relevant context (your product docs, knowledge base, FAQ). The LLM sends back a relevant answer. That round trip takes 1-3 seconds.
Your existing database, authentication system, and frontend do not change. You are adding a new capability to an existing system, not replacing what you have.
When Adding AI Makes Sense — And When It Does Not
Before you spend a dollar on development, run your use case through these filters.
Good candidates for LLM integration
Customer-facing search and support. Traditional keyword search matches exact words. LLM-powered search understands intent — "my account is locked" matches your article titled "Password Reset Guide" even though the words do not overlap.
Content generation and summarization. Any workflow where users create or consume text benefits from LLM integration. One SaaS client's sales team spent 4 hours per day writing proposal summaries. After integration, that dropped to 45 minutes.
Data extraction from unstructured text. If your team manually pulls information from PDFs, emails, or forms, an LLM can automate 70-80% of that work. Insurance claims, invoice processing, legal document review — all strong candidates.
Internal tools and admin panels. Adding a natural language query layer ("show me all customers in Texas who haven't ordered in 90 days") saves hours compared to building custom filter interfaces.
Poor candidates for LLM integration
Anything requiring 100% accuracy. LLMs generate plausible text, not guaranteed-correct text. Medical diagnoses, legal compliance checks, and financial calculations all need deterministic systems. You can use an LLM to assist, but a human must verify the output.
Simple, rule-based tasks. If your logic is "if X, then Y," you do not need an LLM. A basic conditional statement costs nothing. An LLM API call costs money and adds latency.
Applications with very low text volume. If your app processes 50 requests per day and they are mostly structured data, LLM integration is overhead with no payoff.
For a broader look at where AI fits into business operations beyond web apps, see my guide on AI solutions for business.
Three Architecture Patterns for LLM Integration
Three patterns cover the vast majority of scenarios. The right choice depends on your existing stack, latency requirements, and how much control you need.
Pattern 1: Direct API Calls (Simplest)
Your backend server calls the LLM provider's API directly when a user triggers an AI feature.
Architecture: Imagine three stops in a line. Your user's browser on the left, your backend server in the middle, and the LLM API (OpenAI, Anthropic) on the right. A request flows left to right, the response flows back.
Best for: Prototypes and low-volume applications (under 1,000 AI requests per day). Fast to implement (days, not weeks). No new infrastructure. The tradeoff: every request waits 1-3 seconds for the LLM, and there is no caching or rate limiting unless you build it.
I shipped a client's internal knowledge base search using this pattern in 4 days. Users asked questions, the backend sent them to Claude's API with relevant docs as context, and the answer appeared in 2 seconds.
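The whole pattern fits in a few functions. This is a minimal sketch, not production code: `call_llm` is a placeholder for your provider's SDK call (the OpenAI and Anthropic Python clients both slot in here), and the doc-selection step is assumed to happen upstream.

```python
def build_prompt(question: str, context_docs: list[str]) -> str:
    """Combine the relevant docs and the user's question into one prompt."""
    context = "\n\n".join(context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your provider's SDK call here
    # (OpenAI or Anthropic Python client, for example).
    return f"[model response to a {len(prompt)}-character prompt]"

def handle_search(question: str, docs: list[str]) -> str:
    """The full Pattern 1 round trip: build prompt, call API, return answer."""
    return call_llm(build_prompt(question, docs))
```

That is the entire integration surface for Pattern 1: one request out, one response back, no new infrastructure.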
Pattern 2: Middleware Proxy Layer (Balanced)
You add a lightweight service between your backend and the LLM API. This proxy handles caching, rate limiting, prompt management, cost tracking, and fallback logic.
Architecture: Same three stops from Pattern 1, but with a fourth box (your AI proxy) between backend and LLM API. The proxy caches responses, enforces rate limits, manages prompts in one place, and retries or falls back to a different model on errors.
Best for: Production applications with 1,000+ daily AI requests. Caching typically cuts API calls by 30-50%. The proxy makes swapping models a configuration change instead of a code rewrite.
This is the pattern I recommend for most production integrations.
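A minimal sketch of what the proxy layer does. The provider calls are injected as plain functions, which is what makes swapping models a configuration change; the cache, sliding-window rate limit, and fallback logic are deliberately simplified illustrations, not a hardened implementation.

```python
import hashlib
import time

class LLMProxy:
    """Sketch of a middleware proxy: cache, rate limit, fallback."""

    def __init__(self, primary, fallback, max_per_minute=60, cache_ttl=3600):
        self.primary = primary        # callable: prompt -> response
        self.fallback = fallback      # callable used when primary fails
        self.max_per_minute = max_per_minute
        self.cache_ttl = cache_ttl
        self.cache = {}               # prompt hash -> (timestamp, response)
        self.window = []              # timestamps of recent API calls

    def ask(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        now = time.time()
        # Cache hit: skip the API entirely (and don't count against the limit).
        if key in self.cache and now - self.cache[key][0] < self.cache_ttl:
            return self.cache[key][1]
        # Simple sliding-window rate limit.
        self.window = [t for t in self.window if now - t < 60]
        if len(self.window) >= self.max_per_minute:
            raise RuntimeError("rate limit exceeded")
        self.window.append(now)
        # Try the primary model; fall back to the secondary on any error.
        try:
            response = self.primary(prompt)
        except Exception:
            response = self.fallback(prompt)
        self.cache[key] = (now, response)
        return response
```

Because repeated questions hit the cache instead of the API, this structure is where the 30-50% reduction in API calls comes from in practice.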
Pattern 3: Async Queue-Based (Most Robust)
AI requests go into a message queue (RabbitMQ, Amazon SQS, or Redis). A separate worker processes them in the background and stores the results.
Architecture: Two flows. The user triggers an AI feature, your backend drops a job into a queue and tells the user "processing." Separately, a background worker picks up jobs, calls the LLM API, stores results, and notifies the frontend when done.
Best for: High-volume applications (10,000+ daily requests) and batch processing. One client needed to generate descriptions for 15,000 products. Queue-based processing handled it in 3 hours with parallel workers and automatic retry. The tradeoff: more infrastructure to build and users do not get instant responses.
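The worker side can be sketched with Python's standard `queue` module. This single-threaded version is illustrative only: a production setup runs many workers in parallel against RabbitMQ or SQS, and `call_llm` here is a stand-in for the real API call.

```python
import queue

def process_jobs(jobs, call_llm, max_retries=3):
    """Background-worker sketch: drain the queue, call the LLM per job,
    retry transient failures, and store results keyed by job ID."""
    q = queue.Queue()
    for job in jobs:              # each job is a (job_id, prompt) pair
        q.put(job)
    results = {}
    while not q.empty():
        job_id, prompt = q.get()
        for attempt in range(max_retries):
            try:
                results[job_id] = call_llm(prompt)
                break             # success: stop retrying this job
            except Exception:
                if attempt == max_retries - 1:
                    results[job_id] = None  # exhausted retries; flag for review
    return results
```

The automatic retry is the point: at 15,000 products, some API calls will fail, and the queue absorbs those failures instead of surfacing them to users.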
For more detail on how to build AI capabilities into a web application from the ground up, see my article on building AI into your web app.
Real API Costs: What You Will Actually Pay
Most blog posts dodge this with "it depends." Here are actual numbers from client projects in 2025-2026.
Cost Per Request (Approximate)
| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Typical Request Cost |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | $0.003 - $0.02 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | $0.004 - $0.025 |
| GPT-4o Mini | $0.15 | $0.60 | $0.0003 - $0.002 |
| Claude 3.5 Haiku | $0.80 | $4.00 | $0.001 - $0.008 |
A "token" is roughly three-quarters of a word. A typical customer support question uses about 700 tokens total.
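Turning the table's rates into arithmetic makes budgeting concrete. The 500-input / 200-output split below is an assumed, illustrative breakdown of that 700-token question, priced at the GPT-4o rates above:

```python
def request_cost(input_tokens, output_tokens,
                 input_per_million, output_per_million):
    """Dollar cost of one request at per-million-token rates."""
    return (input_tokens / 1_000_000 * input_per_million
            + output_tokens / 1_000_000 * output_per_million)

# ~700-token support question, assumed 500 tokens in / 200 tokens out,
# at GPT-4o rates ($2.50 in, $10.00 out per 1M tokens):
per_request = request_cost(500, 200, 2.50, 10.00)   # $0.00325
monthly_at_1k_per_day = per_request * 1_000 * 30    # ~$97.50/month
```

Run the same arithmetic with your own expected request shape and volume before committing to a model tier.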
Monthly Cost Estimates by Scale
| Scale | Daily Requests | Mid-Tier Model/mo | Premium Model/mo |
|---|---|---|---|
| Small | 100-500 | $15 - $100 | $50 - $350 |
| Medium | 500-5,000 | $100 - $800 | $350 - $2,500 |
| Large | 5,000-50,000 | $800 - $5,000 | $2,500 - $15,000 |
What these numbers miss
API costs are only 20-40% of the total cost. The rest:
- Development time: $5,000 - $30,000 for initial integration, depending on pattern complexity.
- Prompt engineering: 10-20 hours of testing the instructions you send to the LLM.
- Monitoring and maintenance: 2-5 hours per month for quality checks and prompt updates.
A realistic all-in budget for a mid-market SaaS company adding one AI feature: $8,000 - $25,000 upfront, plus $300 - $2,000 per month ongoing. The biggest variable is not the AI. It is how well-organized your existing data and codebase are.
Build vs. Buy: The Decision Framework
For 95% of businesses reading this, the answer is: use the API. Here is how to decide.
Use a third-party API when:
- Your use case is general. Summarization, search, content generation, classification work well out of the box with major models.
- Speed matters more than customization. API integration ships in 2-4 weeks. Training a custom model takes 3-6 months.
- Your data volume is small to medium. Under 100,000 documents? RAG (Retrieval-Augmented Generation — feeding relevant documents to the LLM alongside the user's question so it answers based on your data) with a third-party model will outperform a custom-trained model.
Consider building or fine-tuning when:
- Your data IS the product. A proprietary dataset that makes your AI answers uniquely better is worth protecting with a fine-tuned model.
- Regulatory requirements demand it. Healthcare, defense, certain financial services — sometimes data cannot leave your infrastructure. Self-hosted open-source models (Llama 3, Mistral) solve this.
- You need cost efficiency at massive scale. At 1 million+ API calls per day, a self-hosted model can cost 60-80% less. Below that volume, operational overhead eats the savings.
The middle path: RAG with API calls
For most of my clients, the winning strategy is RAG with a third-party API. You store your company's data in a vector database (optimized for finding similar text). When a user asks a question, your app finds the relevant documents and sends them to the LLM along with the question. The LLM answers based on your specific data without you training anything. This gets you 80% of the benefit of a custom model at 10% of the cost.
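The retrieve-then-prompt flow can be sketched end to end. The word-overlap scoring below is a toy stand-in for real embedding similarity (a production system uses a vector database such as pgvector or Pinecone), but the shape of the RAG pipeline is the same:

```python
import re

def words(text: str) -> set[str]:
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retrieval: rank docs by word overlap with the question.
    Swap this for embedding similarity against a vector database."""
    q_words = words(question)
    def score(doc):
        return len(q_words & words(doc))
    return sorted(docs, key=score, reverse=True)[:k]

def build_rag_prompt(question: str, docs: list[str]) -> str:
    """Send only the top-ranked docs, not the whole knowledge base."""
    context = "\n\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The key property is in the last function: each request carries only the few documents relevant to that question, which is what keeps RAG fast and cheap compared to sending everything.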
The 4-Phase Integration Roadmap
This 4-phase approach has worked across a dozen production integrations. It takes 6-12 weeks for a typical mid-market application.
Phase 1: Audit (Weeks 1-2)
Map your existing architecture, identify the highest-value AI use case, and assess data readiness. Deliverables: architecture diagram with the proposed integration point, data quality assessment, cost estimate, and a go/no-go decision.
What kills projects here: Skipping the audit. Teams that jump straight to coding waste 2-3x the budget because they discover data problems or architectural constraints mid-build.
Phase 2: Prototype (Weeks 3-5)
Build a working proof of concept using Pattern 1 (direct API calls) against your real data, not sample data. Get 5-10 internal users testing it. Measure response times, accuracy, and actual API costs.
Every LLM demo looks impressive against clean examples. The test that matters: does it give useful answers when someone feeds it the messy, incomplete data your actual system contains?
Phase 3: Harden (Weeks 6-9)
Upgrade from prototype to production. Move to Pattern 2 if needed. Add error handling, caching, rate limiting, monitoring, and input validation.
The detail most teams miss: Input validation. Users will type anything into your AI feature, including prompt injection attacks (attempts to trick the LLM into ignoring your instructions). A hardened integration validates every input before it reaches the LLM.
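A first line of defense can be as simple as a validator that runs before any text reaches the LLM. The patterns below are illustrative heuristics I made up for this sketch, not a complete defense; pair them with strict system prompts and output-side checks.

```python
import re

# Illustrative heuristics only -- real injection attempts are more varied.
INJECTION_PATTERNS = [
    r"ignore .{0,30}instructions",
    r"system prompt",
    r"you are now",
]

def validate_input(text: str, max_chars: int = 2000) -> tuple[bool, str]:
    """Coarse pre-LLM check: reject empty, oversized, or suspicious input."""
    if not text.strip():
        return False, "empty input"
    if len(text) > max_chars:
        return False, "input too long"
    lowered = text.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            return False, "possible prompt injection"
    return True, "ok"
```

Even a coarse filter like this blocks the laziest attacks and caps your token spend on pathological inputs.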
Phase 4: Scale (Weeks 10-12)
Roll out to all users behind a feature flag. Set up analytics to measure business impact. Optimize costs by identifying which requests can use cheaper models without quality loss. Document everything for your team.
Common Mistakes That Kill LLM Projects
Here are the patterns that cause the most damage.
Starting with the model instead of the problem. "We want to add GPT-4 to our app" is not a goal. "We want to reduce support ticket resolution time by 40%" is. Start with the outcome, then work backward to the right tool.
Ignoring latency. LLM API calls take 1-5 seconds. If your users expect instant responses, you need streaming (the answer appears word-by-word) or background processing. A 4-second loading spinner is not acceptable UX.
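Streaming is easiest to reason about as a generator. The sketch below simulates word-by-word delivery; real provider SDKs expose a streaming option that yields response chunks in the same way, and your frontend appends each chunk as it arrives instead of waiting behind a spinner.

```python
def stream_response(full_answer: str, chunk_size: int = 8):
    """Simulated streaming: yield the answer in small chunks, the way a
    provider SDK yields deltas when streaming is enabled."""
    for i in range(0, len(full_answer), chunk_size):
        yield full_answer[i:i + chunk_size]
```

The total wait is unchanged, but the user sees the first words within a few hundred milliseconds, which is what makes a 4-second response feel acceptable.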
Sending too much context per request. Founders want to "feed the AI everything." Sending your entire knowledge base in every request is expensive and slow. RAG solves this by sending only the relevant documents for each question.
Not budgeting for prompt engineering. The prompt (the instructions you give the LLM) determines 80% of output quality. I budget 10-20 hours for prompt development on every project. Clients who skip this step get answers that are technically correct but unhelpful or inconsistent.
Treating it as a one-time project. LLM providers update their models regularly. A prompt that worked in January might produce different results after a March model update. Budget 2-5 hours per month for monitoring and maintenance.
These patterns apply beyond web app features. If your broader concern is AI automation for business operations, the principles are the same whether you are automating support, document processing, or internal workflows.
FAQ
How long does it take to add LLM features to an existing web app?
Expect 6-12 weeks from audit to full production rollout for a single AI feature. A basic proof of concept can work in 1-2 weeks, but hardening for production takes the remaining time. Timeline depends on your codebase complexity and data readiness.
Do I need to rewrite my application to integrate an LLM?
No. LLM integration works through APIs — you add a new capability alongside your existing code. Your database, authentication system, and frontend stay the same. The new code is the layer that sends requests to the LLM and handles responses, typically a few hundred lines.
What does LLM integration cost for a mid-size SaaS application?
Budget $8,000 to $25,000 for initial development, plus $300 to $2,000 per month ongoing. Direct API calls are cheapest to implement; async queue-based is most expensive. Ongoing costs depend on usage volume and model choice.
Can I switch LLM providers after integration?
Yes, especially with the middleware proxy pattern. The proxy abstracts provider-specific API calls, so switching providers becomes a configuration change rather than a code rewrite.
Is my data safe when using LLM APIs?
Major providers (OpenAI, Anthropic) offer enterprise plans with SOC 2 compliance, no training on your data, and data processing agreements. If data cannot leave your infrastructure, self-hosted open-source models (Llama 3, Mistral) give you full control.
Next Steps
If you are past "should we add AI" and into "how do we do it without breaking what works," the answer starts with Phase 1: a focused audit of your existing system, your data, and the use case that delivers the most value.
I do this work with clients every month. If you want a clear assessment of where LLM integration fits into your application — and an honest answer about whether it is worth the investment — let's talk.
