Every week a founder asks us whether their product should use RAG, fine-tune a model, or "just prompt." The honest answer is that the three solve different problems and picking wrong is how six months of budget disappears into an evaluation loop that never converges.
Here is the decision tree we use internally, and the rough cost of being wrong on each branch.
Start with: what does your product need the model to do?
- Follow an instruction, apply judgement, generate prose: prompt engineering. The model already knows what you need it to know; you're renting its general capability. This is the cheapest and fastest option and, most of the time, the right one.
- Answer questions grounded in your private data: RAG. You are not teaching the model new skills; you are teaching it which documents to look at before it answers. If your product's value is "this assistant knows our product," RAG is almost always right.
- Behave in a specific style, format, or domain voice that prompting cannot reliably produce: fine-tune. This is the most expensive option and almost never the first one you should reach for.
The rule: start with prompting, add RAG when the product needs grounded facts, reach for fine-tuning only when the first two have been exhausted and you have measurable evidence they weren't enough.
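The tree above is small enough to write down. A minimal sketch, assuming two yes/no questions capture the branch points; the function and flag names are ours, purely illustrative:

```python
def choose_approach(needs_private_data: bool,
                    needs_unpromptable_behaviour: bool) -> str:
    """Sketch of the decision tree: check the expensive branches first,
    fall through to the cheap default."""
    if needs_unpromptable_behaviour:
        # Style/format/domain voice that prompting cannot reliably produce.
        return "fine-tune"
    if needs_private_data:
        # Answers must be grounded in documents the model has never seen.
        return "rag"
    # The model already knows what it needs to know.
    return "prompt"
```

Note the default: you only leave "prompt" when one of the two conditions is demonstrably true.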
The rough cost of each option
- Prompting. Engineering cost: two to five days to get to a good baseline. Runtime cost: pay-per-token on a frontier model. Iteration cost: minutes. You are buying speed and flexibility.
- RAG. Engineering cost: two to four weeks for a real implementation (chunking, embedding, retrieval, reranking, evaluation). Runtime cost: vector DB plus token cost, typically 1.5x to 3x prompting. Iteration cost: hours to days to improve retrieval quality. You are buying groundedness and scope control.
- Fine-tuning. Engineering cost: four to twelve weeks including data collection, labelling, training runs, evaluation, and guardrails. Runtime cost: variable, often lower per token but with higher fixed costs. Iteration cost: days to weeks per iteration. You are buying a specific behaviour you cannot prompt for.
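To see how the runtime multipliers play out, here is a back-of-envelope sketch. The per-token price is an assumption for illustration, not a quote from any provider, and the 1.5x-3x RAG multiplier is the rough range from the list above:

```python
# Assumed blended price per 1,000 tokens, USD. Pick your own number.
PRICE_PER_1K_TOKENS = 0.01

def monthly_runtime_cost(requests_per_day: int, tokens_per_request: int,
                         rag_multiplier: float = 1.0) -> float:
    """Rough monthly token spend; rag_multiplier covers the extra context
    tokens and retrieval overhead RAG adds on top of plain prompting."""
    tokens_per_month = requests_per_day * 30 * tokens_per_request
    return tokens_per_month / 1000 * PRICE_PER_1K_TOKENS * rag_multiplier

# 1,000 requests/day at 2,000 tokens each:
prompting = monthly_runtime_cost(1000, 2000)            # baseline
rag = monthly_runtime_cost(1000, 2000, rag_multiplier=2.0)
```

At those assumed numbers, RAG's runtime premium is a few hundred dollars a month; the real cost difference lives in the engineering weeks, not the token bill.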
The most common mistake we see
Founders reach for fine-tuning because it feels like the "real AI" move. The majority of the time what they actually need is better prompting plus RAG over their support docs. We have talked four prospective clients out of fine-tuning engagements in the last year, and in each case the cheaper solution landed inside the budget and timeline the fine-tune would have blown.
The opposite mistake is rarer but expensive: founders with genuine domain-voice problems who try to prompt their way through them. Legal drafting, medical triage, highly structured report generation. Prompting produces 85% acceptable output and 15% dangerous output, which is exactly the distribution that eats trust.
Evaluation is the unskippable piece
Whichever path you pick, you need an evaluation harness before you need the feature. Otherwise you cannot tell if a change made the product better or worse, and progress becomes vibes.
For LaunchProd we ran RAG with a retrieval evaluation that scored relevance on every change. The harness saved us from two "obvious improvements" that would have quietly regressed quality.
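A retrieval harness does not need to be elaborate to catch regressions. A minimal sketch, assuming you have a labelled set of queries mapped to the doc IDs a good answer should cite (the function names and recall@k metric are our choices, not a description of the LaunchProd setup):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the labelled relevant docs that appear in the top-k results."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0

def evaluate(retriever, labelled_queries: dict[str, set[str]], k: int = 5) -> float:
    """Average recall@k over the labelled set; run this on every change
    and fail the build if the score drops."""
    scores = [recall_at_k(retriever(query), relevant, k)
              for query, relevant in labelled_queries.items()]
    return sum(scores) / len(scores)
```

Fifty labelled queries and this one number are enough to tell an "obvious improvement" from a quiet regression.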
Heuristics
- Prompt first. Add RAG when the product needs your data. Fine-tune only after you've proven the first two aren't enough.
- Evaluation comes before the feature, not after. Without a harness, you cannot ship AI responsibly.
- The most expensive AI system is the one that solves a prompting problem with a fine-tune. Cheaper does not mean worse.
Written 2025-06-21 by Naman Barkiya.