Dualo
AI in Practice

Prompting vs RAG vs Fine-tuning — how to choose

Which technique fits your problem? A decision framework by use case, with real-world trade-offs (cost, complexity, freshness, control).

1 min read

Three ways to adapt an to your use case: **prompting** (well-written instructions), **** (give it your documents), **fine-tuning** (retrain the model with your data). 90% of projects only need the first two. Fine-tuning is more often a decoy than a solution.

Prompting is the starting point. Works when: the task is well-defined in plain language, generic knowledge + some context suffices, the right format can be expressed in instructions + 2-5 examples. Cost: minimal (just LLM calls). Complexity: days. Freshness: live. Control: medium.

**RAG** kicks in when the LLM doesn't have your specific info — internal docs, tickets, contracts. Cost: infra ( + storage) + LLM calls. Complexity: weeks. Freshness: minutes (you re-index when docs change). Control: high (you see the retrieved chunks).

**** = retrain a small/medium model on thousands of (input, output) pairs. Changes style/format the model always follows, or adds specialized vocabulary. Cost: training ($200-10k per run) + hosting (you pay to serve 24/7). Complexity: months. Freshness: frozen at training time (need to retrain to update). Control: very high on style, low on new facts.

**The decision tree**: (1) Can you reach 85% quality with prompt + ? Stop, ship. (2) Is the gap in facts? RAG. (3) Is the gap in format/style that 50+ examples don't fix? Fine-tune. (4) Is volume + latency + cost critical? Fine-tune a smaller model to replace frontier at scale. Most projects stop at step 1 or 2.

Common mistake: jumping straight to fine-tuning 'to get better'. Usually a well-crafted prompt + a good RAG beats a badly-fine-tuned model. Fine-tuning is a SCALE optimization (reduce token costs at high volume), not an INTELLIGENCE upgrade.

Grounded on https://www.anthropic.com/research

Next up

AI ROI & Metrics — how to measure real value

Most AI projects fail on measurement, not tech. Here's what to track to know if it's actually working and to justify scaling the budget.