Fine-tuning is the process of further training a pre-trained foundation model on a curated dataset so it adapts its behaviour, style, or task performance to a specific domain. In finance, it is one of three main ways (alongside RAG and prompt engineering) to customise large language models for AR and O2C workflows.
Concretely, fine-tuning continues training a pre-trained foundation model on a smaller, curated dataset, and in doing so changes the model's weights. After fine-tuning, the same prompt can produce a different response than it would from the base model.
It is useful to place fine-tuning next to three related but distinct techniques: pre-training, RAG, and prompt engineering.
The mental model: pre-training builds the brain, fine-tuning shapes habits, RAG hands the model a reference document, and prompt engineering tells it what job to do today.
Full fine-tuning updates every parameter in the model. For a modern 70-billion-parameter model this requires significant GPU infrastructure and produces a complete new model checkpoint. Costs typically run from 10,000 to more than 100,000 euros per training run, and the resulting model must be hosted as a custom deployment.
Parameter-Efficient Fine-Tuning (PEFT) updates only a small subset of parameters, leaving the bulk of the model frozen. LoRA (Low-Rank Adaptation) is the dominant PEFT method: it inserts small trainable matrices alongside the original weights and trains only those. A LoRA adapter is often only a few megabytes and can be swapped in and out of a shared base model. Training typically takes hours of GPU time and costs 100 to 1,000 euros per run, putting it within reach of most finance IT teams.
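To make the size difference concrete, the sketch below counts trainable parameters for a single weight matrix under full fine-tuning and under LoRA. The matrix dimensions (8192 x 8192), the rank (8), and the 2 bytes per parameter are illustrative assumptions, not figures from any specific model.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole weight matrix is trained."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank matrices A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def size_mb(n_params: int, bytes_per_param: int = 2) -> float:
    """Storage in megabytes, assuming 16-bit (2-byte) parameters by default."""
    return n_params * bytes_per_param / 1_000_000


# Assumed, illustrative dimensions for one projection matrix.
D_IN, D_OUT, RANK = 8192, 8192, 8

full = full_params(D_IN, D_OUT)
lora = lora_params(D_IN, D_OUT, RANK)
print(f"full: {full:,} params, {size_mb(full):.1f} MB")
print(f"LoRA: {lora:,} params, {size_mb(lora):.2f} MB")
print(f"trainable fraction: {lora / full:.4%}")
```

In a real model the adapter total grows with the rank and with the number of matrices that are adapted, so actual adapter sizes vary.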
Instruction tuning and RLHF (Reinforcement Learning from Human Feedback) align a model to human preferences across many tasks. This is what turns a raw language model into a chat assistant. It is done by foundation model providers and is rarely the right tool for an individual enterprise deployment.
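In AR and O2C, fine-tuning earns its place when the same narrow behaviour must be repeated thousands of times with high consistency. Useful examples include output format normalisation, classification into a fixed taxonomy, and a consistent brand voice on drafted communications.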
Notice what is missing: customer data, contract terms, current balances, dispute history. None of that should be fine-tuned in. It belongs in RAG.
The three techniques are not competitors. They solve different problems and are usually combined.
For finance specifically, the order of preference is prompt engineering, then RAG, then fine-tuning. RAG is almost always preferred over fine-tuning because customer data changes constantly, audit requires explainability through citations, and baking data into model weights creates data residency complications under GDPR.
PEFT and LoRA training runs typically cost 100 to 1,000 euros in GPU time and complete in hours. Full fine-tuning costs 10,000 to more than 100,000 euros and can take days to weeks. Inference cost is often higher for fine-tuned models: API providers commonly charge a premium per token on custom-tuned endpoints, and self-hosted custom models carry ongoing GPU hosting cost.
The pitfalls that bite enterprise teams:
Reserve fine-tuning for stable, narrow tasks: output format normalisation, classification into a fixed taxonomy, brand voice on drafted communications. Use RAG for everything involving current company-specific knowledge, which is most of what AR and O2C actually need.
Before any fine-tuning project, build an evaluation harness: a held-out set of representative inputs with expected outputs, scored by humans or a strong judge model. Without it you cannot tell whether fine-tuning helped, hurt, or did nothing.
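A minimal sketch of such a harness follows. It scores by exact match, which is a simpler stand-in for human or judge-model scoring and only suits tasks with a fixed label set. The dispute-reason labels, the example inputs, and the two model functions are hypothetical placeholders for real data and real inference calls.

```python
from typing import Callable, Sequence

Example = tuple[str, str]  # (input text, expected output)


def exact_match_rate(model: Callable[[str], str],
                     held_out: Sequence[Example]) -> float:
    """Share of held-out examples where the model output equals the expected label."""
    if not held_out:
        raise ValueError("held-out set is empty")
    hits = sum(model(text).strip().lower() == expected.strip().lower()
               for text, expected in held_out)
    return hits / len(held_out)


# Hypothetical held-out set for a dispute-reason classifier.
HELD_OUT: list[Example] = [
    ("Invoice 1 shows 10 units, we received 8", "short_shipment"),
    ("We were charged the old list price", "pricing"),
    ("This invoice was already paid in March", "duplicate_or_paid"),
]


# Stand-ins for real model calls; replace with actual inference clients.
def base_model(text: str) -> str:
    return "pricing"


def tuned_model(text: str) -> str:
    if "received" in text:
        return "short_shipment"
    if "paid" in text:
        return "duplicate_or_paid"
    return "pricing"


print("base :", exact_match_rate(base_model, HELD_OUT))
print("tuned:", exact_match_rate(tuned_model, HELD_OUT))
```

Running the same held-out set against the base model and the tuned model gives a like-for-like comparison; the set must never overlap with the training data.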
Strip personally identifiable information and customer-specific data from training datasets. If a piece of information might need to be deleted, updated, or access-controlled later, it does not belong in weights. Treat the fine-tuned adapter as a versioned artefact: train, evaluate, deploy, monitor, and retrain on a schedule as base models evolve. For most AR teams, an AI-native platform with strong RAG and prompt engineering can deliver most of the value of fine-tuning without the infrastructure burden or audit risk.
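As a rough illustration of the stripping step, the sketch below redacts email addresses and IBAN-like strings from a training example before it is written to a dataset. The two regular expressions are assumptions chosen for brevity; they are not exhaustive, and a real pipeline would need broader detection (names, phone numbers, customer ids) plus human review.

```python
import re

# Illustrative patterns only; real PII detection needs broader coverage and review.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact jane.doe@example.com, IBAN DE89 3704 0044 0532 0130 00, re invoice INV-1001."
print(redact(sample))
```

Placeholders such as `[EMAIL]` keep the sentence structure intact, so the model can still learn the format of a message without seeing the underlying value.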
Fine-tuning changes the model's weights by training it on examples, so the model behaves differently afterwards. RAG leaves the weights untouched and instead retrieves relevant documents at runtime to add to the prompt. Fine-tuning shapes how the model responds. RAG controls what knowledge it has access to. In finance, RAG is preferred for anything involving customer data, contracts, or balances because the data changes constantly and must be auditable.
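The runtime retrieval step can be sketched as follows. This is a toy example that ranks documents by shared words; production RAG systems typically use embedding-based search instead. The document ids, their contents, and the prompt wording are all hypothetical.

```python
import re

# Hypothetical knowledge base; in practice this would be a document or vector store.
DOCS = {
    "contract-ACME-2024": "ACME payment terms are net 45 days from invoice date",
    "balance-ACME": "ACME open balance is 12,400 EUR across three invoices",
    "dispute-BETA-17": "BETA disputes invoice 17 for short shipment",
}


def tokens(text: str) -> set[str]:
    """Lower-cased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, docs: dict[str, str], k: int = 1) -> list[tuple[str, str]]:
    """Return the k documents sharing the most words with the query (toy relevance)."""
    q = tokens(query)
    ranked = sorted(docs.items(), key=lambda item: len(q & tokens(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: dict[str, str], k: int = 1) -> str:
    """Prepend retrieved passages, tagged with their source id, to the question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, docs, k))
    return ("Answer using only the sources below and cite their ids.\n"
            f"{context}\n\nQuestion: {query}")


print(build_prompt("What are the payment terms for ACME?", DOCS))
```

Because each passage carries its source id into the prompt, the answer can cite where a figure came from, and updating or deleting a document changes the next answer without retraining anything.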
Fine-tune when the desired behaviour must be highly consistent across thousands of calls, the task is narrow and well-defined, and the system prompt has grown so large it is hurting cost or latency. If the behaviour can be specified in under 100 lines of instruction on a capable foundation model, prompt engineering is almost always cheaper, faster to iterate on, and easier to audit.
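One way to sanity-check the cost side of that decision is a simple break-even estimate: a long system prompt on a base model versus a short prompt on a fine-tuned model billed at a premium. Every number in the sketch below (call volume, token counts, per-token prices) is a hypothetical placeholder rather than a vendor price; only the 1,000 euro training cost is taken from the LoRA range quoted earlier.

```python
def monthly_cost(calls: int, prompt_tokens: int, output_tokens: int,
                 price_per_1k_tokens: float) -> float:
    """Monthly inference cost in euros, assuming one flat per-token price."""
    tokens_per_call = prompt_tokens + output_tokens
    return calls * tokens_per_call / 1000 * price_per_1k_tokens


# All numbers below are hypothetical placeholders, not vendor prices.
CALLS_PER_MONTH = 200_000
OUTPUT_TOKENS = 200

# Option A: base model, long system prompt carrying all the instructions.
base = monthly_cost(CALLS_PER_MONTH, prompt_tokens=3_000,
                    output_tokens=OUTPUT_TOKENS, price_per_1k_tokens=0.002)

# Option B: fine-tuned model, short prompt, premium per-token price.
tuned = monthly_cost(CALLS_PER_MONTH, prompt_tokens=300,
                     output_tokens=OUTPUT_TOKENS, price_per_1k_tokens=0.004)

TRAINING_COST = 1_000.0  # upper end of the LoRA range quoted earlier
saving = base - tuned
print(f"base: {base:.2f} EUR/month, tuned: {tuned:.2f} EUR/month")
if saving > 0:
    print(f"training cost recovered in {TRAINING_COST / saving:.1f} months")
else:
    print("fine-tuning never pays back at these assumptions")
```

The estimate leaves out evaluation effort, retraining as base models evolve, and any hosting cost for self-managed deployments, all of which push the break-even point further out.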
LoRA stands for Low-Rank Adaptation. It is a Parameter-Efficient Fine-Tuning method that trains a small set of additional matrices alongside the frozen base model rather than updating all weights. A LoRA adapter is often only a few megabytes, costs 100 to 1,000 euros to train, and can be swapped in and out of a shared base model. It has made fine-tuning practical for enterprise teams that previously could not afford full fine-tuning.
PEFT and LoRA runs typically cost 100 to 1,000 euros in GPU time per training cycle and complete in hours. Full fine-tuning of a large model can cost 10,000 to more than 100,000 euros and take days to weeks. Inference is often charged at a higher per-token rate for custom-tuned models, and self-hosted custom checkpoints add ongoing GPU hosting cost.
Fine-tuning does carry data protection risk. Once data is baked into model weights it cannot be selectively deleted, which creates serious problems for GDPR data subject rights and for data residency. Any training dataset must be stripped of personally identifiable information and customer-specific records. If a piece of information might ever need to be deleted, updated, or access-controlled, it does not belong in weights. Use RAG for that data instead.
Fine-tuning does not replace RAG. They solve different problems and are usually combined. Fine-tuning shapes consistent behaviour and style. RAG supplies the current customer, invoice, contract, and dispute knowledge the model needs to be useful on a specific account. An AI-native AR platform will typically rely heavily on RAG for company-specific knowledge and use fine-tuning sparingly for stable tasks like output format and brand voice.