Fine-Tuning
When to fine-tune, data preparation, LoRA, and hosted fine-tuning APIs
Overview
Fine-tuning adapts a pretrained model to a narrow task by continuing training on curated examples. It shines for consistent formatting, tone, and classification where prompting alone is unreliable or token-expensive, but it is the wrong tool for injecting fresh knowledge (use RAG for that). Modern practice favors parameter-efficient fine-tuning (PEFT) methods such as LoRA, which freeze the pretrained weights and train small low-rank adapter matrices instead, sharply reducing the number of trainable parameters and the memory needed for gradients and optimizer state.
Syntax / Usage
Hosted APIs handle infrastructure: you upload a JSONL file of chat examples, one JSON object per line, and start a job. Quality and consistency of data matter more than quantity: dozens to a few hundred clean examples often outperform thousands of noisy ones.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# training.jsonl holds one JSON object per line; each object is a full
# conversation demonstrating the desired behavior. Wrapped here for readability:
# {"messages": [
#   {"role": "system", "content": "Classify tickets as JSON."},
#   {"role": "user", "content": "charged twice"},
#   {"role": "assistant", "content": "{\"category\": \"billing\"}"}]}

# Upload the training file, closing the handle once the upload completes.
with open("training.jsonl", "rb") as f:
    file = client.files.create(file=f, purpose="fine-tune")

# Start the fine-tuning job against a pinned model snapshot.
job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 3},
)
print(job.id, job.status)
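Because the job trains on whatever the file contains, it can help to check the JSONL locally before uploading. The following is a minimal sketch, not the API's own validation: `validate_chat_jsonl` is a hypothetical helper, and the rules it checks (one JSON object per line, a non-empty `messages` list, known roles, string content, a final assistant turn) are assumptions drawn from the example format above.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: roles used in the example above


def validate_chat_jsonl(lines):
    """Return a list of (line_number, problem) tuples; empty means no problems found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict) or message.get("role") not in ALLOWED_ROLES:
                problems.append((number, "message with unknown or missing role"))
                break
            if not isinstance(message.get("content"), str):
                problems.append((number, "message content is not a string"))
                break
        else:
            # Only reached when every message passed the checks above.
            if messages[-1]["role"] != "assistant":
                problems.append((number, "conversation does not end with an assistant turn"))
    return problems


# Three sample lines: one valid, one missing the assistant turn, one not JSON.
sample = [
    '{"messages": [{"role": "system", "content": "Classify tickets as JSON."}, '
    '{"role": "user", "content": "charged twice"}, '
    '{"role": "assistant", "content": "{\\"category\\": \\"billing\\"}"}]}',
    '{"messages": [{"role": "user", "content": "refund please"}]}',
    "not json",
]
for number, problem in validate_chat_jsonl(sample):
    print(f"line {number}: {problem}")
```

On a real file the same function could be called as `validate_chat_jsonl(open("training.jsonl", encoding="utf-8"))`, since it only needs an iterable of lines.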
Examples
Once the job finishes, call the resulting model exactly like a base model using its returned ID:
resp = client.chat.completions.create(
    model="ft:gpt-4o-mini:acme::abc123",  # your fine-tuned model id
    messages=[
        {"role": "system", "content": "Classify tickets as JSON."},
        {"role": "user", "content": "my card was declined"},
    ],
)
print(resp.choices[0].message.content)
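The call above only works once the job has reached a successful terminal state. One way to wait for that is to poll the job, sketched below under stated assumptions: `wait_for_model` is a hypothetical helper, it relies on `client.fine_tuning.jobs.retrieve` from the OpenAI Python SDK, and it treats `succeeded`, `failed`, and `cancelled` as the terminal statuses. The client is passed in as an argument so the function can be exercised with a stub.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}  # assumption: terminal job states


def wait_for_model(client, job_id, poll_seconds=30.0, max_polls=240, sleep=time.sleep):
    """Poll a fine-tuning job and return the fine-tuned model id once it succeeds."""
    for _ in range(max_polls):
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            if job.status != "succeeded":
                raise RuntimeError(f"fine-tuning job {job_id} ended with status {job.status!r}")
            return job.fine_tuned_model
        # Not finished yet: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"fine-tuning job {job_id} still running after {max_polls} polls")
```

With a real client this would be called as `model_id = wait_for_model(client, job.id)`, and the result passed as `model=` in a chat completion request.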
For open-weight models, LoRA with the PEFT library trains lightweight adapters on your own hardware:
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only ~0.1% of params are trained
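To see where a figure like ~0.1% comes from: LoRA expresses the update to a d_out × d_in weight matrix as the product of two matrices of shapes d_out × r and r × d_in, so each adapted matrix adds r·(d_in + d_out) trainable parameters. The sketch below does that arithmetic for the config above. The model dimensions (32 layers, hidden size 4096, a 1024-dimensional value projection, about 8.03 billion total parameters) are assumptions about Llama 3.1 8B, not values from this article; check them against the model's config before relying on the result.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


# Assumed Llama 3.1 8B shapes (not from this article; verify against the model config).
NUM_LAYERS = 32
HIDDEN = 4096
KV_DIM = 1024          # grouped-query attention: 8 KV heads x 128 dims
TOTAL_PARAMS = 8.03e9  # approximate

r = 16
# Only q_proj (HIDDEN -> HIDDEN) and v_proj (HIDDEN -> KV_DIM) are adapted per layer.
per_layer = lora_params(HIDDEN, HIDDEN, r) + lora_params(HIDDEN, KV_DIM, r)
trainable = NUM_LAYERS * per_layer
fraction = trainable / TOTAL_PARAMS

print(f"trainable adapter parameters: {trainable:,}")
print(f"share of all parameters: {fraction:.3%}")
```

Under these assumptions the adapters come to about 6.8 million parameters, a little under 0.1% of the model. Raising `r` or adding more `target_modules` increases that share proportionally.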
Common Mistakes
- Fine-tuning to add facts: use RAG instead, since models don't reliably memorize facts from small training sets
- Inconsistent labels or formatting in training data, which the model faithfully learns
- No held-out validation set, so you can't detect overfitting or regressions
- Too many epochs, overfitting to phrasing and losing generality
- Skipping a prompt-engineering baseline before committing to training cost
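For the held-out validation point above, one simple approach is a seeded random split made before upload. This is a sketch under assumptions: `split_examples` is a hypothetical helper, the 10% default validation share is arbitrary, and a purely random split is assumed to be acceptable (for classification data, a split stratified by label may be preferable).

```python
import random


def split_examples(examples, val_fraction=0.1, seed=0):
    """Shuffle a copy of the examples with a fixed seed and split off a validation set."""
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    if n < 2:
        # Too few examples to hold any out.
        return shuffled, []
    # Hold out at least one example, but always leave at least one for training.
    n_val = min(n - 1, max(1, round(n * val_fraction)))
    return shuffled[n_val:], shuffled[:n_val]


examples = [f"example-{i}" for i in range(50)]  # stand-ins for JSONL lines
train, val = split_examples(examples, val_fraction=0.2, seed=42)
print(len(train), len(val))
```

The validation lines can then be written to a second JSONL file, uploaded the same way as the training file, and passed as `validation_file` when creating the job, so that validation loss is reported alongside training loss.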
See Also
large-language-models prompt-engineering ai-evaluation