Fine-Tuning

When to fine-tune, data preparation, LoRA, and hosted fine-tuning APIs

Overview

Fine-tuning adapts a pretrained model to a narrow task by continuing training on curated examples. It works best for consistent formatting, tone, and classification, where prompting alone is unreliable or costs too many tokens per request. It is the wrong tool for injecting fresh knowledge; use retrieval-augmented generation (RAG) for that. Modern practice favors parameter-efficient fine-tuning (PEFT) methods such as LoRA, which freeze the base weights and train small low-rank adapter matrices, so far fewer parameters need gradients and optimizer state and the memory and compute cost of training drops accordingly.

Syntax / Usage

Hosted APIs handle the infrastructure: you upload a JSONL file of chat examples and start a job. Data quality and consistency matter more than quantity: dozens to a few hundred clean examples often outperform thousands of noisy ones.

from openai import OpenAI

client = OpenAI()

# Each line: a full conversation demonstrating desired behavior.
# training.jsonl:
# {"messages": [
#   {"role": "system", "content": "Classify tickets as JSON."},
#   {"role": "user", "content": "charged twice"},
#   {"role": "assistant", "content": "{\"category\": \"billing\"}"}]}

# Upload inside a context manager so the file handle is closed afterwards.
with open("training.jsonl", "rb") as f:
    file = client.files.create(file=f, purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 3},
)
print(job.id, job.status)
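
Because data quality drives the result, it can help to check the JSONL file locally before uploading it. The following is a minimal sketch, not part of any official tooling: `validate_chat_jsonl` and `check_record` are hypothetical helper names, and the rules they enforce (string content, three plain roles, assistant message last) are a simplification that fits plain chat examples like the one above.

```python
import json

# Roles used by plain chat examples; an assumption that ignores tool-calling formats.
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_record(record):
    """Return a description of the first problem found, or None if the record looks fine."""
    if not isinstance(record, dict) or not isinstance(record.get("messages"), list):
        return "missing 'messages' list"
    messages = record["messages"]
    if not messages:
        return "empty 'messages' list"
    for message in messages:
        if not isinstance(message, dict):
            return "message is not an object"
        if message.get("role") not in ALLOWED_ROLES:
            return f"unexpected role: {message.get('role')!r}"
        if not isinstance(message.get("content"), str):
            return "message content is not a string"
    if messages[-1]["role"] != "assistant":
        return "last message is not from the assistant"
    return None


def validate_chat_jsonl(lines):
    """Check an iterable of JSONL lines; return a list of (line_number, problem) tuples."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # blank lines are skipped rather than reported
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        problem = check_record(record)
        if problem is not None:
            problems.append((number, problem))
    return problems
```

Since an open file iterates line by line, `validate_chat_jsonl(open("training.jsonl", encoding="utf-8"))` checks the whole file; an empty list means none of these checks failed.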

Examples

Once the job finishes, call the resulting model exactly like a base model using its returned ID:

resp = client.chat.completions.create(
    model="ft:gpt-4o-mini:acme::abc123",  # placeholder; use the fine_tuned_model value from your finished job
    messages=[
        {"role": "system", "content": "Classify tickets as JSON."},
        {"role": "user", "content": "my card was declined"},
    ],
)
print(resp.choices[0].message.content)
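
Even a well-trained classifier can occasionally return malformed or unexpected output, so the reply is worth parsing defensively before downstream code relies on it. A small sketch under stated assumptions: `parse_category` is a hypothetical helper, and the label set in `ALLOWED_CATEGORIES` is invented for illustration (only `billing` appears in the example data above).

```python
import json

# Hypothetical label set; replace with the categories used in your training data.
ALLOWED_CATEGORIES = {"billing", "technical", "account"}


def parse_category(content, allowed=ALLOWED_CATEGORIES):
    """Return the category from a JSON reply, or None if the reply is not usable."""
    try:
        data = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return None  # not valid JSON, or content was None
    if not isinstance(data, dict):
        return None
    category = data.get("category")
    if isinstance(category, str) and category in allowed:
        return category
    return None
```

A call such as `parse_category(resp.choices[0].message.content)` then yields either a known label or `None`, which the caller can route to a fallback path.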

For open-weight models, LoRA with the PEFT library trains lightweight adapters on your own hardware:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                    lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only ~0.1% of params are trained
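
The small trainable fraction can be checked with back-of-the-envelope arithmetic: for each adapted linear layer with input size d_in and output size d_out, LoRA adds two matrices with r × d_in and d_out × r entries. The sketch below applies that to the configuration above. The layer shapes and total parameter count are assumptions about Llama-3.1-8B that do not come from this article; verify them against the model's config before relying on the numbers.

```python
def lora_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed Llama-3.1-8B shapes (not from the article; check the model's config.json).
HIDDEN = 4096                  # hidden size, also the q_proj output size
KV_DIM = 1024                  # v_proj output size: 8 key/value heads x 128 head dim
NUM_LAYERS = 32                # transformer blocks
TOTAL_PARAMS = 8_030_000_000   # approximate base model size

R = 16  # matches r=16 in the LoraConfig above

# target_modules=["q_proj", "v_proj"] adapts two projections in every block.
per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, KV_DIM, R)
trainable = per_layer * NUM_LAYERS
fraction = trainable / TOTAL_PARAMS

print(f"{trainable:,} trainable parameters, {fraction:.3%} of the base model")
# prints: 6,815,744 trainable parameters, 0.085% of the base model
```

Doubling `r` or adding more entries to `target_modules` scales the adapter size linearly, which is a quick way to estimate the cost of a configuration before loading the model.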

Common Mistakes

  • Fine-tuning to add facts: use RAG instead, since models don't reliably memorize facts from small training sets
  • Inconsistent labels or formatting in training data, which the model faithfully learns
  • No held-out validation set, so you can't detect overfitting or regressions
  • Too many epochs, overfitting to phrasing and losing generality
  • Skipping a prompt-engineering baseline before committing to training cost
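
Two of the mistakes above, inconsistent labels and a missing held-out set, can be caught with a few lines of preprocessing. This is a minimal sketch with hypothetical helper names (`split_examples`, `conflicting_labels`); the 20% validation fraction and the whitespace and case normalization are illustrative choices, not recommendations from the article.

```python
import random
from collections import defaultdict


def split_examples(examples, validation_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train, validation) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def conflicting_labels(pairs):
    """Return normalized inputs that appear with more than one label."""
    seen = defaultdict(set)
    for text, label in pairs:
        normalized = " ".join(text.lower().split())  # ignore case and extra spaces
        seen[normalized].add(label)
    return {text: labels for text, labels in seen.items() if len(labels) > 1}


pairs = [
    ("charged twice", "billing"),
    ("Charged  twice", "payments"),  # same ticket text, different label
    ("cannot log in", "account"),
]
print(conflicting_labels(pairs))
train, validation = split_examples(pairs)
print(len(train), len(validation))
```

The hosted fine-tuning API accepts a `validation_file` argument alongside `training_file`, so the validation split can be uploaded the same way as the training data and used to watch for overfitting.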

See Also

large-language-models prompt-engineering ai-evaluation