Embeddings
Vector representations, similarity search, and practical use cases
Overview
An embedding is a dense numerical vector that represents the meaning of a piece of text. Similar concepts land close together in vector space: "king" and "queen" are nearer than "king" and "car". Models such as OpenAI text-embedding-3-small, Cohere embed, or open-source sentence-transformers convert text (words, sentences, documents) into fixed-length arrays of floats, with the length set by the model.
Embeddings power semantic search, recommendations, clustering, and RAG retrieval. They compress language into coordinates you can compare with math instead of keyword matching.
Syntax / Usage
Core operations:
1. Embed query and documents → vectors (e.g. 1536 dimensions)
2. Compare with cosine similarity or dot product (the two agree when vectors are normalized to unit length)
3. Return top-k nearest neighbors
Cosine similarity (values −1 to 1, higher = more similar):
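import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Assumes equal-length, non-zero vectors; a zero vector would divide by zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Dot product scaled by both vector lengths.
    return dot / (norm_a * norm_b)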
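The three core operations can be sketched end to end with the cosine function above. This is a minimal brute-force sketch, not a production index: the `top_k` helper, the document ids, and the 3-dimensional toy vectors are all invented for illustration, and real vectors would come from an embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every document against the query and keep the k best (hypothetical helper)."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, made up for this example; real embeddings have
# hundreds or thousands of dimensions.
docs = {
    "reset-password": [0.9, 0.1, 0.0],
    "sso-login": [0.7, 0.6, 0.1],
    "billing": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]

results = top_k(query, docs, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

Scanning every vector like this is fine for a few thousand documents; larger corpora usually call for a vector database with an approximate nearest-neighbor index.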
API call pattern:
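// apiKey is assumed to hold an OpenAI API key loaded elsewhere (e.g. from the environment).
const response = await fetch("https://api.openai.com/v1/embeddings", {
  method: "POST",
  headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "text-embedding-3-small",
    input: ["How do I reset my password?", "Password reset steps for SSO"],
  }),
});
if (!response.ok) {
  throw new Error(`Embeddings request failed: ${response.status}`);
}
// data holds one entry per input string; each entry carries its embedding vector.
const { data } = await response.json();
const vectors = data.map((d: { embedding: number[] }) => d.embedding);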
Store vectors in pgvector (Supabase/Postgres), Pinecone, Weaviate, or Qdrant for scalable search.
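As one possible shape for the pgvector option, the sketch below prepares the SQL and parameters for a cosine-distance search. The table name `doc_chunks`, its columns, and the helper names are hypothetical, and the `%s` placeholders assume a psycopg-style driver; nothing here connects to a database.

```python
def to_pgvector_literal(vector: list[float]) -> str:
    """Format a Python list in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


# Hypothetical schema; 1536 matches the dimension mentioned earlier
# and must equal the output size of whichever model you use.
CREATE_TABLE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS doc_chunks (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536)
);
"""

# `<=>` is pgvector's cosine-distance operator (smaller means more similar),
# so 1 - distance gives cosine similarity.
SEARCH_SQL = """
SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
FROM doc_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def search_params(query_vector: list[float], k: int) -> tuple[str, str, int]:
    """Parameters for SEARCH_SQL: the query vector (used twice) and the row limit."""
    literal = to_pgvector_literal(query_vector)
    return (literal, literal, k)


params = search_params([0.5, 0.25], k=5)
```

With a driver such as psycopg, these would be passed as `cursor.execute(SEARCH_SQL, params)`; the other stores listed above expose comparable top-k queries through their own clients.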
Examples
Semantic FAQ lookup:
User query: "can't log in after changing email"
Top match: "Updating account email breaks SSO session" (score 0.89)
Weak match: "Billing FAQ" (score 0.31)
Chunk long docs before embedding (500–1000 tokens per chunk, with some overlap between neighboring chunks) to improve retrieval precision. Store metadata (source URL, title) with each vector so RAG answers can cite their sources.
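A minimal sketch of overlapping chunks with attached metadata follows. It counts whitespace-separated words as a stand-in for tokens, which is an assumption; a real pipeline would count tokens with the embedding model's tokenizer. The function names, record fields, and example URL are hypothetical.

```python
def chunk_words(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split words into windows of `size`, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def build_records(text: str, source_url: str, title: str,
                  size: int = 500, overlap: int = 50) -> list[dict]:
    """Pair each chunk with the metadata needed to cite it later."""
    words = text.split()
    return [
        {"chunk_index": i, "text": " ".join(chunk), "source_url": source_url, "title": title}
        for i, chunk in enumerate(chunk_words(words, size, overlap))
    ]


# Tiny sizes so the overlap is visible; real chunks would be far longer.
records = build_records(
    "one two three four five six seven eight nine ten",
    source_url="https://example.com/docs/sso",
    title="SSO guide",
    size=4,
    overlap=1,
)
for record in records:
    print(record["chunk_index"], record["text"])
```

Each record's `text` is what gets embedded; the remaining fields travel alongside the vector in the store.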
Common Mistakes
- Embedding entire books as one vector—queries match poorly; chunk instead
- Mixing embedding models in one index (dimensions and geometry differ)
- Using Euclidean distance on unnormalized vectors when cosine is the appropriate metric (vector length then skews the ranking)
- Re-embedding the corpus on every query—cache document embeddings
- Assuming high similarity always means factual equivalence
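To illustrate the Euclidean-distance item in the list above: two vectors pointing the same way can sit far apart in raw Euclidean terms, yet after unit normalization squared Euclidean distance and cosine similarity are tied by d² = 2 − 2·cos, so both produce the same ranking. The helper names and toy vectors below are illustrative only.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Same direction, different lengths: cosine similarity is 1,
# but the raw Euclidean distance is large.
short, long_ = [3.0, 4.0], [30.0, 40.0]
raw_distance = math.sqrt(squared_euclidean(short, long_))
unit_distance = math.sqrt(squared_euclidean(normalize(short), normalize(long_)))

# For unit vectors the dot product is the cosine similarity,
# and squared distance equals 2 - 2 * cosine.
a, b = normalize([3.0, 4.0]), normalize([4.0, 3.0])
cosine = dot(a, b)
dist_sq = squared_euclidean(a, b)

print(raw_distance, unit_distance, cosine, dist_sq)
```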
See Also
- rag-basics
- large-language-models
- ai-apis
- machine-learning-basics