Embeddings and Vector Search: Semantic Retrieval for AI Agents

Topics Covered

What Are Embeddings?

How Embeddings Are Created

Dimensions and What They Mean

Why Keywords Fail and Embeddings Succeed

Embeddings Beyond Text

Vector Similarity Search

Distance Metrics

Exact vs. Approximate Search

Common ANN Algorithms

Tuning ANN Parameters

Vector Databases

Purpose-Built vs. Extensions

Metadata Filtering

The Ingestion Pipeline

Choosing Embedding Models

Key Selection Criteria

Popular Models and Their Trade-Offs

When to Fine-Tune

Evaluating Embedding Models on Your Data

What Are Embeddings?
An embedding is a list of numbers that represents the meaning of a piece of text. The word "king" might become a vector of 1,536 floating-point numbers. The word "queen" becomes a different vector of 1,536 numbers, but one that is close to "king" in the mathematical space those numbers define. "Banana" becomes a vector that is far from both.

This is the core idea: text with similar meaning gets similar numbers. Text with different meaning gets different numbers. The distance between two vectors measures how related their meanings are. This turns the fuzzy concept of "similarity" into a precise mathematical operation.
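The "distance as similarity" idea can be sketched with cosine similarity on toy vectors. These 4-dimensional vectors and their values are invented for illustration; real embedding models produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 means same direction (related meaning),
    near 0 means unrelated, near -1 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (real models output 256-3,072 dimensions).
king = [0.9, 0.8, 0.1, 0.2]
queen = [0.88, 0.82, 0.12, 0.21]
banana = [0.1, 0.05, 0.95, 0.9]

print(cosine_similarity(king, queen))   # close to 1.0: similar meaning
print(cosine_similarity(king, banana))  # much lower: unrelated meaning
```

The same comparison works identically on real 1,536-dimensional vectors; only the loop lengths change.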

[Figure: Text strings converted to embedding vectors and placed in semantic space, with similar meanings clustering together]


How Embeddings Are Created

An embedding model is a neural network trained on massive text corpora. During training, the model learns to place semantically related text close together and unrelated text far apart in the vector space. The model learns that "how to reset my password" and "I forgot my login credentials" mean roughly the same thing, even though they share no words.

You do not train the embedding model yourself. You use a pre-trained model provided by OpenAI (text-embedding-3), Google (Gecko), Cohere (embed-v3), or an open-source option (BGE, E5, GTE). You send text to the model's API and receive a vector back. The process is fast (typically 5-20 milliseconds per text) and cheap (roughly $0.02 per million tokens).
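As a concrete sketch, a provider call usually looks like the following. This assumes the OpenAI Python SDK (`pip install openai`) with an `OPENAI_API_KEY` set in the environment; other providers' APIs follow the same request/response shape.

```python
def embed(text, model="text-embedding-3-small"):
    """Request an embedding vector for `text` from a hosted model.

    Sketch assuming the OpenAI Python SDK; the import is lazy so this
    file loads even where the SDK is not installed.
    """
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding  # a list of floats

# embed("how to reset my password") -> a vector you can store and compare
```

The returned list's length is fixed by the model, not by the input text: every string, short or long, maps to a vector of the same dimensionality.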

Dimensions and What They Mean

Each number in the vector represents a learned feature of the text, but not a human-interpretable one. You cannot look at dimension 742 and say "this represents formality" or "this captures the topic of finance." The dimensions emerge from training and encode abstract patterns that the model found useful for distinguishing meaning.

Common dimension sizes are 256, 768, 1,024, 1,536, and 3,072. More dimensions capture more nuance but cost more to store and search. A 1,536-dimensional vector takes 6 KB of storage (1,536 floats at 4 bytes each). For 10 million documents, that is 60 GB of vector storage alone, before indexing overhead.
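The storage arithmetic above is worth making explicit, since it drives infrastructure cost:

```python
# Back-of-the-envelope storage for float32 embedding vectors.
dims = 1536
bytes_per_float = 4                      # float32
bytes_per_vector = dims * bytes_per_float
print(bytes_per_vector)                  # 6144 bytes, i.e. ~6 KB per document

docs = 10_000_000
total_gb = docs * bytes_per_vector / 1e9
print(total_gb)                          # ~61 GB of raw vectors, before index overhead
```

Halving dimensions (or quantizing float32 to int8) cuts this roughly in half (or to a quarter), which is why dimension choice and compression matter at scale.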

Why Keywords Fail and Embeddings Succeed

Traditional keyword search (like SQL LIKE or Elasticsearch BM25) matches exact words. Searching for "automobile repair" misses documents about "car maintenance" because the words do not overlap. Embeddings solve this because both phrases map to nearby vectors. The model learned during training that these concepts are related.
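A toy word-overlap check makes the failure mode concrete. This is a deliberately simplified stand-in for exact-term matching, not an implementation of BM25:

```python
def keyword_overlap(query, doc):
    """Words shared between query and document (toy exact-match retrieval)."""
    return set(query.lower().split()) & set(doc.lower().split())

query = "automobile repair"
doc = "a complete guide to car maintenance"

print(keyword_overlap(query, doc))  # empty set: no shared words, so keyword search misses
```

An embedding model, by contrast, would place "automobile repair" and "car maintenance" close together in vector space, so a similarity search still retrieves the document.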

This is semantic search: finding documents by meaning rather than by keyword match. For AI agents, semantic search is essential because users ask questions in natural language that rarely matches the exact phrasing in the knowledge base. An agent searching for "how to handle authentication errors" needs to find documents about "login failure troubleshooting" and "token expiration handling" even though the words barely overlap.

Key Insight

Embeddings capture semantic similarity, not factual accuracy. The statements "the Earth orbits the Sun" and "the Sun orbits the Earth" have nearly identical embeddings because they discuss the same topic with the same words. The embedding model does not evaluate truth. It measures topical relatedness. This is a fundamental limitation that affects how you interpret similarity search results.
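The two example statements show why: they contain exactly the same words, so any representation driven mainly by topical word content places them very close together, regardless of which one is true.

```python
# Both statements use an identical bag of words; only the word order differs.
a = "the earth orbits the sun"
b = "the sun orbits the earth"

print(sorted(a.split()) == sorted(b.split()))  # True: same words, same topic signal
```

A retrieval pipeline therefore needs a separate step (a reranker, a fact-checking model, or human review) if factual correctness of retrieved passages matters.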

Embeddings Beyond Text

The same principle applies to images, audio, and code. CLIP (by OpenAI) creates embeddings for both images and text in the same vector space, enabling cross-modal search: you can search for images using text queries. Code embedding models like CodeBERT embed source code and natural-language descriptions into the same space, enabling "find code that does X" searches. Multi-modal embeddings are increasingly common in production agent systems.