Embeddings and Vector Search: Semantic Retrieval for AI Agents
An embedding is a list of numbers that represents the meaning of a piece of text. The word "king" might become a vector of 1,536 floating-point numbers. The word "queen" becomes a different vector of 1,536 numbers, but one that is close to "king" in the mathematical space those numbers define. "Banana" becomes a vector that is far from both.
This is the core idea: text with similar meaning gets similar numbers. Text with different meaning gets different numbers. The distance between two vectors measures how related their meanings are. This turns the fuzzy concept of "similarity" into a precise mathematical operation.
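The "distance as similarity" idea is usually implemented as cosine similarity between vectors. A minimal sketch, using tiny 4-dimensional toy vectors as stand-ins for real 1,536-dimensional embeddings (the numbers are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 means same direction (similar meaning),
    near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for the embeddings of "king", "queen", and "banana".
king = [0.9, 0.8, 0.1, 0.2]
queen = [0.85, 0.82, 0.15, 0.1]
banana = [0.1, 0.05, 0.9, 0.8]

print(cosine_similarity(king, queen))   # close to 1.0
print(cosine_similarity(king, banana))  # much lower
```

Real embeddings behave the same way, just in 1,536 dimensions instead of 4: related texts point in nearly the same direction, unrelated texts do not.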

How Embeddings Are Created
An embedding model is a neural network trained on massive text corpora. During training, the model learns to place semantically related text close together and unrelated text far apart in the vector space. The model learns that "how to reset my password" and "I forgot my login credentials" mean roughly the same thing, even though they share no words.
You do not train the embedding model yourself. You use a pre-trained model provided by OpenAI (text-embedding-3), Google (Gecko), Cohere (embed-v3), or an open-source option (BGE, E5, GTE). You send text to the model's API and receive a vector back. The process is fast (typically 5-20 milliseconds per text) and cheap (roughly $0.02 per million tokens).
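At roughly $0.02 per million tokens, embedding even a large corpus is inexpensive. A quick back-of-envelope calculation (the corpus size and tokens-per-document figures here are illustrative assumptions):

```python
PRICE_PER_MILLION_TOKENS = 0.02  # USD, the rough rate quoted above

def embedding_cost(num_documents, avg_tokens_per_doc):
    """Estimated dollar cost to embed a corpus once."""
    total_tokens = num_documents * avg_tokens_per_doc
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# 10 million documents at ~500 tokens each:
print(embedding_cost(10_000_000, 500))  # → 100.0 (about $100)
```

Embedding cost is usually a one-time expense per document; queries add only the cost of embedding each incoming query string.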
Dimensions and What They Mean
Each number in the vector represents a learned feature of the text, but not a human-interpretable one. You cannot look at dimension 742 and say "this represents formality" or "this captures the topic of finance." The dimensions emerge from training and encode abstract patterns that the model found useful for distinguishing meaning.
Common dimension sizes are 256, 768, 1,024, 1,536, and 3,072. More dimensions capture more nuance but cost more to store and search. A 1,536-dimensional vector takes 6 KB of storage (1,536 floats at 4 bytes each). For 10 million documents, that is 60 GB of vector storage alone, before indexing overhead.
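The storage arithmetic above is straightforward to verify (this sketch uses the same float32-at-4-bytes assumption; the chapter's "60 GB" rounds the exact figure down):

```python
BYTES_PER_FLOAT32 = 4

def vector_storage_bytes(num_vectors, dims):
    """Raw storage for float32 vectors, before any index overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32

print(vector_storage_bytes(1, 1536))                  # 6144 bytes, i.e. ~6 KB
print(vector_storage_bytes(10_000_000, 1536) / 1e9)   # 61.44 GB, i.e. ~60 GB
```

This is why smaller dimension sizes (or quantization to int8 or binary vectors) matter at scale: halving the dimensions halves storage and speeds up every distance computation.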
Why Keywords Fail and Embeddings Succeed
Traditional keyword search (like SQL LIKE or Elasticsearch BM25) matches exact words. Searching for "automobile repair" misses documents about "car maintenance" because the words do not overlap. Embeddings solve this because both phrases map to nearby vectors. The model learned during training that these concepts are related.
This is semantic search: finding documents by meaning rather than by keyword match. For AI agents, semantic search is essential because users ask questions in natural language that rarely matches the exact phrasing in the knowledge base. An agent searching for "how to handle authentication errors" needs to find documents about "login failure troubleshooting" and "token expiration handling" even though the words barely overlap.
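The retrieval step itself can be sketched in a few lines, assuming documents were embedded ahead of time. The document titles echo the examples above, but the vectors are toy stand-ins; a real system would call an embedding model and use an approximate-nearest-neighbor index rather than a linear scan:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend these vectors came from an embedding model.
corpus = {
    "login failure troubleshooting": [0.9, 0.1, 0.3],
    "token expiration handling": [0.8, 0.2, 0.4],
    "office parking policy": [0.1, 0.9, 0.1],
}

def semantic_search(query_vector, corpus, top_k=2):
    """Rank documents by cosine similarity to the query vector."""
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:top_k]]

# Toy query vector for "how to handle authentication errors":
print(semantic_search([0.85, 0.15, 0.35], corpus))
```

Note that the authentication-related documents rank first even though none of them share words with the query; the keyword overlap never enters the computation, only the vectors do.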
Embeddings capture semantic similarity, not factual accuracy. The statements "the Earth orbits the Sun" and "the Sun orbits the Earth" have nearly identical embeddings because they discuss the same topic with the same words. The embedding model does not evaluate truth. It measures topical relatedness. This is a fundamental limitation that affects how you interpret similarity search results.
Embeddings Beyond Text
The same principle applies to images, audio, and code. CLIP (by OpenAI) creates embeddings for both images and text in the same vector space, enabling cross-modal search: you can search for images using text queries. Code embedding models like CodeBERT embed source code and natural language descriptions into the same space, enabling "find code that does X" searches. Multi-modal embeddings are increasingly common in production agent systems.