MongoDB: When to Embed, When to Reference

March 8, 2026


The single biggest MongoDB design decision is whether to embed a related object inside its parent document or to keep it in a separate collection and reference it by id. Get this right and most queries are one round trip. Get it wrong and you end up reimplementing joins in application code, which is the worst of both worlds.

The rule I use is short. Embed when the child is read together with the parent, the relationship is one-to-few, and the nested data stays bounded. A product with its size and color variants is the canonical case. One read returns the whole thing. There is no $lookup. Updates to the parent and its embedded children are atomic at the document level, which is the only atomicity guarantee MongoDB gives you without paying for multi-document transactions.
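A minimal sketch of the embedded shape, assuming a hypothetical `products` collection with hypothetical field names (`sku`, `variants`, `size`, `color`, `stock`). The update is built as plain filter and update dictionaries so it runs without a database. With PyMongo you would pass the same two dictionaries to `collection.update_one(filt, update)`. Because both the stock check and the decrement target one document, MongoDB applies them atomically.

```python
from typing import Any

# Hypothetical product document: the variants are bounded (a handful of
# size/color combinations) and always read with the product, so they are embedded.
product: dict[str, Any] = {
    "_id": "prod-123",
    "name": "Trail Running Shoe",
    "category": "footwear",
    "variants": [
        {"sku": "TRS-42-BLK", "size": 42, "color": "black", "stock": 12},
        {"sku": "TRS-43-BLK", "size": 43, "color": "black", "stock": 7},
        {"sku": "TRS-42-RED", "size": 42, "color": "red", "stock": 0},
    ],
}


def decrement_stock_update(product_id: str, sku: str, qty: int) -> tuple[dict, dict]:
    """Build (filter, update) for an atomic single-document stock decrement.

    The filter matches only while the chosen variant still has at least `qty`
    units, so the check and the write happen in one operation.
    """
    if qty <= 0:
        raise ValueError("qty must be positive")
    filt = {
        "_id": product_id,
        "variants": {"$elemMatch": {"sku": sku, "stock": {"$gte": qty}}},
    }
    # The positional "$" refers to the array element matched by $elemMatch.
    update = {"$inc": {"variants.$.stock": -qty}}
    return filt, update
```

If the variant is missing or out of stock, the filter matches nothing and PyMongo's `UpdateResult.modified_count` comes back as 0, so the caller learns about the failure without a separate read.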

Reference when the child is shared across many parents, the relationship grows large, or the number of children has no upper bound. Users referenced by orders, tags applied to posts, comments on a viral video. If you embed a comment thread in a YouTube-style video document, the first time a video goes viral you will hit the 16MB document ceiling, and writes that would push the document past it start failing in production with a document-size error such as BSONObjectTooLarge. That ceiling is not a soft limit you can tune. It is enforced by the server, and once a document is at the limit, every write that would grow it keeps failing until you migrate the structure.
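A back-of-envelope check makes the ceiling concrete. This sketch assumes an average comment of about 500 bytes once encoded as BSON (an illustrative figure, not a measurement) and computes how many comments fit under the 16 MiB limit (16,777,216 bytes). It also shows the referenced alternative, where each comment is its own small document pointing back at its video through a hypothetical `video_id` field.

```python
# MongoDB's fixed maximum BSON document size: 16 MiB.
MAX_BSON_BYTES = 16 * 1024 * 1024


def max_embedded_children(avg_child_bytes: int, parent_overhead_bytes: int = 0) -> int:
    """Estimate how many children of a given average encoded size fit in one document.

    parent_overhead_bytes covers the parent's own fields (title, stats, ...).
    This is an estimate: real BSON adds per-element overhead for array keys.
    """
    if avg_child_bytes <= 0:
        raise ValueError("avg_child_bytes must be positive")
    usable = max(MAX_BSON_BYTES - parent_overhead_bytes, 0)
    return usable // avg_child_bytes


def comment_doc(video_id: str, author_id: str, text: str) -> dict:
    """Referenced shape: one document per comment in a separate collection."""
    return {"video_id": video_id, "author_id": author_id, "text": text}
```

At 500 bytes per comment the embedded array tops out around 33,500 comments, and less once the video's own fields are counted. In the referenced model each comment document stays small, and an index on `video_id` would typically serve the "comments for this video" query.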

There is a second failure mode that is less obvious. Embedding turns every child update into a write to the parent document. If your product document has 50 variants, updating one variant's stock count is a write to the whole product, and any index built on the changed embedded fields has to be updated as well. A multikey index on an embedded array holds up to one key per element, so large arrays mean many index keys per document. Under heavy write load a popular product becomes a hot document, and concurrent writes to a single document are effectively serialized. You get tail latency that looks like a lock-contention problem because it basically is.
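Which indexes a child write has to maintain depends on the paths it changes. The sketch below is a deliberately simplified model, not MongoDB's actual implementation: an update touches an index when an updated path and an indexed path overlap (one is a prefix of the other), ignoring array positions and positional operators. The index list is hypothetical.

```python
def _normalize(path: str) -> list[str]:
    # Drop numeric array positions and positional operators ($, $[], $[v]),
    # so "variants.3.stock" and "variants.$.stock" both become ["variants", "stock"].
    return [p for p in path.split(".") if not p.isdigit() and not p.startswith("$")]


def paths_overlap(a: str, b: str) -> bool:
    """True if one normalized path is a prefix of the other."""
    pa, pb = _normalize(a), _normalize(b)
    n = min(len(pa), len(pb))
    return pa[:n] == pb[:n]


def indexes_touched(indexed_paths: list[str], updated_paths: list[str]) -> list[str]:
    """Indexes whose keys could change, under the simplified overlap model."""
    return [ix for ix in indexed_paths if any(paths_overlap(ix, up) for up in updated_paths)]
```

Under this model a stock change on one variant leaves a `category` index alone but touches a multikey index on `variants.stock`, while replacing the whole `variants` array touches every index rooted in it.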

The rule of thumb fits on one line. If it is bounded and read together, embed it. If it is shared or unbounded, reference it. The most common mistake is treating MongoDB like Postgres, splitting everything into collections, and joining with $lookup on every query. That gives you the complexity of joins without the document model's main benefit, which is locality. The other common mistake is the opposite: embedding everything because the docs say documents are flexible, then discovering at scale that one runaway child array has turned a tidy product catalog into a write-amplified mess. Model around how the application actually reads, not around what feels normalized.
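The one-line rule can be written as a tiny decision helper. The inputs are judgments you make about the application's access patterns, not anything MongoDB reports, and the function and enum names are hypothetical. The rule as stated does not cover a child that is bounded and unshared but read separately, so that case is returned as a prompt to look at the read paths rather than a verdict.

```python
from enum import Enum


class Placement(Enum):
    EMBED = "embed"
    REFERENCE = "reference"
    REVIEW = "review"  # outside the rule of thumb: decide from the actual read paths


def place_child(read_with_parent: bool, bounded: bool, shared: bool) -> Placement:
    """Shared or unbounded -> reference; bounded and read together -> embed."""
    if shared or not bounded:
        return Placement.REFERENCE
    if read_with_parent:
        return Placement.EMBED
    return Placement.REVIEW
```

Product variants come out as `EMBED`; users referenced by orders and comments on a viral video come out as `REFERENCE`. The shared/unbounded check runs first because the 16MB cap and write hotspots override the convenience of reading everything at once.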

Key takeaway

Embed when the child is bounded and read with the parent. Reference when the child is shared or grows without limit. The 16MB cap and write hotspots make the call obvious.

Originally posted on LinkedIn.


All Rights Reserved.