Best practices to update asynchronously duplicated data in mongodb
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Duplicated data in MongoDB is often intentional. Teams duplicate fields or documents to speed up reads, reduce joins across collections, or maintain read models for different workloads. The hard part is not copying the data once, but keeping those copies correct when updates happen asynchronously and consumers may observe intermediate states.
Pick One Source of Truth
The first rule is architectural, not MongoDB-specific: one copy must be authoritative. All asynchronous update flows should begin from that source of truth rather than allowing several duplicated copies to accept independent writes.
For example:
- '
ordersis the source of truth' - '
customer_order_summariesis a derived read model' - analytics collections are downstream copies
If every copy is writable, conflict resolution becomes a business problem instead of a database problem. Most teams are better served by single-writer ownership plus asynchronous propagation.
Publish Changes Through a Reliable Stream
A common pattern is:
- write the source document
- emit or capture the change
- update derived collections asynchronously
MongoDB change streams are useful when the deployment supports them, because they let a worker react to inserts, updates, and deletes without polling.
This is a good baseline, but it still needs idempotency and failure handling.
Make Downstream Updates Idempotent
Asynchronous systems retry. If a worker restarts or a consumer sees the same event twice, repeated application should not corrupt data.
Use patterns such as:
- '
upsertinstead of blind insert' - version fields such as
sourceVersion - event IDs already processed by the consumer
A simple version-aware update might look like:
Now a stale event is less likely to overwrite newer data.
Design for Eventual Consistency Explicitly
If duplication is updated asynchronously, readers must tolerate lag. That means product and engineering need a clear answer to questions like:
- can the duplicate be a few seconds behind
- what screens require fresh reads from the source collection
- what happens if propagation is delayed
The worst design is pretending the duplicates are always current when they are not. Make consistency boundaries visible in the system design.
Keep Repair and Replay Paths
Every asynchronous duplication system eventually needs repair tooling. Examples:
- backfill a derived collection from source data
- replay missed events
- compare source and duplicate counts or hashes
Without repair tooling, a transient worker failure can create long-lived silent drift between the authoritative collection and its copies.
Common Pitfalls
- Letting several duplicated copies accept writes instead of preserving one source of truth.
- Updating downstream documents asynchronously without idempotent
upsertor version checks. - Assuming change streams alone guarantee correctness without retry and replay logic.
- Treating eventual consistency as invisible and surprising readers with stale duplicates.
- Building propagation workers without a repair or backfill strategy for missed updates.
Summary
- Duplicated MongoDB data is manageable when one collection remains authoritative.
- Use change streams or another reliable event source to propagate updates asynchronously.
- Make downstream consumers idempotent with
upsert, versioning, or processed-event tracking. - Accept eventual consistency explicitly and route fresh-read cases to the source collection.
- Keep replay and reconciliation tools so duplicates can be repaired after failures.

