MongoDB
Asynchronous Data
Data Synchronization
Best Practices
Database Management

Best practices to update asynchronously duplicated data in mongodb

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Duplicated data in MongoDB is often intentional. Teams duplicate fields or documents to speed up reads, reduce joins across collections, or maintain read models for different workloads. The hard part is not copying the data once, but keeping those copies correct when updates happen asynchronously and consumers may observe intermediate states.

Pick One Source of Truth

The first rule is architectural, not MongoDB-specific: one copy must be authoritative. All asynchronous update flows should begin from that source of truth rather than allowing several duplicated copies to accept independent writes.

For example:

  • 'orders is the source of truth'
  • 'customer_order_summaries is a derived read model'
  • analytics collections are downstream copies

If every copy is writable, conflict resolution becomes a business problem instead of a database problem. Most teams are better served by single-writer ownership plus asynchronous propagation.

Publish Changes Through a Reliable Stream

A common pattern is:

  1. write the source document
  2. emit or capture the change
  3. update derived collections asynchronously

MongoDB change streams are useful when the deployment supports them, because they let a worker react to inserts, updates, and deletes without polling.

javascript
1const pipeline = [{ $match: { operationType: { $in: ["insert", "update", "replace"] } } }]
2const stream = db.collection("orders").watch(pipeline)
3
4stream.on("change", async change => {
5  const orderId = change.documentKey._id
6  const order = await db.collection("orders").findOne({ _id: orderId })
7
8  await db.collection("customer_order_summaries").updateOne(
9    { orderId: order._id },
10    {
11      $set: {
12        orderId: order._id,
13        customerId: order.customerId,
14        total: order.total,
15        status: order.status
16      }
17    },
18    { upsert: true }
19  )
20})

This is a good baseline, but it still needs idempotency and failure handling.

Make Downstream Updates Idempotent

Asynchronous systems retry. If a worker restarts or a consumer sees the same event twice, repeated application should not corrupt data.

Use patterns such as:

  • 'upsert instead of blind insert'
  • version fields such as sourceVersion
  • event IDs already processed by the consumer

A simple version-aware update might look like:

javascript
1await db.collection("customer_order_summaries").updateOne(
2  {
3    orderId: order._id,
4    $or: [
5      { sourceVersion: { $lt: order.version } },
6      { sourceVersion: { $exists: false } }
7    ]
8  },
9  {
10    $set: {
11      total: order.total,
12      status: order.status,
13      sourceVersion: order.version
14    }
15  },
16  { upsert: true }
17)

Now a stale event is less likely to overwrite newer data.

Design for Eventual Consistency Explicitly

If duplication is updated asynchronously, readers must tolerate lag. That means product and engineering need a clear answer to questions like:

  • can the duplicate be a few seconds behind
  • what screens require fresh reads from the source collection
  • what happens if propagation is delayed

The worst design is pretending the duplicates are always current when they are not. Make consistency boundaries visible in the system design.

Keep Repair and Replay Paths

Every asynchronous duplication system eventually needs repair tooling. Examples:

  • backfill a derived collection from source data
  • replay missed events
  • compare source and duplicate counts or hashes

Without repair tooling, a transient worker failure can create long-lived silent drift between the authoritative collection and its copies.

Common Pitfalls

  • Letting several duplicated copies accept writes instead of preserving one source of truth.
  • Updating downstream documents asynchronously without idempotent upsert or version checks.
  • Assuming change streams alone guarantee correctness without retry and replay logic.
  • Treating eventual consistency as invisible and surprising readers with stale duplicates.
  • Building propagation workers without a repair or backfill strategy for missed updates.

Summary

  • Duplicated MongoDB data is manageable when one collection remains authoritative.
  • Use change streams or another reliable event source to propagate updates asynchronously.
  • Make downstream consumers idempotent with upsert, versioning, or processed-event tracking.
  • Accept eventual consistency explicitly and route fresh-read cases to the source collection.
  • Keep replay and reconciliation tools so duplicates can be repaired after failures.

Course illustration
Course illustration

All Rights Reserved.