Best practices to update asynchronously duplicated data in mongodb

MongoDB

Asynchronous Data

Data Synchronization

Best Practices

Database Management

Best practices to update asynchronously duplicated data in mongodb

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

Duplicated data in MongoDB is often intentional. Teams duplicate fields or documents to speed up reads, reduce joins across collections, or maintain read models for different workloads. The hard part is not copying the data once, but keeping those copies correct when updates happen asynchronously and consumers may observe intermediate states.

Pick One Source of Truth

The first rule is architectural, not MongoDB-specific: one copy must be authoritative. All asynchronous update flows should begin from that source of truth rather than allowing several duplicated copies to accept independent writes.

For example:

'orders is the source of truth'
'customer_order_summaries is a derived read model'
analytics collections are downstream copies

If every copy is writable, conflict resolution becomes a business problem instead of a database problem. Most teams are better served by single-writer ownership plus asynchronous propagation.

Publish Changes Through a Reliable Stream

A common pattern is:

write the source document
emit or capture the change
update derived collections asynchronously

MongoDB change streams are useful when the deployment supports them, because they let a worker react to inserts, updates, and deletes without polling.

javascript

1const pipeline = [{ $match: { operationType: { $in: ["insert", "update", "replace"] } } }]
2const stream = db.collection("orders").watch(pipeline)
3
4stream.on("change", async change => {
5  const orderId = change.documentKey._id
6  const order = await db.collection("orders").findOne({ _id: orderId })
7
8  await db.collection("customer_order_summaries").updateOne(
9    { orderId: order._id },
10    {
11      $set: {
12        orderId: order._id,
13        customerId: order.customerId,
14        total: order.total,
15        status: order.status
16      }
17    },
18    { upsert: true }
19  )
20})

This is a good baseline, but it still needs idempotency and failure handling.

Make Downstream Updates Idempotent

Asynchronous systems retry. If a worker restarts or a consumer sees the same event twice, repeated application should not corrupt data.

Use patterns such as:

'upsert instead of blind insert'
version fields such as sourceVersion
event IDs already processed by the consumer

A simple version-aware update might look like:

javascript

1await db.collection("customer_order_summaries").updateOne(
2  {
3    orderId: order._id,
4    $or: [
5      { sourceVersion: { $lt: order.version } },
6      { sourceVersion: { $exists: false } }
7    ]
8  },
9  {
10    $set: {
11      total: order.total,
12      status: order.status,
13      sourceVersion: order.version
14    }
15  },
16  { upsert: true }
17)

Now a stale event is less likely to overwrite newer data.

Design for Eventual Consistency Explicitly

If duplication is updated asynchronously, readers must tolerate lag. That means product and engineering need a clear answer to questions like:

can the duplicate be a few seconds behind
what screens require fresh reads from the source collection
what happens if propagation is delayed

The worst design is pretending the duplicates are always current when they are not. Make consistency boundaries visible in the system design.

Keep Repair and Replay Paths

Every asynchronous duplication system eventually needs repair tooling. Examples:

backfill a derived collection from source data
replay missed events
compare source and duplicate counts or hashes

Without repair tooling, a transient worker failure can create long-lived silent drift between the authoritative collection and its copies.

Common Pitfalls

Letting several duplicated copies accept writes instead of preserving one source of truth.
Updating downstream documents asynchronously without idempotent upsert or version checks.
Assuming change streams alone guarantee correctness without retry and replay logic.
Treating eventual consistency as invisible and surprising readers with stale duplicates.
Building propagation workers without a repair or backfill strategy for missed updates.

Summary

Duplicated MongoDB data is manageable when one collection remains authoritative.
Use change streams or another reliable event source to propagate updates asynchronously.
Make downstream consumers idempotent with upsert, versioning, or processed-event tracking.
Accept eventual consistency explicitly and route fresh-read cases to the source collection.
Keep replay and reconciliation tools so duplicates can be repaired after failures.