Designing Scalable MongoDB Schemas: Patterns You Should Know

Document vs Relational Thinking

MongoDB's document model encourages embedding related data, which avoids joins at read time and improves read performance. However, embedding everything leads to document bloat (a single document is capped at 16 MB) and write amplification, because a small change means rewriting a large document. The key is knowing when to embed and when to reference.

The Embedded Document Pattern

Embed when: the embedded data is only accessed via the parent, the cardinality is one-to-few, and the data doesn't need independent querying.

// Embed address inside user — rarely queried independently
{
  name: "Alice",
  address: { street: "123 Main", city: "NYC" }
}

The Reference Pattern

Reference when: data is accessed independently, the cardinality is one-to-many with a large or unbounded "many" side, or multiple documents reference the same entity (many-to-many). This is what we use for posts → categories and posts → author.
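
As a rough illustration of the reference pattern, the sketch below stores only ids on the post and resolves them in application code. The field names (authorId, categoryIds) and the sample documents are assumptions made for this example, not a required schema. With a live database, the same join could be done server-side with a $lookup aggregation stage or with follow-up queries.

```javascript
// Hypothetical documents from three separate collections.
const authors = [{ _id: "a1", name: "Alice" }];
const categories = [
  { _id: "c1", name: "Databases" },
  { _id: "c2", name: "Performance" },
];
const posts = [
  { _id: "p1", title: "Schema patterns", authorId: "a1", categoryIds: ["c1", "c2"] },
];

// Resolve a post's references by id. Missing references are dropped
// (categories) or returned as null (author) instead of throwing.
function resolvePost(post, authorDocs, categoryDocs) {
  const authorsById = new Map(authorDocs.map((a) => [a._id, a]));
  const categoriesById = new Map(categoryDocs.map((c) => [c._id, c]));
  return {
    ...post,
    author: authorsById.get(post.authorId) ?? null,
    categories: post.categoryIds
      .map((id) => categoriesById.get(id))
      .filter((c) => c !== undefined),
  };
}
```

Because the post holds only ids, renaming a category or an author touches one document rather than every post that mentions it.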

Indexing Strategy

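Every frequent query should be backed by an index. To verify, run the query through explain, for example db.collection.explain("executionStats").find(<filter>), and check that the winning plan uses IXSCAN rather than COLLSCAN and that totalDocsExamined stays close to nReturned. Key indexes for a blog: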

Unique index on slug
Compound index on { status: 1, publishedAt: -1 } for listing published posts
Text index on { title: "text", content: "text" } for search
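
The three indexes above could be created along these lines. This is a minimal sketch that assumes `posts` is a Node.js driver collection, or any object with a compatible createIndex(keys, options) method. The helper names ensureBlogIndexes and hasCollectionScan are made up for this example, and the exact shape of explain output varies between server versions.

```javascript
// Index definitions from the list above, as [keys, options] pairs.
const blogIndexes = [
  [{ slug: 1 }, { unique: true }],
  [{ status: 1, publishedAt: -1 }, {}],
  [{ title: "text", content: "text" }, {}],
];

// Create each index in turn and collect the names the driver returns.
async function ensureBlogIndexes(posts) {
  const names = [];
  for (const [keys, options] of blogIndexes) {
    names.push(await posts.createIndex(keys, options));
  }
  return names;
}

// Walk a winning plan (explain().queryPlanner.winningPlan) and report
// whether any stage in it is a full collection scan.
function hasCollectionScan(node) {
  if (Array.isArray(node)) return node.some(hasCollectionScan);
  if (node === null || typeof node !== "object") return false;
  if (node.stage === "COLLSCAN") return true;
  return Object.values(node).some(hasCollectionScan);
}
```

Pass hasCollectionScan only the winning plan, not the whole explain result, since rejected plans may legitimately contain a COLLSCAN stage.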

The Bucket Pattern

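For time-series data (like view counts per hour), store one document per time window (a bucket) that accumulates many events, instead of one document per event. This reduces document count dramatically and improves aggregation performance, because each query reads far fewer documents.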
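
One way to write into hourly buckets is an upsert that increments counters on the bucket for the current hour. The sketch below only builds the arguments for such an update. The field names (postId, hourStart, views, perMinute) and the helper name hourBucketUpdate are assumptions for illustration.

```javascript
// Build the filter, update, and options for recording one view in an
// hourly bucket. All time arithmetic is done in UTC.
function hourBucketUpdate(postId, viewedAt) {
  const hourStart = new Date(viewedAt);
  hourStart.setUTCMinutes(0, 0, 0); // truncate to the start of the hour
  const minute = viewedAt.getUTCMinutes();
  return {
    // One bucket document per post per hour.
    filter: { postId, hourStart },
    // $inc creates the counters on first use, so no read is needed first.
    update: { $inc: { views: 1, [`perMinute.${minute}`]: 1 } },
    // upsert creates the bucket when the first view of the hour arrives.
    options: { upsert: true },
  };
}
```

With the Node.js driver, the result could be passed on as collection.updateOne(filter, update, options). A unique index on { postId: 1, hourStart: 1 } is likely worth adding so that concurrent upserts cannot create two buckets for the same hour.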

Schema Versioning

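Add a schemaVersion field to every document. When the schema changes, you can migrate lazily on read: transform the old shape to the new one, save it back, and avoid a big-bang migration. Until every document has been rewritten, any code that reads the collection must be able to handle both shapes.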
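
A lazy migration on read might look like the sketch below. The v1 to v2 change (moving flat street and city fields into an address subdocument) is a hypothetical example. The helper names are made up, and `users` is assumed to be a Node.js driver collection or any object with compatible findOne and replaceOne methods.

```javascript
const CURRENT_VERSION = 2;

// Each entry upgrades a document by exactly one version.
const migrations = {
  // v1 -> v2 (hypothetical): nest flat street/city under `address`.
  1: (doc) => {
    const { street, city, ...rest } = doc;
    return { ...rest, address: { street, city }, schemaVersion: 2 };
  },
};

// Apply migrations one step at a time until the document is current.
// Returns the same object, untouched, when no migration is needed.
function upgradeToLatest(doc) {
  let current = doc;
  // Documents written before versioning existed are treated as version 1.
  let version = current.schemaVersion ?? 1;
  while (version < CURRENT_VERSION) {
    current = migrations[version](current);
    version = current.schemaVersion;
  }
  return current;
}

// Read a user, upgrade it if it is stale, and save the new shape back.
async function readUser(users, id) {
  const stored = await users.findOne({ _id: id });
  if (stored === null) return null;
  const upgraded = upgradeToLatest(stored);
  if (upgraded !== stored) {
    await users.replaceOne({ _id: id }, upgraded);
  }
  return upgraded;
}
```

Keeping each migration a pure function from one version to the next makes the chain easy to test, and a document several versions behind is brought up to date in a single read.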