MongoDB Aggregation Pipelines – Step-by-Step Explanation

The MongoDB aggregation pipeline is a framework for processing and transforming data inside MongoDB. It lets you filter, group, reshape, and compute over documents in the database itself, without pulling raw data into your application.

Think of it as a conveyor belt where documents flow through a series of stages, and each stage transforms the data before passing it to the next.

Why Use Aggregation Pipelines?

Use Case	Example
Data summarization	Count users by country
Reporting	Monthly sales totals
Data cleaning	Remove duplicates, normalize fields
Real-time analytics	Top 10 products by revenue
ETL (Extract-Transform-Load)	Prepare data for dashboards

Core Concept: The Pipeline

db.collection.aggregate([
  { $stage1 },
  { $stage2 },
  { $stage3 },
  ...
])

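Each { $stage } is a stage in the pipeline.
Documents flow through the stages in array order, first to last.
The output of one stage becomes the input to the next.
The final result is returned as a cursor over the output documents; in mongosh, call .toArray() on it to get an array.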
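To make the conveyor-belt idea concrete, here is a minimal sketch in plain JavaScript, with no database involved. It treats each stage as a function from an array of documents to a new array. The names `runPipeline`, `onlyCompleted`, `byTotalDesc`, and `firstOne` are hypothetical stand-ins, and this is only a mental model, not how MongoDB executes pipelines internally.

```javascript
// Model of a pipeline: feed the output of each stage into the next,
// in array order.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical stand-ins for a $match, a $sort and a $limit stage.
const onlyCompleted = (docs) => docs.filter((d) => d.status === "completed");
const byTotalDesc = (docs) => [...docs].sort((a, b) => b.total - a.total);
const firstOne = (docs) => docs.slice(0, 1);

const orders = [
  { _id: 1, customer: "Alice", total: 1200, status: "completed" },
  { _id: 2, customer: "Bob", total: 800, status: "pending" },
];

const result = runPipeline(orders, [onlyCompleted, byTotalDesc, firstOne]);
console.log(result);
```

Reordering the `stages` array changes the result, which is why stage order matters in a real pipeline too.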

Key Aggregation Stages

Here are the most commonly used stages:

Stage	Purpose	Notes
`$match`	Filter documents	Like `find()`
`$project`	Select or reshape fields	Include, exclude, compute
`$group`	Group by field(s), aggregate	Like SQL `GROUP BY`
`$sort`	Sort results	`1` = asc, `-1` = desc
`$limit` / `$skip`	Pagination	Cap or skip a number of documents
`$unwind`	Deconstruct arrays	Turn array elements into separate docs
`$lookup`	Join with another collection	Like a SQL `LEFT OUTER JOIN`
`$addFields`	Add new fields	Compute values
`$count`	Count documents	Returns total
`$out` / `$merge`	Write results to a collection	Export pipeline output

Step-by-Step Examples

Let’s use a sample collection: orders

{
  _id: 1,
  customer: "Alice",
  items: ["laptop", "mouse"],
  total: 1200,
  status: "completed",
  country: "USA"
},
{
  _id: 2,
  customer: "Bob",
  items: ["phone"],
  total: 800,
  status: "pending",
  country: "Canada"
}

1. `$match` – Filter Documents

{ $match: { status: "completed" } }

Only completed orders pass through.

2. `$project` – Reshape Output

{
  $project: {
    customer: 1,
    total: 1,
    itemCount: { $size: "$items" }
  }
}

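Result for Alice's order (`_id` is included by default; add `_id: 0` to the projection to drop it):

{ _id: 1, customer: "Alice", total: 1200, itemCount: 2 }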
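As a rough plain-JavaScript picture of what this projection does to each document, the sketch below maps over the sample orders. The helper `projectOrder` is hypothetical. It copies `_id` because MongoDB keeps `_id` in `$project` output unless it is explicitly excluded.

```javascript
// Hypothetical helper modelling the $project stage shown above.
function projectOrder(doc) {
  return {
    _id: doc._id,                // kept by default in MongoDB
    customer: doc.customer,      // customer: 1
    total: doc.total,            // total: 1
    itemCount: doc.items.length, // counterpart of { $size: "$items" }
  };
}

const orders = [
  { _id: 1, customer: "Alice", items: ["laptop", "mouse"], total: 1200, status: "completed", country: "USA" },
  { _id: 2, customer: "Bob", items: ["phone"], total: 800, status: "pending", country: "Canada" },
];

const projected = orders.map(projectOrder);
console.log(projected);
```

Fields that are not listed (`items`, `status`, `country`) do not appear in the output.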

3. `$group` – Group & Aggregate

Goal: Total sales per country

{
  $group: {
    _id: "$country",           // Group by country
    totalSales: { $sum: "$total" },
    orderCount: { $sum: 1 },
    avgOrder: { $avg: "$total" }
  }
}

Result:

{ _id: "USA", totalSales: 1200, orderCount: 1, avgOrder: 1200 }
{ _id: "Canada", totalSales: 800, orderCount: 1, avgOrder: 800 }

Accumulators in $group:

$sum, $avg, $min, $max, $push, $addToSet, $first, $last
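One way to picture what the accumulators do is a plain-JavaScript loop that keeps one running record per group key. This is a sketch only: `groupByCountry` is a hypothetical helper, the third order (Carol) is made-up data added so that one group holds more than one document, and every `total` is assumed to be numeric.

```javascript
// Hypothetical model of $group with _id: "$country".
function groupByCountry(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.country)) {
      groups.set(doc.country, { _id: doc.country, totalSales: 0, orderCount: 0, customers: [] });
    }
    const g = groups.get(doc.country);
    g.totalSales += doc.total; // like { $sum: "$total" }
    g.orderCount += 1;         // like { $sum: 1 }
    if (!g.customers.includes(doc.customer)) {
      g.customers.push(doc.customer); // like { $addToSet: "$customer" }
    }
  }
  // like { $avg: "$total" }, assuming every total is a number
  return [...groups.values()].map((g) => ({ ...g, avgOrder: g.totalSales / g.orderCount }));
}

const orders = [
  { _id: 1, customer: "Alice", total: 1200, country: "USA" },
  { _id: 2, customer: "Bob", total: 800, country: "Canada" },
  { _id: 3, customer: "Carol", total: 300, country: "USA" }, // made-up extra order
];

const grouped = groupByCountry(orders);
console.log(grouped);
```

In this sketch the groups come out in first-seen order. A real `$group` stage does not promise any particular output order, so add a `$sort` if order matters.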

4. `$sort` – Order Results

{ $sort: { totalSales: -1 } }

Sort by total sales descending.

5. `$limit` – Top N Results

{ $limit: 3 }

Keeps only the first 3 documents; after the $sort above, those are the top 3 countries by sales.
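The interplay of `$sort`, `$skip`, and `$limit` can be sketched with array operations. The helper names and the per-country totals below are hypothetical, chosen only to show how a "top N" and a second page fall out of the same sorted list.

```javascript
// Hypothetical stand-ins for { $sort: { totalSales: -1 } }, $skip and $limit.
const sortByTotalSalesDesc = (docs) => [...docs].sort((a, b) => b.totalSales - a.totalSales);
const skip = (docs, n) => docs.slice(n);
const limit = (docs, n) => docs.slice(0, n);

// Made-up grouped output, one document per country.
const salesByCountry = [
  { _id: "Canada", totalSales: 800 },
  { _id: "USA", totalSales: 1500 },
  { _id: "Japan", totalSales: 400 },
  { _id: "Germany", totalSales: 950 },
];

const sorted = sortByTotalSalesDesc(salesByCountry);
const top3 = limit(sorted, 3);            // first page of 3
const page2 = limit(skip(sorted, 3), 3);  // second page of 3

console.log(top3.map((d) => d._id));  // [ 'USA', 'Germany', 'Canada' ]
console.log(page2.map((d) => d._id)); // [ 'Japan' ]
```

Without the sort, "first 3" would just mean whichever documents happen to arrive first, so `$limit` is usually paired with `$sort`.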

6. `$unwind` – Expand Arrays

{ $unwind: "$items" }

Turns:

{ items: ["laptop", "mouse"] }

Into two documents:

{ items: "laptop" }, { items: "mouse" }

Useful for analyzing individual array elements.
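In plain JavaScript the same idea looks like a `flatMap`: every array element yields one copy of the parent document. The sketch below uses a hypothetical helper `unwindItems` and assumes `items` is either an array or absent. Like `$unwind` with its default options, it emits nothing for documents whose array is empty or missing.

```javascript
// Hypothetical model of { $unwind: "$items" }.
function unwindItems(docs) {
  return docs.flatMap((doc) =>
    (doc.items ?? []).map((item) => ({ ...doc, items: item }))
  );
}

const orders = [
  { _id: 1, customer: "Alice", items: ["laptop", "mouse"] },
  { _id: 2, customer: "Bob", items: ["phone"] },
];

const unwound = unwindItems(orders);
console.log(unwound);

// A typical follow-up: count how often each item was ordered.
const countsByItem = unwound.reduce((counts, doc) => {
  counts[doc.items] = (counts[doc.items] ?? 0) + 1;
  return counts;
}, {});
console.log(countsByItem);
```

The other fields (`_id`, `customer`) are repeated on every output document, which is what makes a later `$group` by item possible.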

7. `$lookup` – Join Collections

Assume another collection: customers

{
  $lookup: {
    from: "customers",
    localField: "customer",
    foreignField: "name",
    as: "customerInfo"
  }
}

Adds a customerInfo array to each order, holding the matching documents from customers (e.g., email, phone). The array is empty when no customer matches.
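The shape of the `$lookup` output is easier to see in a small plain-JavaScript model. The helper `lookupCustomers` and the contents of the `customers` array are hypothetical. The point is only that every order is kept and gains an array of matches, which may be empty.

```javascript
// Hypothetical model of the $lookup above: match orders.customer to customers.name.
function lookupCustomers(orders, customers) {
  return orders.map((order) => ({
    ...order,
    customerInfo: customers.filter((c) => c.name === order.customer),
  }));
}

const orders = [
  { _id: 1, customer: "Alice", total: 1200 },
  { _id: 2, customer: "Bob", total: 800 },
];

// Made-up customers collection; there is deliberately no entry for Bob.
const customers = [{ name: "Alice", email: "alice@example.com" }];

const joined = lookupCustomers(orders, customers);
console.log(JSON.stringify(joined, null, 2));
```

Because `customerInfo` is always an array, pipelines often follow `$lookup` with `$unwind` when at most one match is expected.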

8. `$addFields` – Compute New Fields

{
  $addFields: {
    tax: { $multiply: ["$total", 0.08] },
    totalWithTax: { $add: ["$total", { $multiply: ["$total", 0.08] }] }
  }
}
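For a worked example of the arithmetic, Alice's order has a total of 1200, so the tax is 1200 × 0.08 = 96 and the total with tax is 1200 + 96 = 1296. The plain-JavaScript sketch below reproduces that. The helper `addTaxFields` is hypothetical, and the 8% rate is taken from the stage above.

```javascript
const TAX_RATE = 0.08; // same rate as in the $addFields stage

// Hypothetical model of the $addFields stage: keep every existing field
// and add two computed ones.
function addTaxFields(doc) {
  const tax = doc.total * TAX_RATE;
  return { ...doc, tax, totalWithTax: doc.total + tax };
}

const order = { _id: 1, customer: "Alice", total: 1200 };
const withTax = addTaxFields(order);

console.log(withTax.tax.toFixed(2));          // "96.00"
console.log(withTax.totalWithTax.toFixed(2)); // "1296.00"
```

Unlike `$project`, this keeps all of the original fields, which is the main reason to reach for `$addFields`.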

Full Example: Top Customers by Spending

db.orders.aggregate([
  { $match: { status: "completed" } },

  { $group: {
      _id: "$customer",
      totalSpent: { $sum: "$total" },
      orders: { $sum: 1 }
  }},

  { $sort: { totalSpent: -1 } },

  { $limit: 5 },

  { $project: {
      customer: "$_id",
      totalSpent: 1,
      orders: 1,
      _id: 0
  }}
])

Output:

{ customer: "Alice", totalSpent: 1200, orders: 1 }

Advanced: Conditional Logic

Use $cond, $switch, $ifNull

{
  $addFields: {
    statusLabel: {
      $switch: {
        branches: [
          { case: { $eq: ["$status", "completed"] }, then: "Done" },
          { case: { $eq: ["$status", "pending"] }, then: "Waiting" }
        ],
        default: "Unknown"
      }
    }
  }
}
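If the operator names are unfamiliar, it may help to see rough plain-JavaScript counterparts: `$switch` behaves like a `switch` statement, `$cond` like a ternary, and `$ifNull` like the `??` operator. The function names, the 1000 threshold, and the "N/A" fallback below are hypothetical choices for illustration.

```javascript
// Counterpart of the $switch stage above.
function statusLabel(doc) {
  switch (doc.status) {
    case "completed":
      return "Done";
    case "pending":
      return "Waiting";
    default:
      return "Unknown";
  }
}

// Rough counterpart of { $cond: [ { $gte: ["$total", 1000] }, "large", "small" ] }.
const orderSize = (doc) => (doc.total >= 1000 ? "large" : "small");

// Rough counterpart of { $ifNull: ["$country", "N/A"] }.
const countryOrDefault = (doc) => doc.country ?? "N/A";

const order = { _id: 2, customer: "Bob", total: 800, status: "pending" };
console.log(statusLabel(order), orderSize(order), countryOrDefault(order));
// Waiting small N/A
```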

Performance Tips

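Put $match early – filter as soon as possible so later stages process fewer documents.
Index the fields used in $match and $sort; these stages can generally use an index only when they run at the start of the pipeline, before stages that reshape the documents.
Project only the fields you need to reduce memory use.
Avoid $unwind on large arrays; it emits one document per array element.
Use allowDiskUse: true when a stage would exceed the 100 MB per-stage memory limit: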

db.collection.aggregate(pipeline, { allowDiskUse: true })
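The effect of filtering early can be illustrated without a database. In the plain-JavaScript sketch below, the same filter and the same computed field are applied in both orders, and a counter records how many documents the computation touched. The data set (1000 made-up orders, every fourth one completed) and all names are hypothetical. MongoDB's optimizer can sometimes move a `$match` forward on its own, so treat this as intuition rather than a benchmark.

```javascript
let taxComputations = 0;

// Stand-in for a per-document stage such as $addFields.
const addTax = (doc) => {
  taxComputations += 1;
  return { ...doc, tax: doc.total * 0.08 };
};
// Stand-in for { $match: { status: "completed" } }.
const isCompleted = (doc) => doc.status === "completed";

// Made-up data: 1000 orders, every fourth one completed.
const orders = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  total: 100 + i,
  status: i % 4 === 0 ? "completed" : "pending",
}));

taxComputations = 0;
const lateFilter = orders.map(addTax).filter(isCompleted);
const lateWork = taxComputations; // every document was processed

taxComputations = 0;
const earlyFilter = orders.filter(isCompleted).map(addTax);
const earlyWork = taxComputations; // only matching documents were processed

console.log({ lateWork, earlyWork }); // { lateWork: 1000, earlyWork: 250 }
```

Both orderings produce the same 250 documents, but the early filter does a quarter of the work in the later stage.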

Tools to Visualize Pipelines

MongoDB Compass – Drag-and-drop pipeline builder
MongoDB Atlas – Visual pipeline editor
mongosh – Test in shell

Summary: Pipeline Flow

[Raw Docs]
     ↓
[$match] → filter
     ↓
[$project/$addFields] → reshape
     ↓
[$unwind] → expand arrays
     ↓
[$group] → aggregate
     ↓
[$sort] → order
     ↓
[$limit] → paginate
     ↓
[Result]

Official Docs & Resources

Practice Tip: Start with small datasets. Build pipelines step by step in mongosh, adding one stage at a time and checking the output after each one (mongosh formats results by default, so .pretty() is optional).

Let me know if you want a real-world example (e.g., e-commerce dashboard, log analysis, user analytics)!
