Loading...
Development

MongoDB Aggregation Pipelines – Step-by-Step Explanation

MongoDB Aggregation Pipelines – Step-by-Step Explanation

The MongoDB Aggregation Pipeline is a powerful framework for processing and transforming data within MongoDB. It allows you to perform complex data analysis, filtering, grouping, reshaping, and calculations — all in the database, without pulling raw data into your application.

Think of it as a conveyor belt where documents flow through a series of stages, and each stage transforms the data before passing it to the next.


Why Use Aggregation Pipelines?

Use CaseExample
Data summarizationCount users by country
ReportingMonthly sales totals
Data cleaningRemove duplicates, normalize fields
Real-time analyticsTop 10 products by revenue
ETL (Extract-Transform-Load)Prepare data for dashboards

Core Concept: The Pipeline

db.collection.aggregate([
  { $stage1 },
  { $stage2 },
  { $stage3 },
  ...
])
  • Each { $stage } is a stage in the pipeline.
  • Documents flow from left to right.
  • Output of one stage becomes input to the next.
  • Final result is returned as an array of documents.

Key Aggregation Stages

Here are the most commonly used stages:

StagePurposeSyntax
$matchFilter documentsLike find()
$projectSelect or reshape fieldsInclude, exclude, compute
$groupGroup by field(s), aggregateLike SQL GROUP BY
$sortSort results1 = asc, -1 = desc
$limit / $skipPaginationLimit number of results
$unwindDeconstruct arraysTurn array elements into separate docs
$lookupJoin with another collectionLike SQL JOIN
$addFieldsAdd new fieldsCompute values
$countCount documentsReturns total
$out / $mergeWrite results to new collectionExport pipeline output

Step-by-Step Examples

Let’s use a sample collection: orders

{
  _id: 1,
  customer: "Alice",
  items: ["laptop", "mouse"],
  total: 1200,
  status: "completed",
  country: "USA"
},
{
  _id: 2,
  customer: "Bob",
  items: ["phone"],
  total: 800,
  status: "pending",
  country: "Canada"
}

1. $match – Filter Documents

{ $match: { status: "completed" } }

Only completed orders pass through.


2. $project – Reshape Output

{
  $project: {
    customer: 1,
    total: 1,
    itemCount: { $size: "$items" }
  }
}

Result:

{ customer: "Alice", total: 1200, itemCount: 2 }

3. $group – Group & Aggregate

Goal: Total sales per country

{
  $group: {
    _id: "$country",           // Group by country
    totalSales: { $sum: "$total" },
    orderCount: { $sum: 1 },
    avgOrder: { $avg: "$total" }
  }
}

Result:

{ _id: "USA", totalSales: 1200, orderCount: 1, avgOrder: 1200 }
{ _id: "Canada", totalSales: 800, orderCount: 1, avgOrder: 800 }

Accumulators in $group:

  • $sum, $avg, $min, $max, $push, $addToSet, $first, $last

4. $sort – Order Results

{ $sort: { totalSales: -1 } }

Sort by total sales descending.


5. $limit – Top N Results

{ $limit: 3 }

Only top 3 countries.


6. $unwind – Expand Arrays

{ $unwind: "$items" }

Turns:

{ items: ["laptop", "mouse"] }

Into two documents:

{ items: "laptop" }, { items: "mouse" }

Useful for analyzing individual array elements.


7. $lookup – Join Collections

Assume another collection: customers

{
  $lookup: {
    from: "customers",
    localField: "customer",
    foreignField: "name",
    as: "customerInfo"
  }
}

Adds customer details (e.g., email, phone) from another collection.


8. $addFields – Compute New Fields

{
  $addFields: {
    tax: { $multiply: ["$total", 0.08] },
    totalWithTax: { $add: ["$total", { $multiply: ["$total", 0.08] }] }
  }
}

Full Example: Top Customers by Spending

db.orders.aggregate([
  { $match: { status: "completed" } },

  { $group: {
      _id: "$customer",
      totalSpent: { $sum: "$total" },
      orders: { $sum: 1 }
  }},

  { $sort: { totalSpent: -1 } },

  { $limit: 5 },

  { $project: {
      customer: "$_id",
      totalSpent: 1,
      orders: 1,
      _id: 0
  }}
])

Output:

{ customer: "Alice", totalSpent: 1200, orders: 1 }

Advanced: Conditional Logic

Use $cond, $switch, $ifNull

{
  $addFields: {
    statusLabel: {
      $switch: {
        branches: [
          { case: { $eq: ["$status", "completed"] }, then: "Done" },
          { case: { $eq: ["$status", "pending"] }, then: "Waiting" }
        ],
        default: "Unknown"
      }
    }
  }
}

Performance Tips

  1. Put $match early – filter as soon as possible.
  2. Use indexes on fields in $match and $sort.
  3. Project only needed fields early to reduce memory.
  4. Avoid large $unwind on big arrays.
  5. Use allowDiskUse: true for large datasets:
db.collection.aggregate(pipeline, { allowDiskUse: true })

Tools to Visualize Pipelines

  • MongoDB Compass – Drag-and-drop pipeline builder
  • MongoDB Atlas – Visual pipeline editor
  • mongosh – Test in shell

Summary: Pipeline Flow

[Raw Docs]
     ↓
[$match] → filter
     ↓
[$project/$addFields] → reshape
     ↓
[$unwind] → expand arrays
     ↓
[$group] → aggregate
     ↓
[$sort] → order
     ↓
[$limit] → paginate
     ↓
[Result]

Official Docs & Resources


Practice Tip: Start with small datasets. Build pipelines step by step in mongosh, adding one stage at a time and checking output with .pretty().

Let me know if you want a real-world example (e.g., e-commerce dashboard, log analysis, user analytics)!