I have a MongoDB (v8.0.9) sharded cluster running in Kubernetes with the following setup:
- 3 shards, each with 2 replicas
- An empty collection was created with a hashed shard key (over a UUID field)
- Sharding was enabled using:
sh.enableSharding("db") sh.shardCollection("db.collection", { shardkey: "hashed" }) - The balancer was started:
sh.startBalancer() - I started importing data at a rate of about 2,500 documents per second
- I stopped the import after reaching approximately 716 million documents
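For reference, this is roughly how I inspect the per-shard distribution and the total chunk count (a sketch run in mongosh against the mongos; chunk metadata lives in the config database, which since MongoDB 5.0 keys config.chunks by collection UUID):

  // Per-shard data/docs/chunks for the sharded collection
  db.getSiblingDB("db").collection.getShardDistribution()

  // Total number of chunks for the collection
  const collMeta = db.getSiblingDB("config").collections.findOne({ _id: "db.collection" })
  db.getSiblingDB("config").chunks.countDocuments({ uuid: collMeta.uuid })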
Data distribution is perfectly even (about 33.3% of the data on each shard), but the problem is that there are only 10 chunks in total. Here is the chunk distribution output, taken two days after the import finished:
Shard-0
{
data: '884.5GiB',
docs: 238,840,277,
chunks: 1,
'estimated data per chunk': '884.5GiB',
'estimated docs per chunk': 238,840,277
}
Shard-1
{
data: '884.4GiB',
docs: 238,813,353,
chunks: 4,
'estimated data per chunk': '221.1GiB',
'estimated docs per chunk': 59,703,338
}
Shard-2
{
data: '884.54GiB',
docs: 238,851,801,
chunks: 5,
'estimated data per chunk': '176.9GiB',
'estimated docs per chunk': 47,770,360
}
My question:
Why are there only 10 chunks in total, even though the collection contains over 700 million documents and nearly 900 GiB of data per shard?
Additional info:
- The balancer is running.
- The collection was empty before import.
- The shard key is a hashed UUID field.
- The chunk size is set to 128 MB (see the sketch after this list).
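For completeness, this is roughly how that chunk size is configured and inspected (a sketch; the value in config.settings is in megabytes, and the per-collection command requires MongoDB 6.0+):

  // Cluster-wide default chunk size (value is in MB)
  db.getSiblingDB("config").settings.updateOne(
    { _id: "chunksize" },
    { $set: { value: 128 } },
    { upsert: true }
  )
  db.getSiblingDB("config").settings.findOne({ _id: "chunksize" })

  // Optional per-collection chunk size (MongoDB 6.0+), also in MB
  db.adminCommand({ configureCollectionBalancing: "db.collection", chunkSize: 128 })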
What could be causing the low number of chunks? Shouldn’t there be many more chunks given the data volume?
What I tried:
- I attempted to split the chunks manually using sh.splitFind() (see the sketch after this list).
- After executing the splits, the chunks were indeed split as expected.
- However, after a few minutes the chunks reverted to their previous state (i.e., the splits were undone).
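A minimal sketch of what that attempt looked like; the UUID value is purely illustrative, since with a hashed key any value simply identifies the chunk that owns its hash:

  // Split the chunk owning the hash of this (illustrative) shard key value at its median
  sh.splitFind("db.collection", { shardkey: UUID("0f8fad5b-d9cb-469f-a165-70867728950e") })

  // Re-check the chunk count right after the split, and again a few minutes later
  db.getSiblingDB("config").chunks.countDocuments({
    uuid: db.getSiblingDB("config").collections.findOne({ _id: "db.collection" }).uuid
  })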
Update (2025-05-27):
I continued testing and added 3 new shards to the cluster, so there are now 6 shards in total. The balancer was enabled and running. However, after adding the new shards the data was not automatically redistributed to them: everything remained on the original 3 shards, and the chunk distribution did not change.
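This is roughly how the shards were added and how I checked the balancer afterwards (a sketch; the replica set names and hostnames are placeholders for my Kubernetes services):

  // Add each new shard (replica-set/host values are placeholders)
  sh.addShard("shard-3/shard-3-svc.mongodb.svc.cluster.local:27017")

  // Confirm the balancer is enabled and see what it reports for this collection
  sh.getBalancerState()
  sh.balancerCollectionStatus("db.collection")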
My questions:
- What is the best approach to trigger data rebalancing after adding new shards to an existing cluster?
- Can the balancer work effectively with such large chunks (e.g., ~884 GiB in a single chunk), or should I try to split them into smaller chunks first?
- Is there a recommended way to force or accelerate the redistribution of data to the new shards in this scenario?
Thank you for any insights!