I'm currently developing a mobile application powered by a Mongo database, and however everything is working fine right now, we want to add Sharding to be prepared for the future.
In order to test this, we've created a lab environment (running in Hyper-V) to test the various scenario's:
The following servers have been created:
- Ubuntu Server 14.04.3 Non-Sharding (Database Server) (256 MB Ram / Limit to 10% CPU).
- Ubuntu Server 14.04.3 Sharding (Configuration Server) (256 MB Ram / Limit to 10% CPU).
- Ubuntu Server 14.04.3 Sharding (Query Router Server) (256 MB Ram / Limit to 10% CPU).
- Ubuntu Server 14.04.3 Sharding (Database Server 01) (256 MB Ram / Limit to 10% CPU).
- Ubuntu Server 14.04.3 Sharding (Database Server 02) (256 MB Ram / Limit to 10% CPU).
A small console application have been created in C# to be able to measure the time to perform an insert.
This console application does import 10.000 persons with the following properties: - Name - Firstname - Full Name - Date Of Birth - Id
All 10.000 records differs only by '_id', all the other fields are the same for all the records.
It's important to note that every test is exactely run 3 times. After every test, the database is removed so the system is clean again.
Find the results of the test below:
Insert 10.000 records without sharding
Writing 10.000 records | Non-Sharding environment - Full Disk IO #1: 14 Seconds.
Writing 10.000 records | Non-Sharding environment - Full Disk IO #2: 14 Seconds.
Writing 10.000 records | Non-Sharding environment - Full Disk IO #3: 12 Seconds.
Insert 10.000 records with single database shard
Note: Sharding key has been set to hashed _id field.
See Json below for (partial) sharding information:
shards:
{ "_id" : "shard0000", "host" : "192.168.137.12:27017" }
databases:
{ "_id" : "DemoDatabase", "primary" : "shard0000", "partitioned" : true }
DemoDatabase.persons
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard0000 2
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong(0) } on : shard0000 Timestamp(1, 1)
{ "_id" : NumberLong(0) } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 2)
Results:
Writing 10.000 records | Single Sharding environment - Full Disk IO #1: 1 Minute, 59 Seconds.
Writing 10.000 records | Single Sharding environment - Full Disk IO #2: 1 Minute, 51 Seconds.
Writing 10.000 records | Single Sharding environment - Full Disk IO #3: 1 Minute, 52 Seconds.
Insert 10.000 records with double database shard
Note: Sharding key has been set to hashed _id field.
See Json below for (partial) sharding information:
shards:
{ "_id" : "shard0000", "host" : "192.168.137.12:27017" }
{ "_id" : "shard0001", "host" : "192.168.137.13:27017" }
databases:
{ "_id" : "DemoDatabase", "primary" : "shard0000", "partitioned" : true }
DemoDatabase.persons
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard0000 2
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 Timestamp(2, 2)
{ "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard0000 Timestamp(2, 3)
{ "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 Timestamp(2, 4)
{ "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 Timestamp(2, 5)
Results:
Writing 10.000 records | Single Sharding environment - Full Disk IO #1: 49 Seconds.
Writing 10.000 records | Single Sharding environment - Full Disk IO #2: 53 Seconds.
Writing 10.000 records | Single Sharding environment - Full Disk IO #3: 54 Seconds.
According to the tests executed above, sharding does work, the more shards that I add, the better the performance. However, I don't understand why I'm facing such a huge performance drop when working with shards rather than using a single server.
I need to blazing fast reading and writing s I tought that sharding would be the solution, but it seems that I'm missing something here.
Anyone why can point me in the right direction?
Kind regards