CosmosDB/DocumentDB partitioning with multiple types in same collection

Question

The official recommendation from the team is, to my knowledge, to put all data types into a single collection and give each document something like a type=someType field to distinguish the types.

Now, assume a large, partitioned database in which the different object types can be:

Completely different fields (so no common field for partitioning)
Related (through reference)

How do we organize things so that data that belongs together ends up in the same partition?

For example, let's say we have:

User

BlogPost

BlogPostComment

If we store them as separate types with type=user|blogPost|blogPostComment in the same collection, how do we ensure that a user, their blog posts and all the corresponding comments end up in the same partition? Is there some best practice for this?

[UPDATE] Can you ever avoid cross-partition queries completely? Should that be a goal, or do you just try to minimize them? For example, you can partition your data perfectly for 99% of cases/queries, but then you need some dashboard that shows aggregates over all the data. Is that something you just accept as inevitable and try to minimize, or is it possible to avoid it completely?

Jesse Carter · Accepted Answer · 2018-03-08 20:20:42Z

5

I've written about this somewhat extensively in other similar questions regarding Cosmos.

Basically, when dealing with many different logical entity types in a single Cosmos collection the easiest option is to put a generic (or abstract, as you refer to it) partition key on all your documents. At this point it's the concern of the application to make sure that at runtime the appropriate value is chosen. I usually name this document property either partitionKey, routingKey or something similar.

This is extremely important when designing for optimal query efficiency as your choice of partition keys can have a huge impact on query and throughput performance. A generic key like this lets you design the optimal storage of your data as it benefits whatever application you're building.

Even something like tenant does not make sense as a partition key by itself, as different tenants might have wildly different data sizes and access patterns. Instead, you could include the tenantId at runtime as one part of your partition key, as a kind of composite.
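To make the generic-key idea concrete for the User / BlogPost / BlogPostComment example from the question, here is a minimal sketch in plain Python (no Cosmos SDK involved). The composite format `tenantId:blogOwnerId` and the helper names `make_partition_key` and `make_doc` are assumptions made up for illustration, not something the service prescribes; the right composition depends on your own access patterns.

```python
from collections import defaultdict


def make_partition_key(tenant_id: str, blog_owner_id: str) -> str:
    """Hypothetical composite key: the tenant plus the user who owns the blog."""
    return f"{tenant_id}:{blog_owner_id}"


def make_doc(doc_type: str, doc_id: str, tenant_id: str, blog_owner_id: str, **fields) -> dict:
    """Build a document that carries a 'type' discriminator and a generic 'partitionKey'."""
    return {
        "id": doc_id,
        "type": doc_type,
        "partitionKey": make_partition_key(tenant_id, blog_owner_id),
        **fields,
    }


docs = [
    make_doc("user", "u1", "t1", "u1", name="Alice"),
    make_doc("blogPost", "p1", "t1", "u1", title="Hello"),
    # The comment is written by u2, but it is stored under the blog owner's key (u1).
    make_doc("blogPostComment", "c1", "t1", "u1", postId="p1", authorId="u2", text="Nice"),
    make_doc("user", "u2", "t1", "u2", name="Bob"),
]

# Group document types by partition key to see what ends up together.
by_partition = defaultdict(list)
for doc in docs:
    by_partition[doc["partitionKey"]].append(doc["type"])

for key, types in sorted(by_partition.items()):
    print(key, types)
```

In this sketch a user, their posts and all comments on those posts share one key value because the application picks the blog owner's id for every document, regardless of who authored the comment.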

UPDATE: For certain query patterns it might be possible to serve them entirely out of a single partition. It's definitely not the end of the world if things end up going cross-partition, though. The system is still quick. If possible, limiting the number of partitions that need to be touched for a given query is ideal, but you're never going to get away from it 100% of the time.
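The difference between a single-partition query and a cross-partition one can be illustrated with a toy in-memory model. This is only a sketch: `FakeCollection` is a made-up stand-in, not the Cosmos SDK, and it treats every partition key value as its own partition, whereas the real service groups many key values onto shared physical partitions.

```python
from collections import defaultdict


class FakeCollection:
    """In-memory stand-in for a partitioned collection (hypothetical, not the Cosmos SDK)."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def upsert(self, doc: dict) -> None:
        self.partitions[doc["partitionKey"]].append(doc)

    def query(self, predicate, partition_key=None):
        """Return (matching documents, number of partitions touched)."""
        if partition_key is not None:
            keys = [partition_key]          # routed to exactly one partition
        else:
            keys = list(self.partitions)    # fan out to every partition
        matches = [
            doc
            for key in keys
            for doc in self.partitions.get(key, [])
            if predicate(doc)
        ]
        return matches, len(keys)


coll = FakeCollection()
for uid in ("u1", "u2", "u3"):
    key = f"t1:{uid}"
    coll.upsert({"id": uid, "type": "user", "partitionKey": key})
    coll.upsert({"id": f"post-{uid}", "type": "blogPost", "partitionKey": key})
coll.upsert({"id": "post-u1-b", "type": "blogPost", "partitionKey": "t1:u1"})


def is_post(doc: dict) -> bool:
    return doc["type"] == "blogPost"


# The 99% case: everything for one user comes from one partition.
posts_u1, touched = coll.query(is_post, partition_key="t1:u1")

# The dashboard case: an aggregate over all data has to visit every partition.
all_posts, touched_all = coll.query(is_post)

print(len(posts_u1), touched)
print(len(all_posts), touched_all)
```

The dashboard-style aggregate cannot be routed by key, so the practical goal is to keep such queries rare rather than to eliminate them.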

edited Mar 8, 2018 at 20:20

answered Mar 8, 2018 at 19:27

5 Comments

tlt Over a year ago

This is exactly what i am thinking about! Thank you for sharing this. I will look into details and come back with additional questions if i should have any left. Many thanks!

Jesse Carter Over a year ago

@deezg No problem :) Glad that it helped you out. Feel free to ask other questions if they come up

tlt Over a year ago

I've added one more (sub)question into my opening post. Could you please take a look and put into your answer so i can accept it. Thanks!

Jesse Carter Over a year ago

@deezg I've updated my answer to clarify your remaining questions

tlt Over a year ago

Man, you've saved me a ton of time. Thanks so much!

Mark Redman · Answer · 2018-03-08 18:49:03Z

2

A partition should hold data related to a group that is expected to grow, for instance a Tenant, which groups many documents (which can be of different types, as you mentioned). So the partition key in this instance should be the TenantId. Partitioning is more about which group the data relates to than about the type of data. If the data is related to a User then you could use the UserId; however, many users may comment on the same posts, so it doesn't seem like a good candidate for a partition key unless the user info is de-normalized so that it doesn't have to relate back to the other users directly, if that makes sense?
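One way to read the de-normalization remark is that a comment carries a small snapshot of its author's info and is stored in the same partition as the post it belongs to. The sketch below assumes that reading; the field names (`author`, `displayName`, `postId`) and the helper `make_comment` are hypothetical and chosen only for illustration.

```python
def make_comment(comment_id: str, post: dict, author: dict, text: str) -> dict:
    """Build a comment stored next to its post, with a copied snapshot of the author."""
    return {
        "id": comment_id,
        "type": "blogPostComment",
        # Same partition key as the post, whatever scheme the post uses.
        "partitionKey": post["partitionKey"],
        "postId": post["id"],
        # De-normalized copy: reading the comment needs no lookup of the author's document.
        "author": {"id": author["id"], "displayName": author["displayName"]},
        "text": text,
    }


post = {"id": "p1", "type": "blogPost", "partitionKey": "tenant-1", "title": "Hello"}
bob = {"id": "u2", "type": "user", "partitionKey": "tenant-1", "displayName": "Bob"}

comment = make_comment("c1", post, bob, "Nice post")
print(comment["partitionKey"], comment["author"]["displayName"])
```

The trade-off is that the copied display name can go stale if the user later changes it, so the application has to decide whether to update old comments or accept the snapshot.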

answered Mar 8, 2018 at 18:49

1 Comment

tlt Over a year ago

Yes, it makes sense as an intention, but I am still groping in the dark about the implementation. As you correctly point out, userId is not a good partition key in the textbook blog example, but I have a hard time understanding what might be. I mentioned data types only because they might be so different that they have no organic common fields (that might be used as a partition key), so we might need to add some abstract one.
