
The official recommendation from the team is, to my knowledge, to put all data types into a single collection and give each document a field like type=someType to distinguish the types.

Now, assume a large database with partitioning, where different object types can:

  1. Have completely different fields (so no common field for partitioning)
  2. Be related (through a reference)

How should we organize things so that documents that belong together end up in the same partition?

For example, let's say we have:

User

BlogPost

BlogPostComment

If we store them as separate types with type=user|blogPost|blogPostComment in the same collection, how do we ensure that a user, their blog posts and all the corresponding comments end up in the same partition? Is there some best practice for this?

[UPDATE] Can you ever avoid cross-partition queries completely? Should that be a goal, or do you just try to minimize them? For example, you can partition your data perfectly for 99% of cases/queries, but then you need some dashboard to show aggregates from all the data. Is that something you just accept as inevitable and try to minimize, or is it possible to avoid it completely?

2 Answers

5

I've written about this somewhat extensively in other similar questions regarding Cosmos.

Basically, when dealing with many different logical entity types in a single Cosmos collection the easiest option is to put a generic (or abstract, as you refer to it) partition key on all your documents. At this point it's the concern of the application to make sure that at runtime the appropriate value is chosen. I usually name this document property either partitionKey, routingKey or something similar.

This is extremely important when designing for optimal query efficiency as your choice of partition keys can have a huge impact on query and throughput performance. A generic key like this lets you design the optimal storage of your data as it benefits whatever application you're building.
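As a rough sketch of the generic-key idea applied to the blog example from the question: every document carries the same partitionKey property, and the application decides what goes into it. Everything here is an assumption made for illustration (the helper names, the extra fields, and above all the choice to route posts and comments by the post author's id); it uses plain dictionaries rather than any SDK.

```python
# Hypothetical document shapes. Only "id", "type" and "partitionKey"
# matter for the idea; the other fields are made up for illustration.

def make_user(user_id, name):
    # A user is routed by its own id.
    return {"id": user_id, "type": "user",
            "partitionKey": user_id, "name": name}

def make_blog_post(post_id, author_id, title):
    # A post is routed by its author, so it lands next to the user document.
    return {"id": post_id, "type": "blogPost",
            "partitionKey": author_id, "title": title}

def make_comment(comment_id, post_id, post_author_id, text):
    # A comment is routed by the author of the post it belongs to
    # (an assumption, not the only possible choice).
    return {"id": comment_id, "type": "blogPostComment",
            "partitionKey": post_author_id, "postId": post_id, "text": text}

docs = [
    make_user("user-1", "Alice"),
    make_blog_post("post-1", "user-1", "Hello"),
    make_comment("comment-1", "post-1", "user-1", "Nice post"),
]

# All three document types share one logical partition key value.
partition_keys = {d["partitionKey"] for d in docs}
print(partition_keys)
```

Whether the author's id is the right value depends on your access patterns; the point is only that the value is chosen by the application per document, not tied to a field that happens to exist on every type.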

Even something like a tenant id does not always make sense as the partition key on its own, since different tenants might have wildly different data sizes and access patterns. Instead you could include the tenantId at runtime as one part of your partition key, as a kind of composite.
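A minimal sketch of such a composite, assuming the application concatenates the tenant id with some second value. The function name, the ":" separator and the group_id parameter are all hypothetical; what you combine with the tenant id is up to your own access patterns.

```python
def build_partition_key(tenant_id: str, group_id: str) -> str:
    # Compose the key at runtime. The ":" separator is an arbitrary choice.
    return f"{tenant_id}:{group_id}"

# A small tenant and a large tenant both get tenant-scoped keys, but the
# large tenant's data is spread over several partition key values instead
# of piling up under a single tenantId.
small = build_partition_key("tenant-a", "user-1")
big_1 = build_partition_key("tenant-b", "user-1")
big_2 = build_partition_key("tenant-b", "user-2")
print(small, big_1, big_2)
```

The value returned here would be written into the generic partitionKey property on each document.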

UPDATE: For certain query patterns it might be possible to serve them entirely out of a single partition. It's definitely not the end of the world if things end up going cross-partition though. The system is still quick. If possible, limiting the number of partitions that need to be touched for a given query is ideal, but you're never going to get away from it 100% of the time.


5 Comments

This is exactly what I am thinking about! Thank you for sharing this. I will look into the details and come back with additional questions if I have any left. Many thanks!
@deezg No problem :) Glad that it helped you out. Feel free to ask other questions if they come up
I've added one more (sub)question to my opening post. Could you please take a look and address it in your answer so I can accept it? Thanks!
@deezg I've updated my answer to clarify your remaining questions
Man, you've saved me a ton of time. Thanks so much!
2

A partition should hold data related to a group that is expected to grow, for instance a Tenant, which groups many documents (which can be of different types, as you mentioned). So the partition key in this instance should be the TenantId. Partitioning is more about which group the data belongs to than about the type of data. If the data is related to a User then you could use the UserId. However, many users may comment on the same post, so UserId doesn't seem like a good candidate for a partition key unless the user info is de-normalized so that it doesn't have to relate back to the other users directly, if that makes sense?

1 Comment

Yes, it makes sense as an intention, but I am still groping in the dark about the implementation. As you correctly point out, userId is not a good partition key in the textbook blog example, but I have a hard time understanding what might be. I mentioned data types only because they might be so different that they have no organic common fields (that might be used as a partition key), so we might need to add some abstract one.
