I would like to know if there is a rule of thumb about when to use a new document and when to use a sub-document. In SQL databases I used to break all relations into separate tables according to the rules of normalization and connect them with keys, but I can't find a good approach for what to do in MongoDB (I don't know how other NoSQL databases handle this). Any help will be appreciated. Kind regards.
1 Answer
Though there are no fixed rules, there are some general guidelines that are intuitive enough to follow while modeling data in NoSQL.
Nearly all cases of 1-1 can be handled with sub-documents. For example: a user has an address. In all likelihood the address is unique to each user (in the context of your system, say a social website), so keeping addresses in another collection would be a waste of space and queries. An address sub-document is the best choice.
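As a rough illustration of the 1-1 case, here is a minimal sketch using plain Python dicts to stand in for documents. The field names (`name`, `address`, `street`, `city`) are made up for the example and are not from any real schema.

```python
# Hypothetical user document with the address embedded as a sub-document.
user = {
    "_id": 1,
    "name": "Alice",
    "address": {"street": "12 Elm St", "city": "Springfield"},
}

# Reading the user document brings the address along with it,
# so no second lookup in another collection is needed.
city = user["address"]["city"]
print(city)
```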
Another example: hundreds of employees share the same building/address. In this case, keeping the address as a 1-1 sub-document is a poor use of space, because the same address is replicated across many employee documents, and any slight change to it forces an update to every one of them. Therefore, an address should have many employees, i.e. a 1-to-many relationship.
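To make the update cost concrete, here is a small sketch with plain Python dicts and lists standing in for documents and collections. The names (`address_id`, `rename_street_embedded`, `rename_street_referenced`) and the count of 300 employees are assumptions chosen for illustration.

```python
# Embedded: each employee document carries its own copy of the shared address.
embedded_employees = [
    {"_id": i, "name": f"emp{i}",
     "address": {"street": "1 Main St", "city": "Metropolis"}}
    for i in range(300)
]

# Referenced: the address is stored once; employees hold only its id.
addresses = {"hq": {"_id": "hq", "street": "1 Main St", "city": "Metropolis"}}
referenced_employees = [
    {"_id": i, "name": f"emp{i}", "address_id": "hq"} for i in range(300)
]


def rename_street_embedded(employees, new_street):
    """Update every embedded copy; return how many documents were rewritten."""
    touched = 0
    for emp in employees:
        emp["address"]["street"] = new_street
        touched += 1
    return touched


def rename_street_referenced(address_docs, address_id, new_street):
    """Update the single shared address document; return documents rewritten."""
    address_docs[address_id]["street"] = new_street
    return 1


embedded_writes = rename_street_embedded(embedded_employees, "2 Main St")
referenced_writes = rename_street_referenced(addresses, "hq", "2 Main St")
print(embedded_writes, referenced_writes)  # 300 1
```

With the shared address stored once, a change touches one document instead of one per employee.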
You may have noticed that in NoSQL there are multiple ways to represent a 1-to-many relationship.
- Keep an array of references. Choose this if you're sure the array won't grow too large and, preferably, the document containing the array is not expected to be updated a lot.
- Keep an array of sub-documents. A handy option if the sub-documents don't qualify for a separate collection and you don't run the risk of hitting the 16 MB document size limit. (thanks greyfairer for reminding!)
- SQL-style foreign key. Use this if the first two options are not good enough or you simply prefer this style.
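The three options above can be sketched side by side. This is only an illustration with plain Python dicts; the field names (`employee_ids`, `employees`, `address_id`) and the helper `employees_at` are hypothetical, not part of any MongoDB API.

```python
# Option 1: array of references stored on the "one" side.
address_with_refs = {"_id": "hq", "street": "1 Main St", "employee_ids": [1, 2]}

# Option 2: array of sub-documents stored on the "one" side.
address_with_embedded = {
    "_id": "hq",
    "street": "1 Main St",
    "employees": [{"name": "Ann"}, {"name": "Bob"}],
}

# Option 3: SQL-style foreign key stored on the "many" side.
employees = [
    {"_id": 1, "name": "Ann", "address_id": "hq"},
    {"_id": 2, "name": "Bob", "address_id": "hq"},
    {"_id": 3, "name": "Cy", "address_id": "branch"},
]


def employees_at(address_id, employee_docs):
    """Resolve option 3 by filtering the 'many' side on the foreign key."""
    return [e["name"] for e in employee_docs if e["address_id"] == address_id]


names_via_fk = employees_at("hq", employees)
names_via_embedded = [e["name"] for e in address_with_embedded["employees"]]
print(names_via_fk, names_via_embedded)
```

Options 1 and 2 grow the parent document as children are added, which is why the size of the array matters; option 3 keeps the parent fixed in size at the cost of a second query.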
"Modeling documents for retrieval", "Document design considerations" and "Modeling Relationships" from Couchbase (another NoSQL database) are really good reads and equally applicable to MongoDB.
3 Comments
views and likes collection, as you said. That's because you never know how many of them a song will get, and keeping user references is a requirement. You should also keep view_count and like_count fields in the song document and increment them whenever a new view or like document is created. This way you don't need a separate query just to get the count.
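A minimal sketch of the counter idea from this comment, again with plain Python structures standing in for collections. The helper `add_like` and the `song_id`/`user_id` field names are assumptions for illustration; in MongoDB itself the increment would typically be done with the `$inc` update operator.

```python
# Hypothetical song document carrying denormalized counters.
song = {"_id": "song1", "title": "Example", "view_count": 0, "like_count": 0}

# Stands in for a separate "likes" collection that keeps the user references.
likes = []


def add_like(song_doc, like_docs, user_id):
    """Record who liked the song, then bump the cached counter on the song."""
    like_docs.append({"song_id": song_doc["_id"], "user_id": user_id})
    song_doc["like_count"] += 1


add_like(song, likes, "u1")
add_like(song, likes, "u2")

# The count is read straight off the song document, without scanning likes.
print(song["like_count"], len(likes))  # 2 2
```

Note that in a real database the insert and the increment are two separate writes, so the cached count can drift from the true count if one of them fails.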