The Principles of Data Modeling for MongoDB

Creating a schema for a relational database is a straightforward process. Designing a schema for a MongoDB application may seem more challenging, but it doesn't have to be if you follow the main principles that MongoDB has identified for its users. This talk will go over these data modeling principles. We'll also share modeling tips for addressing constant change in the data technology world, such as new MongoDB features, hardware evolution, data lakes, and the growing impact of analytics.

Speaker: Jay Runkel, Distinguished Solutions Architect
Company: MongoDB
Level: Intermediate

#MongoDBlocalToronto22
Comments

I've been trying to figure out if MongoDB is right for my use case. I've read a lot of familiarization content, including the Manual. Almost all of this content is sparse in two areas:

1) Many-to-many relationships between two types of documents that are large and complex. Tags and posts are easy. What about Courses and Students? Should I store an array of references on one, the other, or both? How do I optimize both for a Student who wants to list all their Courses AND for Teachers who want to list all the Students in a Course?

2) The practicalities of changing data in multiple locations. Say I have references in Students and Courses. If a Student needs to drop a Course, do I have to manually remove references in both locations? Will the aggregate system take care of that for me? What if someone comes along later and deletes the reference in Students but not Courses? What mechanisms can enforce the synchronization of references?


Obviously I haven't read everything out there, and I may be misunderstanding basic things. But the simple cases are not illustrative enough for me, especially when update examples are not included. Someone please tell me where to go from here. Where are some practical examples of many-to-many relationships between complex documents?

meepk
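
For the Courses/Students questions above there is no single right answer, but here is a minimal sketch of one common approach, assuming hypothetical Student and Course classes, a courseIds reference array stored only on the Student side, and a multikey index on that array. It is illustrative, not official MongoDB guidance.

```csharp
// A sketch only: reference array on the Student side, multikey index on courseIds.
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;

public class Student
{
    [BsonId] public ObjectId Id { get; set; }
    public string Name { get; set; }

    // References to Course documents; index this array so that
    // "all Students in a Course" is a single indexed query.
    [BsonElement("courseIds")]
    public List<ObjectId> CourseIds { get; set; } = new List<ObjectId>();
}

public class Course
{
    [BsonId] public ObjectId Id { get; set; }
    public string Title { get; set; }
}

public static class Enrollment
{
    // "List all Courses for a Student": one query on courses by _id.
    public static List<Course> CoursesFor(IMongoCollection<Course> courses, Student student) =>
        courses.Find(Builders<Course>.Filter.In(c => c.Id, student.CourseIds)).ToList();

    // "List all Students in a Course": one query against the multikey index.
    public static List<Student> StudentsIn(IMongoCollection<Student> students, ObjectId courseId) =>
        students.Find(Builders<Student>.Filter.AnyEq(s => s.CourseIds, courseId)).ToList();

    // Dropping a Course: MongoDB has no cascading deletes, so the application
    // removes every copy of the reference itself, inside one multi-document
    // transaction (requires a replica set or sharded cluster).
    public static void DropCourse(IMongoClient client, IMongoCollection<Student> students,
                                  ObjectId studentId, ObjectId courseId)
    {
        using var session = client.StartSession();
        session.WithTransaction((s, ct) =>
        {
            students.UpdateOne(s,
                Builders<Student>.Filter.Eq(x => x.Id, studentId),
                Builders<Student>.Update.Pull(x => x.CourseIds, courseId));
            // If Course also kept a studentIds array, pull the mirror reference
            // here in the same transaction so the two sides cannot drift apart.
            return true;
        });
    }
}
```

With references kept on only one side there is a single place to update when a student drops a course. If you duplicate references on both sides for read performance, a multi-document transaction like the one above is the kind of mechanism that keeps them in sync, because MongoDB has no foreign keys and will not cascade updates or deletes for you.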

That was awesome, thank you for sharing this.

fancasopedia

Amazing talk with a lot of information; I wish I had seen this a long time ago!

ebrahimsaed

Great talk... This will help with making engineering decisions about query optimization techniques.

omokechuku

Document versioning with the C# driver is only implemented at the root level. If a nested class has a field removed, the deserialiser throws an exception. If a root-level field is removed, the old value is added to the ExtraFields key/value list. The only fix for this is to implement full versioning in every subclass.

When we deserialise each document, we check the version number at the root and, if it's old, we update the in-memory copy and set a flag to say the document has been upgraded (or downgraded if code was rolled back). The application can then decide whether to write the record back to Mongo or continue. The advantage is that a result set returned to the user is consistent and can be bound to the UI.

This approach is working well for collections of 100 million+ documents and is very robust (we also automatically create/drop indexes based on the latest registered schema version).

The downside is that every subclass requires a version number and an ExtraFields collection, plus the logic to check all nested classes, including the versions in every array element.

The root ExtraFields field should include any missing fields from any level of subclasses.

Every C# developer I have discussed this with has found the same problem and given up on the implementation, which is a shame, because in general the versioning works very well except for nested classes (and nesting is a design pattern precisely so that you don't have a lot of fields at the root).

Mark.Brindle
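
As a minimal sketch of the root-level "check the version on read" pattern described above: the official C# driver's BsonExtraElements attribute can act as the catch-all the comment calls ExtraFields. The class name, field names, and the v3 rename below are hypothetical, and this does not solve the nested-class limitation being reported.

```csharp
// A sketch of root-level schema versioning with the C# driver's extra-elements
// catch-all. Class name, field names, and the v3 rename are hypothetical.
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class CustomerDoc
{
    public const int LatestSchemaVersion = 3;

    [BsonId] public ObjectId Id { get; set; }

    [BsonElement("schemaVersion")]
    public int SchemaVersion { get; set; } = LatestSchemaVersion;

    [BsonElement("email")] public string Email { get; set; }

    // Root-level fields present in the database but not in this class land here
    // instead of breaking deserialisation. Nested classes need their own.
    [BsonExtraElements] public BsonDocument ExtraElements { get; set; }

    // Set in memory when an old document was reshaped after load; not persisted.
    [BsonIgnore] public bool WasUpgraded { get; private set; }

    public void UpgradeIfNeeded()
    {
        if (SchemaVersion >= LatestSchemaVersion) return;

        // Hypothetical migration: before v3 the field was called "emailAddress".
        if (ExtraElements != null &&
            ExtraElements.TryGetValue("emailAddress", out var oldEmail))
        {
            Email = oldEmail.AsString;
            ExtraElements.Remove("emailAddress");
        }

        SchemaVersion = LatestSchemaVersion;
        WasUpgraded = true;  // caller decides whether to write the upgrade back
    }
}
```

After a Find, the application would call UpgradeIfNeeded() on each result and then decide whether to write the migrated copy back, as the comment describes. The nested-class caveat stands: unknown fields inside a subdocument are handled by that subclass's own class map, so every nested class needs its own catch-all (or a BsonIgnoreExtraElements attribute, which silently drops the data).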

Really great overview! I have a question about the schema versioning approach for updating the data model:

Since the application handles the migration ad hoc, chances are there will be some less frequently accessed documents stuck on the old version.
This means the code that handles the old version will stay in the codebase much longer (and possibly indefinitely).
Is there a good way to migrate all documents at once (in batches), or would you not recommend that?

I would like to minimize technical debt and legacy code.
Many thanks in advance :)

simonvandenhende
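
On the bulk-migration question above, one option is a one-off background job that walks the collection in batches and rewrites anything still on an old version. The sketch below assumes a hypothetical schemaVersion field and operates on untyped BsonDocument records; it is not something the talk itself prescribes.

```csharp
// A sketch of a one-off batch migration: find documents still on an old schema
// version and rewrite them, batchSize at a time, until none are left.
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

public static class Migrator
{
    public static void MigrateAll(IMongoCollection<BsonDocument> coll,
                                  int latestVersion, int batchSize = 1000)
    {
        // Old version, or documents that predate versioning entirely.
        var isOld = Builders<BsonDocument>.Filter.Or(
            Builders<BsonDocument>.Filter.Lt("schemaVersion", latestVersion),
            Builders<BsonDocument>.Filter.Exists("schemaVersion", false));

        while (true)
        {
            var batch = coll.Find(isOld).Limit(batchSize).ToList();
            if (batch.Count == 0) break;

            var writes = new List<WriteModel<BsonDocument>>();
            foreach (var doc in batch)
            {
                // Reuse the same per-document upgrade logic the application
                // already applies on read (renames, reshaping, etc.), then
                // stamp the new version so the document stops matching isOld.
                doc["schemaVersion"] = latestVersion;
                writes.Add(new ReplaceOneModel<BsonDocument>(
                    Builders<BsonDocument>.Filter.Eq("_id", doc["_id"]), doc));
            }
            coll.BulkWrite(writes);
        }
    }
}
```

Once the old-version filter matches zero documents, the old-version code path can be deleted from the application, which is one way to keep the schema-versioning pattern from turning into permanent legacy code.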