QT: Why Medit chose MongoDB

We chose MongoDB at Medit. What would you choose?

QT: https://medit.online chose MongoDB as our main datastore?
Email: [email protected]
Twitter: https://twitter.com/dave_albert
Instagram: https://www.instagram.com/dave.albert/
Websites: https://dave-albert.com | https://medit.online

Sun, 11 Nov 2018 11:11:13 GMT
duration: 7:43

Transcript:

Hey, folks, this is Dave Albert with a quick take. This quick take is why we chose MongoDB at Medit. One of the biggest reasons is the unstructured data. When you’re dealing with RSS feeds like we are, because we’re scraping the medical web, you don’t always get the exact same content. It’s usually pretty similar, but not always, and the journals can be even more out of sync than typical blogs. The blogs based on WordPress are all pretty similar, but some feeds have some fields, some have additional fields, and some have fields that are unexpected. When you’re using machine learning to process those articles, you sometimes need those additional fields to categorize and understand what the content is about. We do still use the actual content to determine categories and topics, but every data point that you have can be very useful.
Also, during development, it was really nice not to have to change the schema definition each time on each of the different environments: the development environments for each developer, the testing environments, and the production environments, obviously. Also, the application mostly has fewer relationships than what you would need a relational database like MySQL for. There are some relationship requirements in there, though, and that has been one of the challenges. We started out expecting to put everything related to a user in the user’s document. For those unfamiliar with Mongo, a collection is like a table and a document is like a table row, so we expected to keep everything related to a user in that user’s document.
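
To make the point about unstructured feed data concrete, here is a minimal Python sketch. The two entries, their field names, the `source` label, and the `to_document` helper are all hypothetical illustrations, not Medit's actual code or schema. The idea is only that entries with different shapes can become documents in the same collection without a schema change.

```python
from typing import Any, Dict, Tuple

# Hypothetical feed entries: a blog post and a journal article with different fields.
wordpress_entry: Dict[str, Any] = {
    "title": "New guidance on hypertension",
    "link": "https://example.com/blog/hypertension",
    "published": "2018-11-01",
    "tags": ["cardiology"],
}
journal_entry: Dict[str, Any] = {
    "title": "A randomized trial of drug X",
    "link": "https://example.org/journal/123",
    "doi": "10.0000/example.123",  # made-up identifier for illustration
    "authors": ["A. Author", "B. Author"],
    "abstract": "Background: ...",
}

# Assumption: only these two fields are treated as mandatory.
REQUIRED_FIELDS: Tuple[str, ...] = ("title", "link")


def to_document(entry: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Build a document that keeps every field the feed supplied."""
    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    if missing:
        raise ValueError(f"entry is missing required fields: {missing}")
    # Copy instead of mutating the caller's dict; extra fields pass through as-is.
    doc = dict(entry)
    doc["source"] = source
    return doc


documents = [
    to_document(wordpress_entry, source="blog"),
    to_document(journal_entry, source="journal"),
]

# With a live connection (assumption: PyMongo and a collection named "articles"),
# both shapes could be stored together with no migration step:
#   db.articles.insert_many(documents)
```

Later processing can then check whether a field such as `doi` or `tags` is present on a given document rather than relying on a fixed set of columns.
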
I realized, though, that a user’s document could outgrow the maximum document size in Mongo, so I had to split that out and, for the elements that were relational, keep just their IDs in an array within the user document. That works most of the time because we don’t normally need that information with each and every request, so we don’t really need to do anything like a join. But there is one specific case where it is very valuable. So, what I found is that as Mongo has continued to develop, Mongo aggregation has become very powerful, and there’s a lookup stage, $lookup, that basically does what a join does, so you gain a lot of that join functionality. Now, I can’t say that it’s necessarily as efficient as a join, and I can’t say that it’s not; I haven’t benchmarked the two.
We already have so much data, and it basically has to be unstructured to be processed properly, so I’m not sure how that would go with a relational database. I know a number of them have started to add JSON as a field type, but I’m not sure how powerful they are at actually processing the JSON elements, so there’s that. It’s been working well for us. Like I said, we’ve had challenges in a few places where a join would probably be easier, but I can’t say that it would be more efficient, because the lookup is very powerful. And using the aggregation pipelines for our analytics, our custom analytics investigations, and our machine learning elements, it’s done pretty much everything we need. One thing that’s really interesting is what’s coming with Mongo 4. Well, I mean, it’s out now, but it’s still in development.
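
As a rough illustration of the join-like lookup described above, the following Python sketch builds an aggregation pipeline as plain dictionaries. The collection names `users` and `saved_articles` and the array field `saved_article_ids` are hypothetical; the transcript does not say what the related elements actually are. The `$match` and `$lookup` stages themselves are standard MongoDB aggregation stages, and `$lookup` can match an array `localField` against a scalar `foreignField`.

```python
from typing import Any, Dict, List


def user_with_related_pipeline(user_id: Any) -> List[Dict[str, Any]]:
    """Build a pipeline that loads one user plus the documents its ID array points to."""
    return [
        # Select the single user document first so the lookup runs on one document.
        {"$match": {"_id": user_id}},
        # Join-like step: pull in every document whose _id appears in the user's array.
        {
            "$lookup": {
                "from": "saved_articles",           # hypothetical related collection
                "localField": "saved_article_ids",  # hypothetical array of IDs on the user
                "foreignField": "_id",
                "as": "saved_articles",             # output array added to the user document
            }
        },
    ]


pipeline = user_with_related_pipeline("user-123")

# With a live connection (assumption: PyMongo and a "users" collection):
#   results = list(db.users.aggregate(pipeline))
```

Requests that do not need the related documents can keep reading the user document directly and skip the pipeline, which matches the "most of the time" case in the talk.
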
I don’t usually deal with anything that’s in a .0 version because it’s a little too new for me, but now that 4 is released, transactions are available. You can lock the data until the full transaction has been written, so if you need to update multiple collections at the same time for the transaction to be complete, so that each collection is updated, that is now available, which is really interesting. I like the work Mongo are doing. I’m still curious whether we made the right decision, but at the time, it was the best decision we could make, and I still really like Mongo, so there’s that. Cheers.
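
For readers curious what the multi-collection transactions mentioned above look like in practice, here is a hedged Python sketch in the style of PyMongo's session API (`start_session`, `start_transaction`, and the `session=` keyword on write operations). The database name `app`, the collections `users` and `topics`, the field names, and the `follow_topic` function are hypothetical. In MongoDB 4.0, multi-document transactions require a replica set.

```python
from typing import Any


def follow_topic(client: Any, user_id: Any, topic_id: Any) -> None:
    """Update two collections so that both writes commit together or not at all."""
    db = client["app"]  # hypothetical database name
    with client.start_session() as session:
        # The transaction commits when this block exits normally
        # and aborts if an exception is raised inside it.
        with session.start_transaction():
            db["users"].update_one(
                {"_id": user_id},
                {"$addToSet": {"topic_ids": topic_id}},
                session=session,
            )
            db["topics"].update_one(
                {"_id": topic_id},
                {"$inc": {"follower_count": 1}},
                session=session,
            )


# With a live replica set (assumption: PyMongo installed):
#   from pymongo import MongoClient
#   follow_topic(MongoClient("mongodb://localhost:27017/?replicaSet=rs0"), "user-123", "topic-9")
```

Passing `session=session` to each write is what ties it to the transaction; a write issued without it would run outside the transaction.
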