Cosmos DB in Azure: Yes, No, Maybe
We wanted to use Cosmos DB. It won’t work... maybe it will.
Spotify | iTunes | Stitcher | Google Play | Player.fm | MyTuner Radio
Hello, and welcome to the podcast. I’m your host, Dave Albert. In the show, I talk about technology and building a company as a CTO and co-founder, and I have guests on to discuss their roles in technology and entrepreneurship.
In this episode, I’m going to talk to you about Microsoft Azure’s Cosmos DB. You may want to listen to the episode on why Medit chose MongoDB before this one. It probably isn’t necessary, but if you’re interested in why we chose Mongo over other data store technologies, it might help. I had basically decided Cosmos DB was not going to work for us, but I went to a meetup last night and got a better understanding of the proper way to work with Cosmos. So I’m not sure we’ve given up.
To start off, let me tell you a little bit about Cosmos. It is Azure’s one database to rule them all, or at least that’s what they’re going for. It’s multi-model, which means you can use the API for SQL, Mongo, the Gremlin graph database syntax, Cassandra, or Azure Table storage. It is multi-region, and it has a low cost of ownership, though that depends on how you architect it. Microsoft deals with all the infrastructure: you just have your connection string and key, and it’s immediately geo-redundant. You do not have to deal with backups or tuning. Of course, that also means you aren’t able to deal with backups or tuning, so you’re at the mercy of Azure. But maintaining your own multi-country “datacenter”, in air quotes, with datacenters all around the world so that you survive one going down, has its own overhead.
So it all depends on your needs. If you don’t need to tune your database, and by tuning I don’t just mean adding indexes or maintaining data, but going lower down, like choosing the right level of disk the database runs on and other elements like that, then this might be a solution for you. It’s not perfect, and there are a few things I’ll get into in a bit. First I want to go through more of the notes I took at the meetup, if I can read my own handwriting.
Cosmos DB is moving very fast, and the documentation is usually out of date because it can’t keep up with the speed of development. That’s a good thing in that they’re moving development along: new features, more reliability, making it more useful.
But it can also make it harder to learn how to do what you need, because the documentation is out of date. That’s the world we live in with anything that moves quickly: the docs are almost always a bit behind. How many times have you looked something up and all the screenshots were from a past version? I know I have.
Like I said, it’s multi-model, so you can use the API for SQL or for Mongo. That was very compelling for us, although the Mongo support isn’t 100% there. From what I understand, the Mongo API is basically a re-implementation on top of the SQL API. Very few of the people I’ve spoken to, listened to, or read who use the SQL API have issues with it, but there are a number of things that don’t behave the way you would hope with Mongo, so it’s not really a drop-in replacement.
There’s another reason it’s not a drop-in replacement: the pricing. Cosmos is priced in request units, and a request unit covers anything that happens: a read, a write, and I think storage is included, though I’m not sure of the exact formula. From what I understand, reading 1 KB is one request unit, and a write is more expensive.
Here’s how it’s laid out. You create a deployment of Cosmos DB, and within that you create a database, which is pretty much what you would expect from any database technology. Then you create a container, which you can think of as similar to a table in a SQL database or a collection in MongoDB. Within that you have the entries: your documents or rows. Where it gets squirrelly is that for each container you have to provision request units, and the minimum is 400 request units per second, which works out to roughly 23 or 24 euro per month. So if you take every collection you have and create a new container for each, you’re looking at n times 23 or 24 euro. In our case, I’m not sure of the exact number, but we have quite a few small collections. We’ve got large collections: posts, and users, which is getting larger (hopefully that grows really quickly). We’ve got smaller ones like lists; those are growing, but not as fast as anything else.
Then there’s collections, which isn’t growing as fast (hopefully that also grows quite quickly in the future); feeds, which is growing, but not very quickly; config, which is definitely not big at all; and quite a few others.
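To make the one-container-per-collection arithmetic concrete, here is a rough cost calculator. It is a minimal sketch that assumes, as stated in this episode, that the 400 RU/s minimum costs about 23 euro a month, and it further assumes cost scales linearly with provisioned RU/s; both figures should be checked against current Azure pricing, and `monthly_cost_eur` is a hypothetical helper.

```python
# Rough monthly cost of provisioning one Cosmos DB container per collection.
# Assumption: ~23 EUR/month for the 400 RU/s minimum (figure from this episode),
# scaling linearly with RU/s. Check current Azure pricing before relying on it.
MIN_RU_PER_CONTAINER = 400
EUR_PER_MONTH_AT_MIN = 23.0


def monthly_cost_eur(containers: int, ru_per_container: int = MIN_RU_PER_CONTAINER) -> float:
    """Estimate the monthly bill for `containers` containers at the given RU/s each."""
    if ru_per_container < MIN_RU_PER_CONTAINER:
        raise ValueError("each container needs at least 400 RU/s")
    per_container = EUR_PER_MONTH_AT_MIN * ru_per_container / MIN_RU_PER_CONTAINER
    return containers * per_container


print(monthly_cost_eur(10))       # 230.0: ten small collections, one container each
print(monthly_cost_eur(1, 1000))  # 57.5: one busier container
```

The cost grows with the number of containers, not with how much data is in them, which is why many small collections are the expensive case.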
So if we take, say, 10 of those and create 10 new containers in Cosmos, that’s 230 euro a month for our smallest collections from Mongo, and that’s before anything grows. That is impossibly expensive over the long haul. All of those could be handled by the smallest instance, a 5 euro or $5 instance, probably without any difficulty. So that’s out. The way this is solved is by not thinking of a container as a collection, but as a store that your collections go into.
Here’s where I’m starting to rethink our strategy and whether it might make sense. I have definitely not decided to do this, because I know we’ve got a number of places where we do Mongo lookups, and I’m not sure we’d want to re-architect that. But we still have to make our geo-replication strategy much more robust, and not maintaining infrastructure is often a good way to go. Often, not always.
Each container should have a partition key. So what we could do is have some formula that combines the collection type with some other element to partition it. If a partition gets over 10 GB, they can’t replicate it, and it basically stops working the way you would hope. So you need a good way to split it, the same as if you were going to shard, and you basically have to shard it: if you’re sticking more than one collection in a container, there has to be a logical way to split them up. Sure, we could use the collection type as the partition key, but some of the collections will outgrow that, so it isn’t useful. There’s no reasonable limit on the number of keys; it’s basically unlimited.
Now, obviously, there’s an asterisk on that, but one of the speakers last night said he has about 4 million keys, so I really doubt you need to worry about having too many. I don’t think you’d want each individual entry to be its own key; that doesn’t make much sense. So you’d have the collection type plus something else as the key, and the something else doesn’t have to be the same for each collection. It could be the collection type users plus country. I haven’t thought through the implications of whether a single country would outgrow the 10 GB limit, but that might work. Or we could do something like users_country_specialty. I really doubt we’re going to outgrow 10 GB for a specific specialty in a specific country, so that, or something similar, might work.
So it can be less expensive than maintaining three or five different datacenters with different Mongo connections, plus all the backups and the disaster recovery plans for each of those instances in each of the countries. If I were starting from scratch, I would seriously consider using it. But we’ve been live for quite some time, we’re trying to finalize a few bits of the product, and it makes more sense to focus on that right now. There’s still a risk in not having geo-replication completed just yet, so I’m not positive.
A few more bits that might be interesting about Cosmos: by default, everything is indexed. Every field is indexed, which has a cost, but it means you don’t have to figure out what to index.
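The composite-key idea can be sketched as below. This is a hypothetical scheme, not Medit’s real schema: the field names `type`, `country`, `specialty` and `size_bytes` are assumptions, and the code only simulates grouping documents in memory to see which keys are creeping toward the 10 GB partition limit mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Per-partition size limit discussed in this episode (10 GB).
LOGICAL_PARTITION_LIMIT_BYTES = 10 * 1024 ** 3


def partition_key(doc: dict) -> str:
    """Composite key: former collection type, then country, then specialty.

    Hypothetical scheme; documents without a specialty fall back to 'none'.
    """
    return "_".join([doc["type"], doc["country"], doc.get("specialty", "none")])


def partition_sizes(docs: Iterable[dict]) -> Dict[str, int]:
    """Total the (assumed) size_bytes field of every document per partition key."""
    totals: Dict[str, int] = defaultdict(int)
    for doc in docs:
        totals[partition_key(doc)] += doc["size_bytes"]
    return dict(totals)


def keys_at_risk(sizes: Dict[str, int], headroom: float = 0.5) -> List[str]:
    """Keys already larger than `headroom` times the limit, sorted by name."""
    threshold = headroom * LOGICAL_PARTITION_LIMIT_BYTES
    return sorted(key for key, size in sizes.items() if size > threshold)


docs = [
    {"type": "users", "country": "DE", "specialty": "cardiology", "size_bytes": 2_000},
    {"type": "users", "country": "DE", "specialty": "cardiology", "size_bytes": 3_000},
    {"type": "lists", "country": "NL", "size_bytes": 500},
]
sizes = partition_sizes(docs)
print(sizes)                # {'users_DE_cardiology': 5000, 'lists_NL_none': 500}
print(keys_at_risk(sizes))  # []
```

In a real container the computed key would be stored as a property on each document, because a Cosmos DB partition key is defined as a path into the document.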
As you’re growing, that can be really useful. We’ve had times where we had to go and re-index data because performance on some queries became so poor. So we create a new index, and that takes time and puts a burden on the server: building indices takes processing and memory, and that can have a negative impact even on your testing environment. We’ve had our Mongo server fall over before. That wasn’t Mongo’s fault; it’s a small instance, because that’s all we need. It was building a really large index when we threw our typical load at it, which normally brings it up to 78 percent. If it’s already at 90% building the new index, it’s not going to perform very well, and it actually crashed. We were able to bring it back up, no problem. But if everything is indexed, you don’t have to think about indexes too much. It’s the same nice thing about being schemaless: you don’t have to know every bit of how the product is going to work while you’re developing and growing it, because as you are developing it (no pun intended on Azure), your understanding of it is going to change.
You can change what is indexed in each container, but everything in a container shares the same indexing rules, so you need to think about what that means if you put several collections in one container.
To make sure you’re not running out of request units, you monitor for 429 errors, HTTP’s Too Many Requests status. I’m not sure of the exact wording of the error, but it means you’ve run out of request units and you’re going to be throttled.
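Handling a 429 usually means retrying with backoff, which is roughly what the SDK does for you. Below is a minimal sketch, not the SDK’s implementation: `ThrottledError` and `with_backoff` are hypothetical names, and `retry_after_ms` stands in for the `x-ms-retry-after-ms` header Cosmos DB sends with a throttled response. The `sleep` parameter is injectable so the behaviour can be checked without actually waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 (Too Many Requests) response."""

    def __init__(self, retry_after_ms: Optional[int] = None) -> None:
        super().__init__("429 Too Many Requests")
        # Server's suggested wait, if it sent one (x-ms-retry-after-ms).
        self.retry_after_ms = retry_after_ms


def with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on ThrottledError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the 429 instead of silently dropping data
            if err.retry_after_ms is not None:
                delay = err.retry_after_ms / 1000  # honour the server's hint
            else:
                # Double the wait each attempt, with jitter to avoid retry storms.
                delay = base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0)
            sleep(delay)
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky_write() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError(retry_after_ms=50)
    return "written"


waits = []
print(with_backoff(flaky_write, sleep=waits.append))  # written
print(waits)                                          # [0.05, 0.05]
```

Re-raising after the last attempt matters: a write that is quietly given up on is exactly the data loss the SDK’s retry handling is meant to prevent.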
That can be mitigated a little by using the Microsoft SDK for connecting to Azure. I think C# has everything, and I think Node has everything: the Azure Cosmos npm module, or the NuGet package for C#. I know there were a few others, but they weren’t all as fully fleshed out; there was something in Go, and I can’t remember how deep that went.
But it’s definitely a good idea: it already takes into account backoff and circuit breakers and everything you would want to ensure you don’t lose data. You can, of course, do that yourself. One of the speakers last night did a demo using just PowerShell, a script like a bash script but in PowerShell, that connected to the REST API. So you can use anything, but the Microsoft SDK specifically has some really useful elements to make sure you don’t lose data.
The key partitioning, as I mentioned before, looks like it could become a giant quagmire, because you can’t change the partition key. You’d have to extract all your data, redefine your partition key, and then import all your data again. That can be a serious problem if you outgrow your partition key, so you definitely need to think long term about it.
I’m considering whether we might do it, and the strategy I’d probably use is Azure Functions, which is serverless, like AWS Lambda. I’m not 100% sold on serverless. It’s not the idea; I am in love with the idea of serverless. It’s the actual implementation and how deployments work. It’s the same thing with microservices: microservices are great when they fit the need. But consider a small team, all working on one application at the same time, with very few clashes and merge conflicts.
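Back on the partition-key point: because the key can’t be changed in place, re-keying means copying everything into a new container. A minimal in-memory sketch of that extract, redefine, re-import loop follows; plain dicts and lists stand in for the old and new containers, and `pk` is an assumed property name for the stored key value, not something Cosmos DB requires.

```python
from typing import Callable, Dict, Iterable, List


def repartition(
    old_container: Iterable[dict],
    new_key: Callable[[dict], str],
) -> Dict[str, List[dict]]:
    """Copy every document into a new 'container' grouped by a new partition key.

    The old container is only read, never modified, so it can keep serving
    traffic until the copy has been verified.
    """
    new_container: Dict[str, List[dict]] = {}
    for doc in old_container:
        copy = dict(doc)            # extract
        copy["pk"] = new_key(copy)  # redefine: store the new key on the document
        new_container.setdefault(copy["pk"], []).append(copy)  # re-import
    return new_container


old = [
    {"id": "1", "pk": "users", "type": "users", "country": "DE"},
    {"id": "2", "pk": "users", "type": "users", "country": "NL"},
    {"id": "3", "pk": "lists", "type": "lists", "country": "DE"},
]
new = repartition(old, lambda d: f"{d['type']}_{d['country']}")
print(sorted(new))   # ['lists_DE', 'users_DE', 'users_NL']
print(old[0]["pk"])  # users  (source documents are untouched)
```

In a real migration every read and write would go through the SDK and consume request units, subject to the same throttling, which is part of why outgrowing a partition key is so painful.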
Basically, and this isn’t from experience, it’s from thinking about it long and hard, I think for a team like that the benefits do not outweigh the cost. By cost I mean the cognitive load: you have to think about things differently, and it’s harder to test the integration locally, maybe even harder to test anything locally. I haven’t looked at the state of local testing for serverless recently; it was doable, but it didn’t seem simple and intuitive when I looked into it.
Still, we might consider defining what our container partitioning strategy would be, then slowly moving different elements over and using serverless functions to connect to that. Our current API could then act as an API gateway of sorts: instead of the model connecting to Mongo, it would still be the model, except it would pull its data by making a request to an Azure Function. I think if we move, that’s the way we’ll go. It would take us way too long to do a big-bang move. Well, I guess we could still do that directly, by having the model connect to Cosmos instead of Mongo. I can’t tell whether I’m trying to inflict serverless onto the application or whether it’s the right move, so I need to do a little more deep thinking. That’s kind of what this podcast does for me anyway: as I say my thoughts out loud, it begins to clarify my thinking. Perhaps it would be even more interesting if I had a co-host to ask questions about what I’m saying, since listeners can’t ask me questions in real time.
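The incremental move described above, where the model keeps its interface but some collections are served through Azure Functions while the rest still come from Mongo, can be sketched as a per-collection router. Everything here is hypothetical: the two backends are stubs standing in for a Mongo query and an HTTP call to a function, and `migrated` is whatever set of collections has been moved so far.

```python
from typing import Callable, Set

# (collection name, document id) -> document
Fetch = Callable[[str, str], dict]


def make_router(mongo: Fetch, function: Fetch, migrated: Set[str]) -> Fetch:
    """Route reads for migrated collections to the function, the rest to Mongo."""
    def fetch(collection: str, doc_id: str) -> dict:
        backend = function if collection in migrated else mongo
        return backend(collection, doc_id)
    return fetch


# Stub backends so the routing can be exercised without any real services.
def mongo_stub(collection: str, doc_id: str) -> dict:
    return {"id": doc_id, "collection": collection, "source": "mongo"}


def function_stub(collection: str, doc_id: str) -> dict:
    return {"id": doc_id, "collection": collection, "source": "azure-function"}


migrated = {"lists"}
fetch = make_router(mongo_stub, function_stub, migrated)
print(fetch("lists", "42")["source"])  # azure-function
print(fetch("users", "7")["source"])   # mongo

migrated.add("users")                  # move the next collection over
print(fetch("users", "7")["source"])   # azure-function
```

Because the model only ever calls `fetch`, collections can be moved one at a time, avoiding the big-bang move.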
Anyway, Cosmos might be perfect for you. There are some complexities in the way it’s set up, but it seems amazing not to have to deal with the infrastructure. I say that as an ops guy: I’ve said before that I’m 51% ops and 49% dev. I’m a sysadmin at heart but also a coder, and I like maintaining infrastructure. But when you’re trying to build a product and a team at the same time, something’s got to give. Hands-on ops is what keeps things from falling over. It prevents disasters and problems, but it doesn’t move the product forward. It’s still a risk, because if you can’t trust your infrastructure, then you can’t trust your application, and nobody’s going to use it. But making it more reliable doesn’t make the product better. Yeah, I heard that, and it didn’t make 100% sense, but I think you know what I’m trying to say.
I don’t know, but I’d love to hear your thoughts. As always, you can email me at [email protected] or reach me on Twitter @Dave_Albert. Bye!
Until next time, remember: any sufficiently advanced technology is indistinguishable from magic.