What about resilience & backups? #9
|
Hey @Julian-Brendel Thanks for getting the book and especially for taking the time for a thoughtful conversation.
Let me preface this response for people reading in the future: just this week we had one of the larger AWS outages, where many large services such as Duolingo, Fortnite, and Amazon itself suffered hours of downtime due to something going wrong in the infamous us-east-1. That gave all of these major tech platforms way more downtime than Talk Python experienced last year with the setup we're about to discuss.
Your thoughts on resiliency are fair. It may be worth adding a new chapter on it at some point. I didn't go into it for a couple of reasons. The primary directive of this book is to show people that they do not need super complex setups to get started running their apps at near tech perf and reliability. The book takes a strong bias towards preferring simplicity.
So yes, I could have shown people how to balance these across a couple of VMs using docker + your provider's load balancer (Hetzner and DigitalOcean both offer them). We could have used a more full-featured DB in the demo app and run it in a cluster across machines too. But honestly, I don't think most web apps need this much safety.
If you are worried about rolling out a bad batch of code, you'll need to carefully take one machine out of the load balancer, update it, test it, bring it back in, then do the same for the other. But if that update needs to run a DB migration, the other machine is likely to start failing immediately and needs a rushed update anyway. There are tons of things like this that would help but also make the setup way more complex and, in the case of this migration story, actually crash the running app, whereas without the migration, it would have worked flawlessly.
I figure I'll let the people who know they really need something like this add it on. Here's what I do:
As for building docker images on the server, that seems as good a place as building them locally and pushing them. I'd prefer to never have my code in any public registry, ever. Plus, doing the build on the server helps me keep things like the static content used by nginx and a host of other things mapped into the containers (e.g. nginx config, transcript files from GitHub for the courses, etc.) 100% in sync. Those are my thoughts. However, I do think maybe a mention / follow-up chapter might make sense. |
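For readers who want to picture the one-machine-at-a-time update described in the reply above, here is a minimal sketch. It is an illustration only, not the setup used for Talk Python: `LoadBalancer` is a hypothetical in-memory stand-in for a provider's load balancer API, and `update` and `healthy` are placeholder callables you would supply yourself.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoadBalancer:
    """Hypothetical stand-in for a provider's load balancer API."""

    active: set[str] = field(default_factory=set)

    def remove(self, host: str) -> None:
        self.active.discard(host)

    def add(self, host: str) -> None:
        self.active.add(host)


def rolling_update(
    lb: LoadBalancer,
    hosts: list[str],
    update: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> list[str]:
    """Update hosts one at a time and return the hosts that were updated.

    Stops at the first host that fails its health check and leaves it out
    of the load balancer, so the remaining hosts keep serving the old code.
    """
    updated: list[str] = []
    for host in hosts:
        lb.remove(host)  # stop sending traffic to this machine
        update(host)  # placeholder: pull code, rebuild, restart containers
        if not healthy(host):
            break  # keep the broken host out of rotation
        lb.add(host)  # bring it back in before touching the next one
        updated.append(host)
    return updated


if __name__ == "__main__":
    lb = LoadBalancer(active={"web-1", "web-2"})
    done = rolling_update(
        lb, ["web-1", "web-2"], update=lambda h: None, healthy=lambda h: True
    )
    print(done, sorted(lb.active))
```

Note that this does nothing about the migration problem raised in the reply: if `update` runs a schema migration, a host still on the old code may start failing before its turn comes.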
-
While I enjoyed reading the book and was able to take away a couple of things (definitely giving uvloop and granian a go!!), I was a bit surprised by the omission of resilience topics.
E.g. a single server is great, but what happens when that one disk gets corrupted or there's some issue with the upgrade of mongo etc.?
I am assuming you have VM backups configured in Hetzner, but I couldn't see this aspect mentioned.
I was also a bit surprised by the recommendation to build the images live on the server.
Are you keeping previous images in the local registry or simply overwriting them whenever there's a new version of the code?
What about some non-Python dependency changing: it was working for a while due to being cached, and now there's no straightforward way to go back.
I would have expected this to be offloaded to GH and built outside (still with all the optimisation things mentioned) to ensure reproducibility by keeping the images stored in e.g. Docker Hub.
All in all the book flowed well and I enjoyed reading it, but it could have gone more into the "how to protect if things go really wrong" category at times.
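On the question of keeping previous images instead of overwriting them, one common pattern is to tag every build with its git commit and only then point `latest` at it. The sketch below assumes plain `docker build` and `docker tag`; the thread does not say how the book's setup handles this. The image name `myapp` and the commit hashes are hypothetical, and the functions only assemble the commands rather than run them.

```python
def build_commands(image: str, commit: str) -> list[list[str]]:
    """Return docker CLI calls that build a commit-tagged image and move `latest` to it."""
    versioned = f"{image}:{commit}"
    return [
        ["docker", "build", "-t", versioned, "."],  # keeps older commit tags intact
        ["docker", "tag", versioned, f"{image}:latest"],
    ]


def rollback_commands(image: str, previous_commit: str) -> list[list[str]]:
    """Return the docker CLI call that points `latest` back at an earlier build."""
    return [["docker", "tag", f"{image}:{previous_commit}", f"{image}:latest"]]


if __name__ == "__main__":
    # Hypothetical image name and commit hashes, for illustration only.
    for cmd in build_commands("myapp", "abc1234"):
        print(" ".join(cmd))
    for cmd in rollback_commands("myapp", "9f8e7d6"):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` on the server, with the commit taken from `git rev-parse --short HEAD`. Because older tags stay in the local image store, going back is a retag plus a container restart rather than a rebuild.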