Add testing in production: the hard parts

This commit is contained in:
Charles-Axel Dein 2020-08-27 10:08:32 +02:00
parent c5614654ff
commit b831ca3417

View File

@ -742,6 +742,21 @@ Rob Pike, [Go at Google: Language Design in the Service of Software Engineering]
- Wood's theorem: As the complexity of a system increases, the accuracy of any single agents own model of that system decreases rapidly.
- The more tools and code that you add to create elements in a system, the harder it is to replicate an environment encompassing those tools and code.
- At the core of testing in production is the idea of splitting deployments (of artifacts) from releases (of features).
- [Testing in Production: the hard parts](https://medium.com/@copyconstruct/testing-in-production-the-hard-parts-3f06cefaf592), Cindy Sridharan
- The whole point of [actual] distributed systems engineering is you assume youre going to fail at some point in time and you design the system in such a way that the damage, at each point is minimized, that recovery is quick, and that the risk is acceptably balanced with cost.
- How can you cut the blast radius for a similar event in half?
- Differentiate between deployment (0 risk) and release
- Build a deploy-observe-release pipeline
- Make incremental rollouts the norm (canaries, %-based rollouts, etc.)
- Test configuration changes just like you test code
- Default to roll back, avoid fixing forward (slow!)
- Eliminate gray failures - prefer crashing to degrading in certain cases
- Prefer loosely coupled services at the expense of latency or correctness
- Use poison tasters (isolate handling of client input)
- Implement per-request-class backpressure
- Have proper visibility from a client/end-user standpoint (client-side metrics)
### Security