From b831ca3417e66e10d4ba7eb1ebcbcbb3ae0f318e Mon Sep 17 00:00:00 2001
From: Charles-Axel Dein <ca@d3in.org>
Date: Thu, 27 Aug 2020 10:08:32 +0200
Subject: [PATCH] Add testing in production: the hard parts

---
 README.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/README.md b/README.md
index 2ff4509..4bc5008 100644
--- a/README.md
+++ b/README.md
@@ -742,6 +742,21 @@ Rob Pike, [Go at Google: Language Design in the Service of Software Engineering]
   - Wood's theorem: As the complexity of a system increases, the accuracy of any single agent’s own model of that system decreases rapidly.
   - The more tools and code that you add to create elements in a system, the harder it is to replicate an environment encompassing those tools and code.
   - At the core of testing in production is the idea of splitting deployments (of artifacts) from releases (of features).
+- [Testing in Production: the hard parts](https://medium.com/@copyconstruct/testing-in-production-the-hard-parts-3f06cefaf592), Cindy Sridharan
+	- The whole point of [actual] distributed systems engineering is you assume you’re going to fail at some point in time and you design the system in such a way that the damage, at each point, is minimized, that recovery is quick, and that the risk is acceptably balanced with cost.
+	- How can you cut the blast radius for a similar event in half?
+		- Differentiate between deployment (0 risk) and release
+		- Build a deploy-observe-release pipeline
+		- Make incremental rollouts the norm (canaries, %-based rollouts, etc.)
+		- Test configuration changes just like you test code
+		- Default to rolling back; avoid fixing forward (slow!)
+		- Eliminate gray failures - prefer crashing to degrading in certain cases
+		- Prefer loosely coupled services, even at the expense of latency or correctness
+		- Use poison tasters (isolate handling of client input)
+		- Implement per-request-class backpressure
+		- Have proper visibility from a client/end-user standpoint (client-side metrics)
+
+
 
 ### Security