Today, I want to comment on one of the latest essays from Raúl. By the way, if you haven't subscribed to his newsletter yet, do it now and learn about real System Design.

At a high level, Raúl's article asks whether we should alert users when something goes wrong with our product.
He starts his article by giving three main “whys” for not bothering our beloved users when core features aren't affected. From my point of view, the most important one is not distracting users with things that don't matter to them. His examples are clear: if the user is trying to buy a T-shirt, why should we display an error message because the recommendation pane, two scrolls down, got an HTTP 500 response from the backend? That message would only pull the user away from what they came to do: buy the T-shirt.
The tricky part is identifying the core features. Unless you have a beautiful monolith, identifying what we could call Tier-0 services or functionality requires a certain level of maturity in the engineering organization. If a core feature does not work, then yes, we should notify the user.
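A minimal sketch of that decision, assuming a hypothetical `FEATURE_TIERS` map and `render_section` helper (neither comes from Raúl's article, and the feature names are made up): each feature is tagged with a tier, only Tier-0 failures are surfaced to the user, and everything else is quietly hidden.

```python
from typing import Callable, Optional

# Hypothetical tier map: 0 = core feature (the user must be told if it fails);
# higher numbers = features that may fail silently.
FEATURE_TIERS = {
    "checkout": 0,
    "product_details": 0,
    "recommendations": 2,
}


class CoreFeatureUnavailable(Exception):
    """Raised when a Tier-0 feature fails, so the UI can show an error."""


def render_section(name: str, fetch: Callable[[], str]) -> Optional[str]:
    """Return the section content, or None when a non-core section should be hidden."""
    try:
        return fetch()
    except Exception as exc:
        # Features missing from the map are treated as core, erring toward telling the user.
        if FEATURE_TIERS.get(name, 0) == 0:
            raise CoreFeatureUnavailable(name) from exc
        # Non-core failure: hide the section instead of interrupting the purchase.
        return None


def broken_backend() -> str:
    # Stands in for a backend call that returns HTTP 500.
    raise RuntimeError("HTTP 500")


print(render_section("recommendations", broken_backend))  # None: the pane is simply hidden
```

Defaulting unclassified features to Tier 0 is one possible design choice: until someone decides a feature is non-core, its failures reach the user.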
Raúl also describes great strategies for avoiding unnecessary noise for users. My preference is to use a cache to display what was available the last time. This is a bit more expensive than just leaving a blank block in the UI or hiding it. However, the benefit I see is that the user's perception is not affected: everything looks normal and nothing is out of place.
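A minimal sketch of the caching strategy, assuming an in-memory dictionary as the cache and a caller-supplied `fetch` function (both hypothetical; a real system would more likely use a shared cache). On success it stores the response; on failure it serves the last good value, and it only hides the block when nothing has ever been cached.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("fallback")


class LastKnownGoodCache:
    """Serve the last successful response when a non-core dependency fails."""

    def __init__(self) -> None:
        self._last_good: Dict[str, str] = {}

    def get(self, key: str, fetch: Callable[[], str]) -> Optional[str]:
        try:
            value = fetch()
        except Exception:
            # Record the failure for the team; the user never sees it.
            logger.warning("fetch failed for %s; falling back to last known good value", key)
            # None means nothing was ever cached, so the caller should hide the block.
            return self._last_good.get(key)
        self._last_good[key] = value
        return value


cache = LastKnownGoodCache()
cache.get("recs:user-42", lambda: "T-shirt, hoodie, cap")


def failing() -> str:
    raise RuntimeError("HTTP 500")


print(cache.get("recs:user-42", failing))  # T-shirt, hoodie, cap
```

The trade-off from the paragraph above shows up here: you pay to store responses, and a cached recommendation may be stale, but the page looks the same to the user.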
I love this sentence from Raúl's article:
The idea is to address the problem without impacting the user's experience.
By the way, did someone say SLOs? Another important thing to do is to record the issue in your observability tool. You cannot lose that visibility; you cannot afford not to know that the product recommendation pane is not working properly, even if it's not a core feature.
You should have SLOs in place to get notified when a feature is really not working. When we choose to fail silently, you may not need to jump on a 3 a.m. call with the SRE team to fix the issue, but you still want proper tracing and logs set up in advance so you can debug it during working hours.
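To keep silent failures visible to the team, each call can be recorded and compared against an availability target. The sketch below assumes a hypothetical 99% SLO measured over a fixed-size window of recent requests kept in process memory; in practice these counts would usually live in your metrics or observability tool, and `SloTracker` is an illustrative name, not a real library class.

```python
import logging
from collections import deque

logger = logging.getLogger("recommendations")


class SloTracker:
    """Track recent outcomes and flag when availability drops below the SLO target."""

    def __init__(self, target: float = 0.99, window: int = 1000) -> None:
        self.target = target
        self.outcomes: deque = deque(maxlen=window)  # True = success, False = failure

    def record(self, success: bool, detail: str = "") -> None:
        self.outcomes.append(success)
        if not success:
            # Logged for debugging during working hours, never shown to the user.
            logger.info("recommendation pane failed: %s", detail)

    def availability(self) -> float:
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def breached(self) -> bool:
        """True when the feature is 'not working for real' and someone should be notified."""
        return self.availability() < self.target


tracker = SloTracker(target=0.99, window=100)
for i in range(100):
    tracker.record(success=(i % 50 != 0), detail="HTTP 500")  # 2 failures out of 100
print(tracker.availability())  # 0.98
print(tracker.breached())      # True
```

A request-count window keeps the check simple, but it is a simplification: production SLOs are typically defined over a time window and alerted on through error-budget burn rate.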
I hope you enjoyed this review of my friend Raúl's essay as much as I enjoyed reading it. Great job, mate!

Thank you for supporting this newsletter.
You're now part of a community of over 405 people, edging closer to 500! Let's aim to reach 500 by October 31st. Share this post with your friends!
You rock, folks.
If you enjoyed this article, click the like button. It helps!
If you know someone else who will benefit from this, ♻️ share this post.
Related readings
Photo credits: Robin Higgins.