Opinion About Failing Silently In Our Products

Yes, and it depends on the impact on the users

Marcos F. Lobo 🗻🧭

Oct 16, 2024

Today, I want to comment on one of the latest essays from Raul Junco. By the way, if you are not yet subscribed to his newsletter, do it now and learn about real System Design👇🏻

System Design Classroom

Failing Silently is also an Option.


At a high level, Raúl's article asks whether we should alert users when something goes wrong with our product.

He starts his article by giving three main “whys” for not bothering our beloved users when the core features aren’t affected. From my point of view, the most important one is not to bother the user with things that are not important. His examples in the article are clear. If the user is trying to buy a T-shirt, why should we display an error message because the recommendation pane, two scrolls down, got an HTTP 500 response from the backend? That would only pull the user off the path of the use case: buying the T-shirt.

The tricky part is identifying the core features. Deciding what we could call Tier-0 services or functionality requires a certain level of maturity in the engineering organization, unless you have a beautiful monolith. If a core feature does not work, then yes, we should alert the user.
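To make the core versus non-core split concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `CORE_SECTIONS` set, the `CoreFeatureError` exception, and the `render_section` helper are hypothetical names, not something taken from Raúl's article.

```python
from typing import Callable, Optional

# Hypothetical tier map: the page sections we treat as core (Tier-0).
CORE_SECTIONS = {"checkout"}


class CoreFeatureError(Exception):
    """Raised when a core section fails and the user must be told."""


def render_section(name: str, loader: Callable[[], str]) -> Optional[str]:
    """Return the section content, or None when a non-core section fails."""
    try:
        return loader()
    except Exception as exc:
        if name in CORE_SECTIONS:
            # Core feature: surface the failure so the UI can show an error.
            raise CoreFeatureError(f"{name} is unavailable") from exc
        # Non-core feature: fail silently; the caller simply hides the block.
        return None


def broken_recommendations() -> str:
    # Simulates the recommendation backend answering with an HTTP 500.
    raise RuntimeError("HTTP 500 from recommendations backend")
```

With this shape, `render_section("recommendations", broken_recommendations)` returns `None` and the T-shirt purchase continues, while the same failure under the `"checkout"` name raises `CoreFeatureError` so the user gets told.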

Raúl also describes great strategies for avoiding noise for the users. I prefer to use a cache to display what was available last time. This is a bit more expensive than just displaying a blank block in the UI or hiding it. However, the benefit I see is that the user’s perception is not affected: everything looks normal and nothing is out of place.
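The cache idea could look roughly like the sketch below. It assumes a simple in-memory store with no expiry, and the `LastKnownGood` class and its `fetch` method are hypothetical names I made up for illustration; a real product would more likely sit on a shared cache with a time-to-live.

```python
from typing import Callable, Dict, List, Optional


class LastKnownGood:
    """Keeps the most recent successful result per key (in memory only)."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}

    def fetch(self, key: str, loader: Callable[[], List[str]]) -> Optional[List[str]]:
        try:
            fresh = loader()
        except Exception:
            # Backend failed: serve what was shown last time, if anything.
            return self._store.get(key)
        # Backend answered: remember this result for the next failure.
        self._store[key] = fresh
        return fresh
```

If nothing was ever cached for a key, `fetch` returns `None`, and the UI falls back to hiding the block, which is the cheaper option mentioned above.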

Love this sentence from Raúl’s article:

The idea is to address the problem without impacting the user's experience.

By the way, did someone say SLSs? An important thing to do as well is to record the issue in your observability tool. You cannot miss that visibility; you cannot afford not knowing that the recommendation pane for the products is not working properly, even if it’s not a core feature.

You should have SLSs in place to get notified when a feature is really not working. In the cases where we want to fail silently, you would not have to jump on a call with the SRE team (at 3 a.m., for example) to solve the issue, but you would still want to set up the proper tracing and logs in advance, so you can debug the issue during your working hours.
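As a rough sketch of "silent for the user, loud for the team", the snippet below records every swallowed failure. It uses only Python's standard `logging` module, and a `Counter` stands in for a real metrics client. The `load_silently` helper, the `silent_failures` counter, and the `"storefront"` logger name are assumptions made for this example.

```python
import logging
from collections import Counter
from typing import Callable, Optional

logger = logging.getLogger("storefront")

# Stand-in for a real metrics client, so the sketch stays self-contained.
silent_failures: Counter = Counter()


def load_silently(feature: str, loader: Callable[[], str]) -> Optional[str]:
    """Load a non-core feature; on failure, record it and return None."""
    try:
        return loader()
    except Exception:
        # Count the failure so a dashboard or alert can pick it up later.
        silent_failures[feature] += 1
        # logger.exception logs the message together with the traceback.
        logger.exception("silent failure in non-core feature %s", feature)
        return None
```

The user never sees the error, but the count and the traceback are there when you open your observability tool the next morning.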

Hope you enjoyed this review of the essay from my friend Raul Junco as much as I enjoyed reading it. Great job, mate 👊🏼.
