Blameless Postmortem

The postmortem concept is well known in the technology industry. A postmortem is a written record of an incident, its impact, the actions taken to mitigate or resolve it, the root cause(s), and the follow-up actions to prevent the incident from recurring. The Site Reliability Engineering chapter on postmortem culture describes criteria for deciding when to conduct postmortems, best practices for writing them, and advice on cultivating a postmortem culture, based on its authors' experience. [1]

Blameless postmortems are a tenet of SRE culture. For a postmortem to be truly blameless, it must focus on identifying the contributing causes of the incident without indicting any individual or team for bad or inappropriate behavior. A blamelessly written postmortem assumes that everyone involved in an incident had good intentions and did the right thing with the information they had. If a culture of finger pointing and shaming individuals or teams for doing the "wrong" thing prevails, people will not bring issues to light for fear of punishment. [1]

Postmortem Example

Google Postmortem Example

References

  1. "Postmortem Culture: Learning from Failure." Site Reliability Engineering. Retrieved from https://landing.google.com/sre/book/chapters/postmortem-culture.html