This story describes the technical details of the problems that caused the Slack downtime on May 12th, 2020. To learn more about the process behind incident response for the same outage, read Ryan Katkov’s post, “All Hands on Deck”.

On May 12, 2020, Slack had our first significant outage in a long time. We published a summary of the incident shortly after, but this story is an interesting one, and we’d like to go into more detail on the technical issues around it.

The user-visible outage began at 4:45pm Pacific time, but the story really begins around 8:30am that morning. Our Database Reliability Engineering team was alerted to a significant load increase in part of our database infrastructure at the same time as our Traffic team received alerts that we were failing some API requests. The increased load on the database was due to the rollout of a configuration change, which triggered a longstanding performance bug. The change was quickly pinpointed and rolled back; it was a feature flag that performed a percentage-based rollout, so this was a fast process. We had some customer impact, but it lasted only three minutes, and most users were still able to send messages successfully throughout this brief morning incident.

One of the incident’s effects was a significant scale-up of our main webapp tier. Our CEO Stewart Butterfield has written about some of the impact of the lockdown and stay-at-home orders on Slack usage. As a result of the pandemic, we’ve been running significantly higher numbers of instances in the webapp tier than we were in the long-ago days of February 2020. We autoscale quickly when workers become saturated, as happened here, but workers were waiting much longer for some database requests to complete, leading to higher utilization. We increased our instance count by 75% during the incident, ending with the highest number of webapp hosts that we’ve ever run to date.

Everything seemed fine for the next eight hours, until we were alerted that we were serving more HTTP 503 errors than normal.
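
To make the link between slow database requests and the webapp scale-up concrete, here is a minimal sketch of the arithmetic, not Slack's actual autoscaler. It assumes a simple utilization-target policy and uses Little's law (requests in flight = arrival rate × time per request). The function names, the request rate, the latencies, the workers-per-host count, and the 50% target are all hypothetical values chosen for illustration.

```python
import math


def busy_workers(requests_per_sec: int, mean_latency_ms: int) -> float:
    """Little's law: average requests in flight = arrival rate * time in system."""
    return requests_per_sec * mean_latency_ms / 1000


def hosts_needed(requests_per_sec: int, mean_latency_ms: int,
                 workers_per_host: int, target_utilization: float) -> int:
    """Smallest host count that keeps average worker utilization at or below the target."""
    workers_required = busy_workers(requests_per_sec, mean_latency_ms) / target_utilization
    return math.ceil(workers_required / workers_per_host)


# Hypothetical numbers: traffic is unchanged, only per-request latency grows
# because workers wait longer on the database.
normal = hosts_needed(10_000, 100, workers_per_host=50, target_utilization=0.5)
slow = hosts_needed(10_000, 175, workers_per_host=50, target_utilization=0.5)

print(normal)  # 40
print(slow)    # 70
```

With these made-up inputs, the same request rate needs 75% more hosts once each request takes 75% longer, even though user traffic has not changed. An autoscaler that only looks at worker utilization cannot tell extra traffic from slower dependencies.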