At Sentry, we receive a million requests a minute to process and store crashes from all around the world. Our Operations Team is responsible for making sure everything goes right with these requests, but it is equally responsible for not burning out while dealing with everything that goes wrong.
We collect fifty thousand custom metrics in Datadog, but alert on fewer than fifty of them. James Cunningham leads our internal observability initiative and creates and maintains those alerts.
In this talk, he discusses the full lifecycle of an alert at Sentry, including:
The talk also includes the most interesting questions from the closing Q&A.