How Nextdoor finds the (right) people at the (right) time to fix the (right) issues
Nextdoor is a trusted and essential community-building tool founded in 2008. With its platform, neighbors can exchange recommendations, stay in touch with local news and businesses, sell goods and meet newcomers.
Nextdoor connects ⅓ of US households, spans 11+ countries and 280,000+ neighborhoods globally, and grew its verified neighbors by 17% YoY last quarter. With that success came the need to scale the company and its teams, and with scaling came the need to overhaul and significantly streamline its issue resolution processes.
Since implementing Sentry, Nextdoor has seen the following benefits:
- Acceleration of speed to resolution by 15-30 minutes per issue
- Decrease of 15 minutes per issue in triaging
- Increased proactivity in bug detection and resolution
Nextdoor’s code base is a Python and React monolith.
Before using Sentry, Nextdoor relied on their users—reaching out via support and social media—to alert them of issues. With more than 100 engineers shipping software and issues arising daily, the teams were having trouble tracking the sources and owners of the bugs.
Nextdoor advocates for code ownership and uses a module-based division of work: one team to one code area to one team owner. Engineering Manager Can Zhang leads Nextdoor’s “Newsfeed” team, whose triaging process is unique. While each of Nextdoor’s teams has its own internal issue mitigation process, his team’s work is affected by many others since the newsfeed is the focal point of the app.
Can’s team’s inter-relatedness with others made it challenging to sort through issues and assign code ownership. To identify which team owned a file, Nextdoor was using GitHub’s CODEOWNERS, but that solution on its own wasn’t ideal. Their original Sentry setup was, as Zhang describes it, “naive.” Because of Nextdoor’s monolithic architecture, it was still difficult to tell at a glance which team owned an error. They were still relying on random spot-checks of the errors, which once again made the process unscalable.
Before, when an issue came up, the process was to (attempt to) reproduce, determine, and triage. However, this chain of actions to resolution was messy and reactive: they’d wait to hear from customers and employees about a problem. From there, the engineering teams would be notified and would (manually) create a Jira ticket, then begin a back-and-forth to understand the context and properly troubleshoot the problem. Finally, after a lengthy search for the source and the owner, followed by troubleshooting, the issue would be resolved.
Nextdoor needed a simple approach that would allow them to serve their booming customer base without mishaps. That meant a scalable, more immediate, and proactive approach that would automate issue detection, alerting, and ticketing.
After a major bug flew under the team’s radar because they could not reproduce it, Zhang was set on fixing his team’s underutilization of Sentry (which had caught the bug and registered it in the issue list).
[One month later]…we realized that something [was] actually wrong…And then when we investigated the problem, we saw all of the error logs that had been going into Sentry for that month. If we had been using Sentry properly, we could’ve identified and fixed the issue faster. Can Zhang, Engineering Manager
Zhang began setting up custom alert rules to notify him and his team—selectively—of issues that were coming up. That way, he’d prevent team burnout from notifications and keep everyone paying attention to their channel-specific Slack alerts. Zhang also discovered Sentry’s Code Owners feature, which allowed him to ensure the errors were getting routed to the right team, and at the right time with Sentry’s Slack configuration.
“The out-of-the-box experience with these integrations feels like magic, it just works (and most of the time it isn’t too noisy),” explains Can.
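The path-based ownership idea behind this routing can be sketched in a few lines of Python. This is an illustration only, not Sentry’s implementation or Nextdoor’s configuration: the rule list, directory names, and Slack channel names are hypothetical, and the “last matching rule wins” behavior is borrowed from how GitHub’s CODEOWNERS files are evaluated.

```python
from fnmatch import fnmatchcase

# Hypothetical ownership rules as (path pattern, owning team's channel).
# The paths and channel names are invented for illustration.
OWNERSHIP_RULES = [
    ("*", "#eng-triage"),          # catch-all fallback
    ("feed/*", "#team-newsfeed"),  # everything under feed/
    ("feed/ads/*", "#team-ads"),   # a more specific area inside feed/
]


def owner_for(path, rules=OWNERSHIP_RULES):
    """Return the owner of the last rule whose pattern matches the path.

    Later rules override earlier ones, so more specific patterns
    should be listed after more general ones.
    """
    owner = None
    for pattern, team in rules:
        if fnmatchcase(path, pattern):
            owner = team
    return owner


def route_error(frame_paths, rules=OWNERSHIP_RULES):
    """Route an error by the file of its innermost stack frame.

    frame_paths is ordered from outermost to innermost frame.
    """
    if not frame_paths:
        return None
    return owner_for(frame_paths[-1], rules)
```

In a monolith, an error raised in `feed/ads/slots.py` would go to the hypothetical `#team-ads` channel even though the request entered through shared code, because the innermost frame decides the owner.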
Once the ticket is assigned, Can’s team can quickly troubleshoot and resolve issues using stack traces, trace details, and Breadcrumbs. “Leveraging the additional context Sentry provides, we now remove a lot of the back and forth between teams and users to resolve issues faster,” said Can.
Now, with Sentry and the team’s growing familiarity with it, Nextdoor proactively catches issues before its customers do, accelerates triage, and decreases time to resolution.
Finally, a streamlined workflow replaces the old, messy one. When Sentry catches an issue, it immediately triggers a notification in the Slack channels of the corresponding teams and automatically creates and assigns a Jira ticket. The appropriate teams can then review the issue and its source in Sentry and resolve it quickly.
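The flow described above (detect, alert the owning team, open a ticket) can be sketched as follows. This is a simplified model under stated assumptions, not Sentry’s alerting engine or the real Slack and Jira APIs: `Issue`, `Outbox`, `handle_issue`, and the `min_events` threshold are all hypothetical names, and the outbox simply records what would be sent.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    title: str
    owner_channel: str  # hypothetical Slack channel of the owning team
    event_count: int    # how many times the error has been seen


@dataclass
class Outbox:
    """Stand-in for the Slack and Jira integrations; it only records calls."""
    slack_messages: list = field(default_factory=list)
    jira_tickets: list = field(default_factory=list)


def handle_issue(issue, outbox, min_events=10):
    """Alert and open a ticket only when the issue is frequent enough.

    The threshold is an assumed example of a selective alert rule;
    it keeps rare, low-impact errors out of the team's channel.
    """
    if issue.event_count < min_events:
        return False
    outbox.slack_messages.append((issue.owner_channel, issue.title))
    outbox.jira_tickets.append(
        {"summary": issue.title, "assignee_team": issue.owner_channel}
    )
    return True
```

Keeping the alert and the ticket in one step means the owning team sees the notification and the assigned ticket together, rather than someone creating the ticket by hand afterwards.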
Nextdoor’s engineering teams view the Slack and Jira integrations as critical to accelerating triaging; alerts are routed to the right channels and combine automation with human conversations about the issue. Zhang is now able to assign custom alert rules to notify the right team at the right time.
Some bugs are impossible to discover without Sentry. We have accelerated time to triage because of the richness of context Sentry gives us. It is more structured and straightforward to find issues and resolve them faster. Can Zhang, Engineering Manager
Sentry provides the rich context necessary not only for efficient resolution but also for catching errors that would be impossible to discover otherwise. The teams can separate the signal from the noise to troubleshoot issues faster. And using Code Owners with Sentry alerts makes clear who owns the bug and who needs to take care of it as soon as possible.
By using Code Owners with Sentry alongside Slack alerts and automatic Jira ticketing, Zhang and his team have saved 15 minutes per issue in triaging, accelerated time to resolution by 15 to 30 minutes per issue, and adopted a far more proactive approach to bug tracking (without relying on users’ Tweets or customer support tickets).
This process saves us 15 minutes of trying to reproduce and determine if something is wrong with our software, and then another 15-30 minutes trying to triage to the right team. And most importantly…some of these errors would be impossible to detect without the help of specific context around these errors because there are just so many ways people can use our product. Can Zhang, Engineering Manager
As Nextdoor continues scaling at breakneck speed, it’s essential that its issue resolution methods scale with it. Next up for Zhang and his team is improving their front-end tracking with Sentry. They’re also planning to set up a better notification system for all teams and raise internal awareness of all that Sentry offers.