Key Results
- 65% reduction in time to resolution
- Faster correlation between issues & deployments
- Reproduce & resolve issues from staging to production
- Tie specific errors to product areas & teams
SDK
JavaScript, TypeScript, Next.js, Node, Python, Electron, Android, Cocoa
Solutions
Error Monitoring, Mobile Application Monitoring, Release Health, Context, Custom Alerts, Issue Owners
How Airtable Investigates and Resolves Issues up to 65% Faster with Context and Custom Alerts
Headquartered in San Francisco, Airtable is a connected apps platform that enables teams to modernize their business processes. More than 300k organizations, including 80% of the Fortune 100, rely on Airtable to connect their teams, data, and workflows.
Ranked in the top 20 on the Forbes Cloud 100 and a CNBC Disruptor 50 honoree two years running, Airtable gets why proactively monitoring, investigating, and resolving issues with the biggest impact on customers is crucial.
To that end, the team needed fit-for-purpose solutions to help a growing number of developers:
- Take ownership of what they build, with the autonomy to prioritize and resolve issues
- Speed up time to resolution with detailed context
- Tie errors to specific product areas and alert relevant teams
- Monitor release health and correlate issues with deployments, and
- Reproduce and resolve issues in staging and production environments
The main business driver was our ‘Scalable and Safe’ deployments effort, led by our Service Orchestration engineering team in collaboration with product engineering teams to improve and decentralize our release process.
Doug Forster, Software Engineer, Airtable
Increased velocity shouldn’t compromise quality
To support organizations like Amazon and IBM, Airtable ships new features and updates through multiple deployments a week. Keenly aware of the importance of release health and of being able to reproduce and fix errors from staging through production, they set out to find a solution that would help define issue ownership, let developers search for specific events to troubleshoot faster, optimize workflows for better triaging, and maintain release stability at scale.
One of their main goals was to support high-velocity feature development without sacrificing quality. Whatever tool they landed on had to support TypeScript, Next.js, and Python and, since their platform is used across devices, Android, iOS, and Electron.
Each of Sentry’s integrations is very well documented, and it was convenient to view the open-source code in some cases.
Airtable’s infrastructure team partnered with Sentry to improve canary analysis of their code before and during deployments, improve the accuracy of the resulting alerts, and collaborate more effectively with teams on issue ownership and incident response. Overall developer experience was another consideration: what attracted them to Sentry was how straightforward it was to use and how the product’s ubiquity could help in finding and training new developers.
We want to use best-in-class tools to help our engineers be effective, and having a solution that other organizations widely use makes onboarding faster for new team members.
Putting the workflow… to work
As in any high-growth environment, more products and capabilities often mean a growing backlog of errors and challenges prioritizing them. Initially, when Airtable’s developers received an error notification, they had to check various systems to find context and assign responsibility, which could take up to 15 minutes per event.
So they added custom tags to their triaging workflow to filter issues and integrated source maps with Sentry to assign ownership. Enriching errors with custom context lets developers search by arbitrary tags or look specifically at errors by product area to resolve priority issues. In one example, they added a custom tag to AWS errors to monitor for issues that might be related to ongoing project work. Adding a “wasThrownFromAwsSdk:yes” tag and searching for it in Sentry surfaces any issues in their staging environment that might have been caused by recent code changes.
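The tagging step described above can be sketched roughly as follows. This is a minimal illustration, not Airtable's code: the "productArea" tag name, the module prefixes used to recognize AWS SDK exceptions, and both helper functions are assumptions made for the example. Only the "wasThrownFromAwsSdk" tag and its "yes" value come from the story.

```python
from typing import Callable, Dict

# Assumption: exceptions raised by the AWS SDK for Python live in these modules.
AWS_SDK_MODULE_PREFIXES = ("boto3", "botocore")


def build_error_tags(exc: BaseException, product_area: str) -> Dict[str, str]:
    """Return the custom tags to attach to one error event."""
    module = type(exc).__module__ or ""
    from_aws = module.startswith(AWS_SDK_MODULE_PREFIXES)
    return {
        "wasThrownFromAwsSdk": "yes" if from_aws else "no",
        "productArea": product_area,  # hypothetical tag name
    }


def apply_tags(tags: Dict[str, str], set_tag: Callable[[str, str], None]) -> None:
    """Hand each tag to the error reporter through the supplied setter."""
    for key, value in tags.items():
        set_tag(key, value)
```

With the Sentry Python SDK installed, passing `sentry_sdk.set_tag` as the setter before capturing the exception should attach these tags to the event, after which a search such as `wasThrownFromAwsSdk:yes` can filter on them. Keeping the tag logic in a plain function makes it easy to check without sending anything to Sentry.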
We’ve seen a much better user experience for our engineers, particularly in the ability to search by tags, which we did not have with our previous tool.
After tweaking alert rules, teams can see the volume of events over time, highlighting those with the biggest impact on customers. They’ve further customized alerts to show errors by product area, such as marketing pages, which has sped up the time it takes to catch any regressions or new issues.
Once an alert fires, Sentry’s Slack integration notifies only the relevant team, who can quickly segment the data based on related context for faster analysis and resolution.
Putting that workflow into action, the team recently set up an alert for unexpected errors related to loading Airtable bases. First, they established an error threshold and configured the alert to notify the feature team in Slack. If there’s an issue, the on-call team member is notified, drills down, and reviews the tags, which include metadata about the request and error. The tag histogram usually provides a breakdown by user ID or other metadata, which helps identify the impact as well as patterns related to the cause.
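The two ideas in this workflow, a threshold that decides when to alert and a per-tag breakdown that shows who is affected, can be illustrated with a small sketch. It is a simplified model written for this story, not Sentry's implementation: the event shape, the "userId" tag name, and the function names are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumption: one error event is modeled as a mapping of tag name to tag value.
Event = Dict[str, str]


def should_alert(event_count: int, threshold: int) -> bool:
    """Fire once the number of errors in the window reaches the threshold."""
    return event_count >= threshold


def tag_histogram(events: Iterable[Event], tag: str) -> List[Tuple[str, int]]:
    """Count events per value of `tag`, most frequent value first."""
    counts = Counter(event[tag] for event in events if tag in event)
    return counts.most_common()
```

For example, calling `tag_histogram(events, "userId")` on a batch of errors shows whether a spike comes from a single user or is spread across many, which is the kind of impact question the on-call engineer is trying to answer.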
In the past, this type of investigation would require pivoting between tools to get alerted and it would take 10-15 mins to correlate it with the impacted customers. Now, we can do the same thing in Sentry in a couple of minutes.
Safer, more scalable deployments
With teams deploying new features at speed, keeping an eye on release health and resolving issues before they impact users is critical. Airtable enriches each error with a custom tag for the engineering team that owns a feature. That way, if there’s an issue during deployment, the deployer knows which engineering team to contact to investigate it. Teams can also prioritize errors based on their impact, so that not all errors automatically block a deployment.
This makes deployments safer and more scalable by detecting problems more quickly. In most cases, the errors are detected during the ‘canarying’ phase of the deployment, so that any customer impact is limited.
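A deployment gate along these lines might look like the sketch below. It is an assumed model for illustration only: the impact levels, the rule that only "high" impact errors block, and every name in the code are hypothetical, since the story does not describe how Airtable encodes priority.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: only errors with these impact levels should stop a rollout.
BLOCKING_IMPACTS = frozenset({"high"})


@dataclass(frozen=True)
class CanaryError:
    message: str
    owning_team: str  # value of the custom ownership tag
    impact: str       # hypothetical levels: "low", "medium", "high"


def blocking_errors(errors: Iterable[CanaryError]) -> List[CanaryError]:
    """Keep only the errors severe enough to block the deployment."""
    return [error for error in errors if error.impact in BLOCKING_IMPACTS]


def should_block_deployment(errors: Iterable[CanaryError]) -> bool:
    """Block the rollout if any canary error is severe enough."""
    return bool(blocking_errors(errors))


def teams_to_contact(errors: Iterable[CanaryError]) -> List[str]:
    """List the owning teams of blocking errors, sorted and without repeats."""
    return sorted({error.owning_team for error in blocking_errors(errors)})
```

Separating the blocking rule from the ownership lookup means the deployer gets two answers from the same canary data: whether to pause the rollout, and which team to reach out to.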
With a focus on customer and developer experience, Airtable tweaked Sentry to fit how their teams work. This lets them route issues to the right people for faster investigations, reduces triaging times and the overall duration of incidents, and frees up developer time to work on other projects.
You’ve had a peek at how Airtable builds and ships software that helps teams from around the world collaborate and get stuff done. You’ve also seen how they continue to improve the developer experience at home, so if you’re up for a new challenge, they’re currently looking for an Infrastructure Software Engineer and a host of other engineering roles across the board.
We now have the ability to be even more proactive about monitoring and resolving errors.