How DoorDash uses Sentry


For the first edition of Customer Stories, we spoke to Zhaobang Liu, a DevOps Engineer on the infrastructure team at DoorDash. DoorDash is a rapidly growing delivery app that has received more than 10,000,000 orders from over 50,000 businesses delivered by 75,000 “Dashers” across 300 cities.

(Editors note: These numbers have certainly increased dramatically in the time it took to type this sentence. And this one. Wow, the numbers are so high they’re flooding the room and I’m floating away from my keyboard. Oh no, I can barely reach it now! Farewell!)

Zhaobang’s Story

The biggest thing I’m looking to accomplish at DoorDash is ensuring production grows in a scalable way. This is complicated by the fact that our app serves multiple audiences, each with their own specific and very different needs that have to be met across the web, iOS, and Android: the consumer ordering food, the merchant making that food, and the Dasher who’s delivering it. This leaves us with a ton of technologies and issues to consider.

This is all further complicated by rapid growth. I started a little over a year ago and, in that time, the engineering team has more than doubled in size, going from 30 people to more than 80. This is great in that we have a lot more talented people looking to do fabulous things, but it’s challenging because that much growth means we can’t be as patient with on-boarding as we’d like to be, which naturally leads to people taking their own personal approaches to doing things.

That’s dangerous because when someone tries something new without connecting with our infrastructure team — and then runs into an issue — we have to jump in to help them debug and work through the previously unknown approach they’ve taken. This eats up a lot of time that would be better spent ensuring every engineer is fully prepared.

To help developers get access to the knowledge they need, we’ve started running regular infrastructure brown bags. If they’re not familiar with a topic, they can come and actively learn about it: how we use Docker, how we use Splunk, things like that. We’ve also established DevOps office hours every week; if a dev has a question, they can just swing by and we’ll walk them through it. Both of these have been pretty successful. Developers really feel the responsibility to keep everything stable and scalable, knowing that they’re working in a production environment and that the code they push will be used by millions of people.

It also helps that we’re very focused on monitoring here. Sentry in particular is really valuable for tracing exceptions and errors and providing automatic feedback, so we can see things clearly and gain a better understanding of what’s going wrong.

Something I’m particularly excited about is a newer feature of NGINX called $request_id. When a request hits NGINX, we add the $request_id to a response header so we can see the same ID on the client side, while NGINX also forwards the $request_id to any upstream servers. Sentry is one service that is smart enough to catch it. Basically, when there’s an error, we can search Splunk and see the $request_id. If we know it’s an exception, we just go to Sentry, match the $request_id, and immediately see the exact issue and exception.
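DoorDash hasn’t shared the exact code behind this, but a minimal sketch of the idea, using Sentry’s Python SDK (sentry_sdk) in a WSGI app, looks roughly like the following. The DSN is a placeholder and handle_request is a hypothetical stand-in for the real application logic.

import sentry_sdk

sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0")  # placeholder DSN

def application(environ, start_response):
    # WSGI exposes the X-Request-ID header that NGINX forwards as HTTP_X_REQUEST_ID.
    request_id = environ.get("HTTP_X_REQUEST_ID", "unknown")
    # Tag the current scope so any captured exception carries the request ID
    # and can be matched against the same ID in Splunk or the NGINX logs.
    sentry_sdk.set_tag("request_id", request_id)
    try:
        body = handle_request(environ)  # hypothetical application logic
    except Exception:
        sentry_sdk.capture_exception()
        raise
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("X-Request-ID", request_id)])
    return [body]

With a tag like this in place, the request ID surfaced by Splunk can be pasted straight into Sentry’s search to pull up the matching event.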

What is $request_id, and how can you implement it yourself? To quote NGINX:

$request_id is a randomly generated string of 32 hexadecimal characters automatically assigned to each HTTP request as it arrives (for example, 444535f9378a3dfa1b8604bc9e05a303). This deceptively simple mechanism unlocks a powerful tool for tracing and troubleshooting. By configuring NGINX Plus and all backend services to pass the $request_id value, you can trace every request end‑to‑end. This sample config is for our frontend NGINX Plus server.

The Technical Details

Configuring $request_id with NGINX

Start by configuring the frontend NGINX Plus server to include $request_id in a custom logging format, trace, which is used for the access_trace.log file.

log_format trace '$remote_addr - $remote_user [$time_local] "$request" '
                 '$status $body_bytes_sent "$http_referer" "$http_user_agent" '
                 '"$http_x_forwarded_for" $request_id';

upstream app_server {
    server 10.0.0.1;
}

server {
    listen 80;
    add_header X-Request-ID $request_id; # Return to client
    location / {
        proxy_pass http://app_server;
        proxy_set_header X-Request-ID $request_id; # Pass to app server
        access_log /var/log/nginx/access_trace.log trace; # Log $request_id
    }
}

Configuring the Backend Application

Passing the request ID to your application is all well and good, but it does not actually help with application tracing unless the application does something with it. In this example we have a Python application managed by uWSGI. Let’s modify the application entry point to grab the request ID as a logging variable.

from uwsgi import set_logvar

def main(environ, start_response):
    # WSGI exposes the X-Request-ID header as HTTP_X_REQUEST_ID
    set_logvar('requestid', environ.get('HTTP_X_REQUEST_ID', ''))

Then we can modify the uWSGI configuration to include the Request ID in the standard log file.

log-format = %(addr) - %(user) [%(ltime)] "%(method) %(uri) %(proto)" %(status) %(size) "%(referer)" "%(uagent)" %(requestid)

With this configuration in place, we are now producing log files that can be linked to a single request across numerous systems.

Log entry from NGINX:

172.17.0.1 - - [02/Aug/2016:14:26:50 +0000] "GET / HTTP/1.1" 200 90 "-" "-" "-" 5f222ae5938482c32a822dbf15e19f0f

Log entry from application:

192.168.91.1 - - [02/Aug/2016:14:26:50 +0000] "GET / HTTP/1.0" 200 123 "-" "-" 5f222ae5938482c32a822dbf15e19f0f

Once this is in place, you can match request ID fields across applications to quickly get a better handle on the root cause of a particular issue.

The above info comes directly from NGINX, and you can learn more here.
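As a purely illustrative aside (not part of the NGINX article), matching a request ID across log files can be as simple as a few lines of Python. The NGINX log path matches the configuration above; the application log path is an assumption.

import sys

# Illustrative only: print every log line mentioning a given request ID.
request_id = sys.argv[1]  # e.g. 5f222ae5938482c32a822dbf15e19f0f
for path in ("/var/log/nginx/access_trace.log", "/var/log/app/uwsgi.log"):  # second path is hypothetical
    with open(path) as log:
        for line in log:
            if request_id in line:
                print(f"{path}: {line.rstrip()}")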
