Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster, so we can get back to enjoying technology.
With more than $217 million in funding and 85,000 organizations that believe we’re on to something, we're building performance and error monitoring tools that help companies like Disney, Microsoft, and Atlassian spend less time fixing bugs and more time building products. If you like to selfishly build things that make your digital life better, come help us build the next generation of software monitoring tools.
About The Team
The Engineering Operations team is responsible for the deployment, configuration, maintenance, and monitoring of Sentry's hosted platform. We do this with automation tools that spin up and scale services to meet the traffic demands of 1,000,000+ developers. Sentry receives over a billion events a day and processes terabytes of data to return complex aggregations with sub-second latency.
As a Lead Site Reliability Engineer, you will work with a multitude of technologies and have a direct impact on how Sentry evolves to handle 100x our current event volume.
You’ll contribute to the vision of Engineering Operations in a world of cloud providers and partner with other engineering teams in their efforts to grow and sustain Sentry.
In this role you will:
- Work closely with the Client Infrastructure team to improve event ingestion and processing pipelines.
- Ensure the uptime and reliability of Sentry's hosted platform.
- Architect and automate services and systems to meet the demand of scale.
- Collaborate with other Engineering teams to deploy and scale new and existing services.
- Be a member of the Engineering Operations team's on-call rotation, and be available to respond to and resolve critical issues.
You'll love this job if:
- You have experience leading the way.
- You enjoy fiddling with new cloud technologies and services.
- You’re not afraid to dig into system internals during the troubleshooting process.
- You've seen networks make and break hosted solutions, and have (or want to get more!) direct experience with growing and maintaining distributed systems.
- You’re familiar with the various SaaS ecosystems and have taken ownership of a service you once knew nothing about.
- You've got a story (or two) of royally goofing it and can tell us why it would never happen again under your watch.
Qualifications
- 7+ years of relevant experience
- Good knowledge of replicated and distributed data storage systems
- Experience with some or all of the following tools we leverage:
  - System Administration: Debian, Docker
  - Cloud: Google Cloud Platform
  - Databases: PostgreSQL, ClickHouse, Kafka
  - Environment Management: Saltstack, Kubernetes, Terraform
  - TCP/HTTP Routing: HAProxy, NGINX, Envoy
  - Data Streaming Platforms: Kafka, RabbitMQ
- Good written and oral communication skills and ability to articulate technical concepts clearly and succinctly
Benefits
- Competitive salary and meaningful equity
- 100% medical, dental, and vision coverage for employees, 75% company-paid for dependents
- Monthly commuter subsidy
- 401k program
- Learning & Development stipend
- Charitable matching program
- Generous parental leave policy
- Flexible working schedule and vacation policy, work from home policy, and real work/life balance
- Catered lunches
- Company events (Hack Weeks, All Hands, quarterly social events) and friends and family events
- Relocation assistance: you are living in, or willing to relocate to, the Toronto, Canada area
COVID Vaccine Required - Reasonable Accommodations for Medical or Religious Reasons Considered
Sentry values diversity and inclusivity in our company and is an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.