Infrastructure Engineer, Batch Data Platform

Stripe

  • Full Time

Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.

About the team

You’ll be on a team that maintains an internal product we provide to the rest of engineering, such as storage, search, or message queueing. You’ll make decisions that significantly affect Stripe. There is a lot of work to do to make Stripe engineers’ work easier and our platform even more reliable than it is today, and we’d love for you to be part of it. We work closely with the people using our systems, so we constantly get feedback we can use to improve them.

What you’ll do

We have a few dozen infrastructure engineers today spread across several different teams, and you’ll work with other infrastructure engineers as well as product engineers who use the systems you’re building.

We’re looking for people with a strong background (or interest!) in systems. We’d love to hear from you whether you’re a seasoned systems developer or have just learned you might like working with databases. Many of our infrastructure engineers work remotely, and we’d be happy to talk with you about working remotely.

Responsibilities

  • Design, build, and maintain the core infrastructure used by all of Stripe’s engineering teams
  • Debug production issues across services and levels of the stack
  • Plan for the growth of Stripe’s infrastructure
  • Build a great customer experience for people using your infrastructure
  • To get a concrete idea of what projects you might work on here, see the “Projects you could work on” section

Who you are

To be considered for the role, you should meet the minimum requirements below. If you meet them, we encourage you to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements

  • Think about systems — their edge cases, failure modes, and lifecycles
  • Know your way around a Unix shell
  • Can debug complex problems across the whole stack
  • Focus on the needs of our users, both internal and external
  • Hold yourself and others to a high bar when working with production
  • Take a metrics-driven approach and make informed decisions using data
  • Can write high-quality code in at least one programming language (e.g., Ruby, Scala, Go)

It’s not expected that any single candidate will have expertise across all of these areas. For instance, we have wonderful team members who are deeply focused on their customers’ needs and on building great user experiences, but who didn’t come in with as much systems knowledge.

Projects you could work on

We have a ton of important work to do, which is why we’re hiring! Our projects change all the time, but here are a few that we’re working on now or have completed in the past, so you can get an idea of the kind of work we do. Technologies we use include haproxy, nginx, consul, jenkins, datadog, elasticsearch, statsd, kafka, rabbitmq, storm, and others.

  • Plan and implement multi-region availability for our distributed job queuing infrastructure! All of our systems can sustain losing machines, and making our systems even more resistant to failure is a big theme for us. If you like thinking about distributed systems, you might find a good home here
  • Write easy-to-use, reliable client libraries for our Kafka and database systems. You’ll design abstractions and provide sensible defaults for timeouts and error handling around a complex system
  • Move us to a new region with no downtime. Last year, we migrated between AWS regions with no downtime and no negative effects on our users
  • Request tracing! Your mission: make it easier for any Stripe engineer, when debugging, to trace a request from its source down to every service it touched
  • Build fantastic code review tools! If you love helping developers be more effective at their jobs, we have a ton of interesting projects in this area. Related projects: you could help us make our builds more reproducible with Bazel and build great developer environments
  • We have a bunch of projects around deploying and running code: help us instantly roll back bad deploys so that we can recover quickly, and build infrastructure that lets us scale up our API workers in seconds in response to high API load
  • We need to scale our databases to handle 10x their current load. You could help us shard them more effectively, upgrade our database engines, and build tools that help developers understand their slow queries more easily. A lot of our database projects are open source
  • Build a seamless zero-downtime process to upgrade elasticsearch clusters. Our write-heavy workloads combined with our users’ need for reliability make this a unique challenge
Job Overview