At Databricks, we are passionate about enabling data teams to solve the world’s toughest problems, from advancing AI research to powering next-generation applications. We do this by building and operating the world’s best data and AI infrastructure platform. Founded by engineers and driven by customer obsession, we embrace the hardest technical challenges, whether it’s scaling distributed systems across multiple clouds or delivering reliable, low-latency communication between thousands of services. And we’re only getting started.
As a Senior Software Engineer on the Application Traffic team, you will design and build the systems that power Databricks’ service-to-service communication across thousands of clusters in a multi-cloud environment. You will also help create abstractions that hide networking complexity from product teams, making connectivity, discovery, and reliability seamless by default.
The impact you’ll have:
You’ll work across three key areas that define Databricks’ networking stack:
- Ingress Control Plane: Build the control plane for Databricks’ global ingress layer. Enable programming of API gateways with static and dynamic endpoints, simplify service onboarding, and make it easy to expose APIs securely across clouds.
- Service-to-Service Communication: Design scalable mechanisms for service discovery and load balancing across thousands of clusters. Provide networking abstractions so product teams don’t need to worry about underlying connectivity details.
- Overload Protection: Build intelligent rate limiting and admission control systems to protect critical services under high load. Ensure reliability and predictable performance for both customer-facing and internal workloads.
What we look for:
- BS (or higher) in Computer Science or a related field
- 5+ years of experience designing and building large-scale distributed systems
- Strong proficiency in one or more languages such as Java, Scala, Go, or C++
- Experience with service-oriented architectures and large-scale distributed systems
- Familiarity with cloud platforms (AWS, Azure, GCP) and container/orchestration technologies (Kubernetes, Docker)
- Track record of shipping infrastructure that supports mission-critical workloads at scale
Preferred: Background in service discovery, DNS, load balancing, Envoy, or related networking systems