Lead Site Reliability Engineer

DraftKings

Job Summary

Lead SRE initiatives across multiple projects and products, collaborating with cross-functional teams to shape platform and infrastructure engineering efforts. Drive technical excellence by mentoring engineers and fostering a culture of continuous learning and innovation. Architect and automate self-healing infrastructure with declarative configurations, GitOps, and event-driven automation. Design, develop, and maintain software-driven infrastructure automation. Own product deployment, performance tuning, monitoring, and alerting to ensure high availability and system efficiency. Create robust observability strategies with service level agreements.

Must Have

  • 6+ years managing distributed cloud environments
  • Expert in networking and web concepts
  • Deep expertise in Kubernetes and container runtimes
  • Experience with IaC and configuration management tools
  • Strong software development skills (Go, Python)
  • Leading engineering teams and guiding technology roadmaps
  • Strong understanding of Linux-based operating systems

Good to Have

  • Understanding of applications written in object-oriented languages (C#/.NET, Java)

Perks & Benefits

  • Bonus
  • Equity
  • Benefits

Job Description

We’re defining what it means to build and deliver the most extraordinary sports and entertainment experiences. Our global team is trailblazing new markets, developing cutting-edge products, and shaping the future of responsible gaming.

Here, “impossible” isn’t part of our vocabulary. You’ll face some of the toughest but most rewarding challenges of your career. They’re worth it. Channeling your inner grit will accelerate your growth, help us win as a team, and create unforgettable moments for our customers.

The Crown Is Yours

As a Lead Site Reliability Engineer, you will drive key initiatives to enhance the reliability, scalability, and efficiency of our infrastructure. You’ll collaborate across teams to architect infrastructure automation while mentoring other engineers to foster a culture of continuous learning and innovation. In this role, you will shape deployment strategies, performance tuning, and monitoring frameworks to support our rapid growth.

What you’ll do as a Lead Site Reliability Engineer

  • Lead SRE initiatives across multiple projects and products, collaborating with cross-functional teams to shape platform and infrastructure engineering efforts across the organization.

  • Drive technical excellence by mentoring and guiding engineers, fostering a culture of continuous learning and innovation.

  • Architect and automate self-healing, fault-tolerant infrastructure with declarative configurations, GitOps, and event-driven automation for scalable deployments across public clouds and on-premises environments.

  • Design, develop, and maintain software-driven infrastructure automation to build internal tools and eliminate repetitive operational tasks.

  • Own and drive decisions on product deployment, performance tuning, monitoring, and alerting to ensure high availability and system efficiency in production.

  • Create robust observability strategies with service level agreements to support our rapid traffic growth.

What you’ll bring

  • 6+ years of experience managing distributed cloud environments (GCP, AWS, vSphere, Nutanix) and platform automation at scale.

  • Expert-level understanding of networking and web concepts, with the ability to debug issues down to the packet level.

  • Deep expertise in container orchestration (Kubernetes) and container runtimes (Docker, containerd), with the ability to design, scale, and troubleshoot complex workloads.

  • Experience with Infrastructure as Code (IaC) and configuration management tools (Terraform, Ansible, Chef, etc.), ensuring scalable and repeatable infrastructure provisioning.

  • Strong experience developing software for automation and infrastructure tooling (Go, Python).

  • Experience leading engineering teams and guiding technology roadmaps in large-scale, distributed environments.

  • Strong understanding of Linux-based operating systems, including performance tuning, kernel debugging, and low-level system optimizations.

  • Understanding of applications written in object-oriented languages (C#/.NET, Java).

Join Our Team

We’re a publicly traded (NASDAQ: DKNG) technology company headquartered in Boston. Because we are a regulated gaming company, you may be required to obtain a gaming license issued by the appropriate state agency as a condition of employment. Don’t worry, we’ll guide you through the process if this is relevant to your role.

The US base salary range for this full-time position is 148,000.00 USD - 185,000.00 USD, plus bonus, equity, and benefits as applicable. Our ranges are determined by role, level, and location. The compensation information displayed on each job posting reflects the range for new hire pay rates for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific pay range and how that was determined during the hiring process.

It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
