Director of Resilience Engineering

DraftKings

6+ Years | United States (Remote) | Full Time | 1 month ago

Job Summary

As Director of Engineering, Performance and Resilience at DraftKings, you will establish and scale a new, high-impact capability to drive system performance, reliability, and operational excellence. This role unifies Performance and Chaos Engineering, owning the technical vision, delivery roadmap, and team culture. You will shape how resilient systems are built, tested, and operated at scale, enhancing customer experience and engineering confidence.

Must Have

Build and lead a new Resilience Engineering team.
Define and execute a resilience strategy.
Develop and evangelize standards, tooling, and playbooks.
Partner with Infrastructure, SRE, and Domain teams to identify risks.
Integrate resilience testing into the SDLC.
Launch and operate a centralized Reliability Lab.
Drive adoption of resilience principles.
Communicate technical strategy and results to senior leadership.
Bachelor's degree in Computer Science or equivalent.
At least 6 years of experience in a large-scale, growth-oriented environment.
At least 4 years in technical leadership in software, platform, or reliability engineering.
Deep expertise in performance optimization, distributed systems scalability, and production reliability.
Familiarity with chaos engineering tools (Gremlin, Chaos Mesh) and observability stacks.
Strong understanding of cloud-native architecture and large-scale operations.
Ability to define and track reliability KPIs (MTTR, availability, performance headroom).
Ability to influence across functions and communicate effectively with senior stakeholders.

Good to Have

Experience building enablement platforms or resilience frameworks in regulated or high-availability industries such as fintech, SaaS, or gaming.
Background in observability, distributed tracing, or performance analytics.

Perks & Benefits

Bonus
Equity
Benefits

Job Description

At DraftKings, AI is becoming an integral part of both our present and future, powering how work gets done today, guiding smarter decisions, and sparking bold ideas. It’s transforming how we enhance customer experiences, streamline operations, and unlock new possibilities. Our teams are energized by innovation and readily embrace emerging technology. We’re not waiting for the future to arrive. We’re shaping it, one bold step at a time. If you see AI as a driver of progress, come build the future with us.

The Crown Is Yours

As Director of Engineering, Performance and Resilience, you'll establish and scale a new, high-impact capability that drives the performance, reliability, and operational excellence of our systems. In this role, you'll unify Performance and Chaos Engineering into one strategic function, owning the technical vision, delivery roadmap, and team culture. Your work will shape how we build, test, and operate resilient systems at scale while elevating customer experience and engineering confidence across the board.

What You'll Do

Build and lead a new Resilience Engineering team that blends performance, chaos, and reliability practices under one mission.
Define and execute a resilience strategy focused on systemic risk reduction and continuous performance improvement.
Develop and evangelize standards, tooling, and playbooks that enable engineering teams to design and operate resilient services.
Partner with Infrastructure, SRE, and Domain teams to identify high-blast-radius risks and remediate systemic weaknesses.
Integrate resilience testing into the SDLC, making fault injection and performance validation part of CI/CD.
Launch and operate a centralized Reliability Lab to support safe chaos experimentation and performance benchmarking.
Drive adoption of resilience principles through training, enablement, and measurable outcomes.
Communicate technical strategy and results to senior leadership, connecting engineering impact to customer experience and business value.

What You'll Bring

Bachelor's degree in Computer Science or any suitable combination of education, experience, and training.
At least 6 years of experience in a large-scale, growth-oriented environment, with a track record of building and leading high-performing teams.
Experience in software, platform, or reliability engineering, with at least 4 years in technical leadership.
Deep expertise in performance optimization, distributed systems scalability, and production reliability.
Familiarity with chaos engineering tools such as Gremlin and Chaos Mesh, along with observability stacks.
Strong understanding of cloud-native architecture and large-scale operations.
Demonstrated ability to define and track reliability KPIs like MTTR, availability, and performance headroom.
Ability to influence across functions and communicate effectively with senior technical and business stakeholders.
Experience building enablement platforms or resilience frameworks in regulated or high-availability industries such as fintech, SaaS, or gaming is a plus.
Background in observability, distributed tracing, or performance analytics is a plus.