Director, Data Center Operations - North America

Lambda

Job Summary

Lambda, The Superintelligence Cloud, builds Gigawatt-scale AI Factories for Training and Inference. The company's mission is to make compute as ubiquitous as electricity, providing access to artificial intelligence for everyone. Lambda is seeking a highly skilled and experienced Director of Data Center Operations to lead and support its North America data center operations. This role involves overseeing large-scale AI and high-performance computing (HPC) infrastructure, ensuring reliability, managing hardware, planning capacity, interfacing with providers, mentoring teams, and setting up new data centers to achieve world-class uptime and scalability for rapidly growing AI demands.

Must Have

  • Develop and execute North American data center operations strategy.
  • Drive continuous improvement across facility operations.
  • Lead multi-site operations team ensuring 24/7/365 reliability and SLA response.
  • Establish standardized procedures, metrics, and best practices.
  • Monitor operational KPIs: uptime, PUE, safety, and compliance.
  • Build, mentor, and scale high-performing operations teams.
  • Develop and manage operating budgets and capital expenditures.
  • Oversee strategic vendor partnerships with data center providers.
  • Ensure compliance with environmental, safety, and industry regulations.
  • Lead incident response and root cause analysis.
  • Act as primary contact for data center operations audits (SOC 2, ISO).
  • 10+ years of experience in data center operations, including 7+ in leadership.
  • Proven experience supporting AI, HPC, or cloud infrastructure at scale.
  • Deep understanding of power, cooling, networking, capacity planning, DCIM, BMS.

Good to Have

  • Experience with GPU clusters.
  • Experience with AI infrastructure networking.
  • Experience with large-scale storage systems.
  • Familiarity with cloud-scale operational practices (AWS, Google, Microsoft).
  • Certifications like CDCDP, CDCP, PMP, or PE.

Perks & Benefits

  • Generous cash & equity compensation.
  • Health, dental, and vision coverage for you and your dependents.
  • Wellness and Commuter stipends for select roles.
  • 401(k) plan with 2% company match (USA employees).
  • Flexible Paid Time Off Plan.

Job Description

What You'll Do:

As Director of Data Center Operations for North America, you will lead and support large-scale AI and high-performance computing (HPC) infrastructure in all of Lambda’s North America data centers. You will oversee all aspects of data center operations, including reliability, hardware break/fix, capacity planning, provider interface, team mentorship, and new data center setup, ensuring world-class uptime, customer response, and scalability to meet rapidly growing AI infrastructure demands.

Key Responsibilities:

Strategic Leadership

  • Develop and execute the North American data center operations strategy aligned with AI infrastructure goals and organizational growth.
  • Drive continuous improvement across facility operations, emphasizing sustainability, efficiency, and resilience.
  • Partner with Engineering, Capacity Planning, and Infrastructure teams to forecast and support future AI and GPU-based compute requirements, and provide operational feedback on designs and system improvements.
  • Oversee expansion projects, retrofits, and site selection in collaboration with Data Center Infrastructure Engineering and HPC Architecture teams.

Operational Excellence

  • Lead a multi-site operations team ensuring 24/7/365 reliability, availability, and SLA response across all facilities.
  • Establish standardized procedures, metrics, and best practices for preventive maintenance, incident management, and service delivery.
  • Monitor operational KPIs including uptime, PUE, safety, and compliance with corporate and regulatory standards.
  • Implement automation and AI-driven monitoring solutions to optimize system performance and predictive maintenance.
  • Coordinate data center provider maintenance events and communicate them to customers and impacted teams.

Team Leadership and Development

  • Build, mentor, and scale a high-performing team of operations managers, technicians, and engineers across multiple regions.
  • Routinely visit all sites to maintain standards, develop relationships, and identify opportunities to improve efficiency.
  • Foster a culture of safety, accountability, and continuous learning that drives data center operations to take on more responsibility and work up the stack.
  • Assist in the build-out of new data center whitespace and the deployment of AI infrastructure.

Financial and Vendor Management

  • Develop and manage operating budgets, capital expenditures, and cost-optimization initiatives.
  • Oversee strategic vendor partnerships with numerous data center providers for power, cooling, maintenance, and critical infrastructure components.

Risk and Compliance

  • Ensure compliance with environmental, safety, and industry regulations (e.g., NFPA, OSHA, ISO standards).
  • Lead incident response and root cause analysis to drive preventive improvements for incidents related to data center operations or infrastructure.
  • Act as primary point of contact for compliance audits related to data center operations, such as SOC 2 and ISO.

Qualifications:

  • 10+ years of experience in data center operations, with at least 7 years in a leadership role managing multi-site or hyperscale facilities.
  • Proven experience supporting AI, HPC, or cloud infrastructure at scale.
  • Deep understanding of power and cooling systems, networking, capacity planning, and facility automation tools (DCIM, BMS, etc.).
  • Strong track record of improving operational efficiency and managing relationships with data center providers.
  • Bachelor’s degree in Engineering, Computer Science, or a related field preferred; Master’s degree a plus.
  • Exceptional communication, cross-functional collaboration, and stakeholder management skills, with the ability to build relationships, consensus, and a positive team culture.
  • Willingness to travel (up to 50%) to data center sites across North America, including sites under construction.

Preferred Skills:

  • Experience with GPU clusters, AI infrastructure networking, and large-scale storage systems.
  • Familiarity with cloud-scale operational practices (e.g., AWS, Google, Microsoft data center standards).
  • Certifications such as CDCDP, CDCP, PMP, or PE are a plus.
