Director, Reliability Engineering

1 Month ago • 8-12 Years • DevOps • Manufacturing • $137,600 PA - $294,000 PA

Job Summary

Job Description

The Director, Reliability Engineering leads a team responsible for the reliability of Microsoft's cloud infrastructure hardware. This involves overseeing architecture, design, manufacturing, and operations to ensure high quality and performance. Key responsibilities include leading strategic innovations, driving root cause analysis, optimizing reliability solutions, and collaborating with cross-functional teams. The role requires strong leadership, technical expertise in reliability engineering, and experience in cloud operations. The candidate will define and manage the integration of various aspects of the hardware lifecycle to optimize cloud infrastructure reliability.
Must have:
  • Doctorate, Master's, or Bachelor's degree in a relevant engineering field
  • 5+ years management experience
  • 5+ years technical engineering experience with a Doctorate, 7+ with a Master's, or 8+ with a Bachelor's
  • Experience leading system engineering teams
  • Knowledge of cloud fleet management and diagnostics
Good to have:
  • MBA in engineering management
  • Experience with liquid cooling infrastructure
  • Experience developing design specifications
Perks:
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Job Details

Overview

Microsoft Silicon, Cloud Hardware Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding Cloud Infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for Microsoft's more than 200 online businesses, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, globally with our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate, high-energy engineers to help achieve that mission.

As Microsoft's Cloud business continues to grow, the ability to deploy new offerings and HW infrastructure on time, at high volume, with high quality, and at the lowest cost is of paramount importance. To achieve this goal, the Hardware, Infrastructure Management, and Fundamentals Engineering (HIFE) team is instrumental in defining and delivering operational measures of success for Cloud infrastructure reliability and in improving the planning process, manufacturing, quality, delivery at scale, serviceability, and sustainability. We are looking for a System Reliability Engineering Leader with a strong passion for customer-focused solutions, and the insight and industry knowledge to envision and implement future technical solutions that will optimize the Cloud infrastructure and its reliability.

We are looking for an experienced System Reliability Director who will be responsible for driving reliability performance across architecture, design, component and material selection, manufacturing, and integration of datacenter hardware, ensuring that all electrical, mechanical, thermal, environmental, transportation, and operational aspects, along with telemetry, diagnostics, and the SW/FW stack of the cloud solution, are optimized throughout the lifecycle of each cloud service. The candidate will interact with Engineering, Supply Chain, Sourcing, Manufacturing & Quality, Fleet Management, Datacenter Operations, and other internal and external stakeholders.

Qualifications

Required Qualifications

  • Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 5+ years technical engineering experience
    • OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 7+ years technical engineering experience
    • OR Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 8+ years technical engineering experience.
  • 5+ years of management experience, including resource planning, career development, and performance management.

Other Requirements

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications

  • MBA in engineering management or operations.
  • Experience with cloud fleet management, telemetry, diagnostics, and troubleshooting of IT systems.
  • Experience and knowledge in the server industry product development process.
  • Experience in leading system engineering teams in both NPI and Sustaining lifecycles, and managing suppliers.
  • Experience developing design specifications and/or product requirement documents.
  • Experience with system reliability, manufacturing processes, and datacenter operations, leading continuous improvements through automation.
  • Experience with liquid cooling infrastructure for IT racks.

Reliability Engineering M5 - The typical base pay range for this role across the U.S. is USD $137,600 - $267,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $180,400 - $294,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Microsoft will accept applications for the role until January 18, 2025.

Responsibilities

As a Director, Reliability Engineering, you will be responsible for the following:

  • Leading the Cloud System and Components Reliability Engineering organization with an ability to operate in a fast-paced environment, transforming ambiguity into clarity.
  • Leading strategic innovations and developing processes that integrate industry practices, ensuring the scalability and efficiency needed to achieve high reliability and quality performance.
  • Leading by example and coaching to inspire team members to grow and develop in the field of System and Components Reliability Engineering.
  • Leading retrospectives and deep dives to drive root cause analysis and corrective actions that prevent future escapes.
  • Combining technical and process expertise with an in-depth understanding of cloud operations to optimize reliability solutions for future server and storage products.
  • Defining, facilitating, and managing the integration of architecture, design, manufacturing, operation, troubleshooting, and diagnostic methods to optimize cloud infrastructure reliability.
  • Participating in, and approving, mechanical, thermal, electrical, telemetry, and diagnostic design reviews to ensure system reliability requirements are properly implemented.
  • Driving System Reliability Readiness of new cloud platforms landing in Microsoft Datacenters.
  • Supporting Hardware Systems Group development, deployment, and sustaining teams from system concept to decommissioning, and working with cross-functional strategic teams on process optimizations and inter-related strategic initiatives.
  • Developing key metrics to evaluate the system reliability program's performance, and building implementation plans to confirm performance and compliance against program metrics and internal company requirements.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.

Redmond, Washington, United States (On-Site)
