Senior AI Hardware Quality Engineer

1 Month ago • 5-12 Years • DevOps • $117,200 PA - $250,200 PA

Job Summary

Job Description

Microsoft seeks a Senior AI Hardware Quality Engineer to develop and implement a robust supplier quality management strategy for data center hardware. Responsibilities include leading quality issue resolution task forces, conducting debug and failure analysis for GPU subsystems, driving continuous improvement via RCA, establishing key performance metrics, and acting as the voice of quality in hardware change management. The role requires expertise in root cause analysis, data analysis, and communication, along with experience in managing manufacturing quality in the electronics industry and resolving hardware system issues for GPU servers. The ideal candidate will have a strong understanding of modern server architectures, including GPU and CPU failure analysis and debugging.
Must have:
  • 5+ years managing manufacturing quality in electronics
  • 5+ years resolving HW system issues for GPU servers
  • 5+ years debugging data to identify HW failure signatures
  • Root cause analysis and corrective action expertise
  • Data analysis and communication skills
Good to have:
  • Patent or track record of engineering excellence
  • Experience with modern server architectures (GPU, CPU)
  • System-level server debugging experience
  • Direct GPU-related engineering experience
  • Leadership and collaboration skills
Perks:
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Job Details

Overview

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding Cloud Infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for Microsoft’s more than 200 online businesses, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, globally with our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate, high-energy engineers to help achieve that mission.

 

As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality, and at the lowest cost is of paramount importance. To achieve this goal, the Hardware, Infrastructure Management, and Fundamentals Engineering (HIFE) team is instrumental in defining and delivering operational measures of success for hardware manufacturing, improving the planning process, quality, delivery, scale, and sustainability related to Microsoft cloud hardware. We are looking for seasoned engineers with a dedicated passion for customer-focused solutions, insight, and industry knowledge to envision and implement future technical solutions that will manage and optimize the Cloud infrastructure.

 

At Microsoft, we immensely value culture, mentorship, and acting as One Team. This is an opportunity to leverage your technical expertise by working in a highly collaborative environment to discover, define, and deliver storage innovation at Cloud-scale. 

 

We are looking for a Senior AI Hardware Quality Engineer to join the team.

 

Qualifications

Required Qualifications:

 

  • Master's Degree in Electrical Engineering, Computer Engineering, or related field AND 3+ years technical engineering experience

    o OR Bachelor's Degree in Electrical Engineering, Computer Engineering, or related field AND 5+ years technical engineering experience

    o OR equivalent experience.

  • 5+ years of work experience in managing manufacturing quality in the electronics industry.
  • 5+ years of direct engineering experience in hardware system issue resolution for GPU servers.
  • 5+ years debugging data (i.e., telemetry and logs) to identify and investigate HW failure signatures.

 

Other Requirements:

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

 

Preferred Qualifications:

  • Bachelor's Degree in Electrical Engineering, Computer Engineering, or related field AND 7+ years experience in technical engineering
    o OR 9+ years equivalent experience.
  • Patent or track record of engineering excellence.
  • 12+ years of experience working with modern server architectures, including an understanding of GPU and CPU methods for failure analysis, debugging, or validation.
  • 8+ years of system-level server debugging with an understanding of platform, power, system, and network environments.
  • 3+ years of direct GPU-related engineering experience in issue debug/test log review.
  • Leadership skills and the ability to collaborate with diverse teams and drive a call to action.
  • Expertise in root cause analysis and corrective action methods to identify contributing factors of production defects.
  • Ability to analyze large data sets, extract key insights, and effectively present and communicate the results.
  • Proficient communication and project management skills.


Electrical Engineering IC4 - The typical base pay range for this role across the U.S. is USD $117,200 - $229,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $153,600 - $250,200 per year. Certain roles may be eligible for benefits and other compensation.

Find additional benefits and pay information here:

 

Microsoft will accept applications for the role until January 29, 2025.

 

 

#azurehwjobs   #HIFE

Responsibilities

  • Develop and implement a robust supplier quality management strategy to ensure the data center hardware is manufactured at the highest level of quality standards. 
  • Lead quality issue and improvement task forces to contain, mitigate, and resolve the top quality issues impacting global data centers.
  • Conduct debug and failure analysis for GPU subsystems in the Azure fleet and drive resolution with partners and suppliers.
  • Drive the continuous improvement process based on Root Cause Analysis (RCA) and identified opportunities. 
  • Own quality readouts based on your telemetry data analysis, bringing clarity across the organization on status, actions, and next steps for issue resolution.
  • Establish Critical-to-Quality performance metrics to measure and improve product quality. 
  • Act as the voice of quality in the hardware change management process, ensuring quality requirements are considered, met, and improved.
  • Embody our and  
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
