Principal Supercomputing Software Engineer

1 Month ago • 6 Years + • DevOps • $137,600 PA - $294,000 PA

Job Summary

Job Description

Microsoft Azure AI/HPC team seeks a Principal Supercomputing Software Engineer to build and utilize cutting-edge tools for managing hyperscale cloud infrastructure. Responsibilities involve analyzing system metrics, debugging HPC issues, developing solutions for operating supercomputers in the public cloud, collaborating with customers and vendors, and ensuring platform performance, scalability, and resilience. The role demands expertise in AI/HPC system management, high-speed networks, HPC storage, or cloud infrastructure management. The engineer will contribute to establishing best practices, driving architectural changes, and influencing roadmaps for software and hardware components.
Must have:
  • 6+ years technical engineering experience
  • 5+ years experience operating AI/HPC systems
  • 3+ years specialized experience in AI/HPC system management, high-speed networks, HPC storage, or cloud infrastructure
  • Proficient in C, C++, C#, Java, JavaScript, or Python
  • Strong analytical and problem-solving skills
Good to have:
  • Master's or PhD in Computer Science
  • Experience running large-scale HPC systems in cloud environments
  • Experience troubleshooting machine learning workloads on GPU-based HPC systems
  • Expertise in cloud computing, virtualization, and container technologies
  • Familiarity with the HPC software stack
Perks:
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Job Details

Overview

The Microsoft Azure Artificial Intelligence/High Performance Computing (AI/HPC) team is looking for systems engineers, architects, and thought leaders to enable customers in deploying, monitoring, profiling, and debugging their applications on hyperscale cloud infrastructure. Azure is enabling the largest supercomputing deployments to tackle complex computational problems in the public cloud, as is evident from the various HPC products that have already made their mark on the Top500, MLPerf, and Graph500 rankings.

At this supercomputing scale, we need specialized tools and techniques to maintain the reliability, runtime performance, and health of the system and of running jobs while continuing to meet customers' Service Level Agreements (SLAs). Your job would be to build and use state-of-the-art tools and techniques, find operational gaps, and instrument features to achieve the smooth operation of cloud-native supercomputers. As a Principal Supercomputing Engineer, you would also establish best practices, drive architectural changes, and influence the roadmap of relevant software and hardware components. Your work will directly impact the business goals of a wide range of users and facilitate the next wave of growth and innovation in AI and HPC in the cloud in general.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications

Required Qualifications:

  • Bachelor's Degree in Computer Science or related technical or scientific field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience
  • 5+ years of experience in operating AI/HPC systems, developing and running AI/HPC applications on clusters, or operating Cloud Infrastructure
  • 3+ years of specialized experience with one of AI/HPC system management OR High-Speed Networks OR HPC Storage OR managing Cloud Infrastructure

Other Requirements:

  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications: 

  • Master's Degree or PhD in Computer Science or related technical or scientific field
  • Operational experience running large-scale HPC systems or infrastructure situated in Cloud environments
  • Previous experience running and troubleshooting machine learning workloads on GPU-based HPC systems
  • Expertise in Cloud Computing, Virtualization, and Container Technologies
  • Familiarity with the HPC software stack

Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $137,600 - $267,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $180,400 - $294,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Microsoft will accept applications for the role until January 6, 2025.

Responsibilities

  • Be part of a comprehensive systems management team focused on operational excellence and customer success
  • Analyze key system metrics and telemetry to proactively identify and debug HPC system issues, build appropriate tooling, help develop processes, and ensure that solutions are responsive to emerging user needs
  • Partner with customers, vendors, and other teams within Azure to drive comprehensive solutions for operating world-class supercomputers in the public cloud environment
  • Ensure that the Azure platform is performant, scalable, and resilient
  • Foster a test-driven engineering culture to reduce regressions and bugs in production and set a higher bar for infrastructure quality

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
