Principal Supercomputing Software Engineer

Microsoft

Multiple Locations, United States (On-site)

6 Months ago • 6 Years + • Devops • $137,600 PA - $294,000 PA

Job Summary

Job Description

Microsoft Azure AI/HPC team seeks a Principal Supercomputing Software Engineer to build and utilize cutting-edge tools for managing hyperscale cloud infrastructure. Responsibilities involve analyzing system metrics, debugging HPC issues, developing solutions for operating supercomputers in the public cloud, collaborating with customers and vendors, and ensuring platform performance, scalability, and resilience. The role demands expertise in AI/HPC system management, high-speed networks, HPC storage, or cloud infrastructure management. The engineer will contribute to establishing best practices, driving architectural changes, and influencing roadmaps for software and hardware components.

Must have:

6+ years technical engineering experience
5+ years experience operating AI/HPC systems
3+ years specialized experience in AI/HPC system management, high-speed networks, HPC storage, or cloud infrastructure
Proficient in C, C++, C#, Java, JavaScript, or Python
Strong analytical and problem-solving skills

Good to have:

Master's or PhD in Computer Science
Experience running large-scale HPC systems in cloud environments
Experience troubleshooting machine learning workloads on GPU-based HPC systems
Expertise in cloud computing, virtualization, and container technologies
Familiarity with the HPC software stack

Perks:

Industry leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Networking opportunities

7 skills required for this role


java

javascript

microsoft-azure

azure

python

innovation

problem-solving

Job Details

Overview

The Microsoft Azure Artificial Intelligence/High Performance Computing (AI/HPC) team is looking for systems engineers, architects and thought leaders to help customers deploy, monitor, profile, and debug their applications on hyperscale cloud infrastructure. Azure is enabling the largest supercomputing deployments to tackle complex computational problems in the public cloud, as evidenced by the various HPC products that have already made their mark on the Top500, MLPerf and Graph500 rankings.

At this supercomputing scale, we need specialized tools and techniques to maintain the reliability, runtime performance, and health of the system and of running jobs, so that they continue to meet customers' Service Level Agreements (SLAs). Your job would be to build and use state-of-the-art tools and techniques, find operational gaps, and instrument features to ensure the smooth operation of cloud-native supercomputers. As a Principal Supercomputing Engineer, you would also help establish best practices, drive architectural changes, and influence the roadmap of relevant software and hardware components. Your work will directly impact the business goals of a wide range of users and facilitate the next wave of growth and innovation in AI and HPC in the cloud.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications

Required Qualifications:

Bachelor's Degree in Computer Science or related technical or scientific field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience
5+ years of experience in operating AI/HPC systems, developing and running AI/HPC applications on clusters, or operating Cloud Infrastructure
3+ years of specialized experience with one of AI/HPC system management OR High-Speed Networks OR HPC Storage OR managing Cloud Infrastructure

Other Requirements:

The ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

Master's Degree or PhD in Computer Science or related technical or scientific field
Operational experience running large-scale HPC systems or infrastructure situated in cloud environments
Previous experience with running and troubleshooting machine learning workloads on GPU-based HPC systems
Expertise in Cloud Computing, Virtualization and Container Technologies
Familiarity with the HPC software stack

Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $137,600 - $267,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $180,400 - $294,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Microsoft will accept applications for the role until January 6, 2025.

#azurecorejobs

Responsibilities

Be part of a comprehensive systems management team focused on operational excellence and customer success
Analyze key system metrics and telemetry to proactively identify and debug HPC system issues, build appropriate tooling, help develop processes and ensure that solutions are responsive to emerging user needs
Partner with customers, vendors, and other teams within Azure to drive comprehensive solutions for operating world-class supercomputers in the public cloud environment
Ensure that the Azure platform is performant, scalable and resilient
Foster a test-driven engineering culture to reduce regressions and bugs in production, and set a higher bar for infrastructure quality

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

Industry leading healthcare

Educational resources

Discounts on products and services

Savings and investments

Maternity and paternity leave

Generous time away

Giving programs

Opportunities to network and connect

Similar Jobs

Software Engineer, Front End, Google Cloud

Google

(On-Site)

• 6 Months ago

Full Stack Developer (Python + React) (freelance)

PwC

Warsaw, Masovian Voivodeship, Poland (Hybrid)

• 9 Months ago

Senior Software Engineer

Microsoft

Vancouver, British Columbia, Canada (On-Site)

• 6 Months ago

Software Engineer, Infrastructure

Data Engineer (L5)

Netflix

United States (Remote)

• 8 Months ago

Senior Software Developer (Cloud Infrastructure)

Autodesk

Vancouver, British Columbia, Canada (On-Site)

• 9 Months ago

Software Architect - Privilege Access Management

Saviynt

United States (Remote)

• 8 Months ago

Senior/Lead DevOps Engineer

Luxoft

(Remote)

• 6 Months ago

Infrastructure Engineer - Viostream

Banyan Software

Chennai, Tamil Nadu, India (On-Site)

• 9 Months ago

Sr. Data Engineer - (Big Data, Spark, Scala, Python, AWS, RDBMS, SQL) (copy)

Nielsen Holdings

Bengaluru, Karnataka, India (Hybrid)

• 8 Months ago


Similar Skill Jobs

Backend Engineer

Sinch

Skåne County, Sweden (Hybrid)

• 6 Months ago

Software Engineer

Java Developer

Playtika

Ukraine (On-Site)

• 8 Months ago

Software Developer Coop

Warner Bros Discovery

Ottawa, Ontario, Canada (On-Site)

• 6 Months ago

QA Engineer

Zinrelo

Pune, Maharashtra, India (Hybrid)

• 8 Months ago

Software Engineer Graduate (Multi Cloud CDN) - 2025 Start (BS/MS)

ByteDance

San Jose, California, United States (On-Site)

• 8 Months ago

Senior Staff Software Engineer, Mobile

Paypal

San Jose, California, United States (Hybrid)

• 8 Months ago

Senior Software Engineer, Full Stack, Labs

Google

Mountain View, California, United States (On-Site)

• 6 Months ago

Quality Assurance / QA Specialist - Gaming (f/m/x)

Bytro

Hamburg, Hamburg, Germany (Hybrid)

• 11 Months ago

Lead Machine Learning Engineer, Ad Platforms

The Walt Disney Company

Santa Monica, California, United States (On-Site)

• 7 Months ago


Devops Jobs

Senior Cloud/DevOps Architect

ARHS

Luxembourg (On-Site)

• 8 Months ago

Senior Cloud SRE

NICE

Pune, Maharashtra, India (Hybrid)

• 8 Months ago

DevOps Engineering Lead

Luxoft

Pune, Maharashtra, India (On-Site)

• 7 Months ago

Azure DevOps Senior Consultant

bosh group india

Bengaluru, Karnataka, India (On-Site)

• 7 Months ago

Staff Software Engineer, Networking Infrastructure, Google Cloud

Google

(On-Site)

• 7 Months ago

Lead DevOps Engineer

Luxoft

Bengaluru, Karnataka, India (On-Site)

• 7 Months ago

Lead Data Reliability Engineer

Zeta

Hyderabad, Telangana, India (On-Site)

• 8 Months ago

Sr. Data SRE

Netskope

Bengaluru, Karnataka, India (Remote)

• 8 Months ago

Senior Computer Scientist

Adobe

Noida, Uttar Pradesh, India (On-Site)

• 8 Months ago

DevOps - Lead

Fractal

Mumbai, Maharashtra, India (On-Site)

• 8 Months ago


About The Company

Microsoft

56 Active Jobs


Level Up Your Career in Game Development!

Transform Your Passion into Profession with Our Comprehensive Courses for Aspiring Game Developers.

A global community of game builders. Helping people upskill and land jobs in the best gaming studios.

Company

Key Links

hello@outscal.com

Made in INDIA 💛💙


