Software Engineer II

1 Month ago • 2 Years + • DevOps

Job Summary

Job Description

The Software Engineer II will build and optimize infrastructure for observability and data flow in MAIA AI accelerators. This involves developing and enhancing data flows between hosts and networks, ensuring accurate insights into AI hardware operations. Responsibilities include designing efficient, scalable data flow mechanisms, collaborating with senior engineers, optimizing data flow architecture across the hardware stack, profiling and debugging data flow paths, building and maintaining infrastructure for seamless tooling interaction with MAIA chips, and contributing to internal APIs and libraries. The role requires system-level programming expertise (C/C++) and a focus on high-performance computing and hardware accelerators. The work directly impacts how developers interact with and optimize AI workloads.
Must have:
  • System-level programming (C/C++)
  • Low-level infrastructure optimization
  • Data flow management expertise
  • High-performance computing experience
  • Problem-solving skills
Good to have:
  • PCIe, eBPF, Networking understanding
  • Linux kernel and eBPF tooling familiarity
  • Performance optimization & debugging skills
  • Experience with high-impact projects
Perks:
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Job Details

Overview

The MAIA System Infrastructure team is pioneering the development of the next-generation developer ecosystem for AI Accelerators. We are at the core of creating the infrastructure that enables deep observability into our proprietary MAIA chips, empowering developers to harness the full potential of these advanced AI accelerators. Our mission is to build a transparent, efficient, and powerful ecosystem that goes beyond traditional GPU observability, providing unmatched insights into the operations and performance of our AI accelerators.

 

We operate at the intersection of cutting-edge AI hardware, system software, and developer tools, constantly pushing the boundaries of what is possible. We focus not only on the internal execution and performance metrics of the MAIA chips but also on optimizing the broader data flow infrastructure, particularly over PCIe, eBPF, and various frontend networks, ensuring seamless and efficient data movement between the host and accelerators. By decomposing the data flow infrastructure and optimizing it into state-of-the-art designs, we aim to maximize the performance and efficiency of AI workloads, enhancing the overall ecosystem's capabilities. Our collaborative efforts span hardware architects, system engineers, and AI researchers, all aimed at building a holistic observability stack that drives the next wave of AI innovation.

 

As a Software Engineer II on the MAIA System Infrastructure team, you will play a crucial role in building and optimizing the observability and data flow infrastructure for MAIA AI accelerators. Your primary focus will be on developing and enhancing the data flows across hosts and networks, ensuring they provide accurate and actionable insights into the complex operations of our AI hardware. This role involves working closely with senior engineers to design and implement data flow mechanisms that are efficient, scalable, and capable of handling the intricacies of our advanced accelerator architecture.

 

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. 

Qualifications

Required Qualifications

 

  • Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • 2+ years experience in system-level programming (C/C++), with a focus on building and optimizing low-level infrastructure.

 

Other Requirements

  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check:
    • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

 

Preferred Qualifications

  • Strong foundation in system-level programming (C/C++), with a focus on building and optimizing low-level infrastructure.
  • Experience or a keen interest in data flow management, particularly in the context of high-performance computing systems and hardware accelerators.
  • Understanding of or willingness to learn about high performance communication patterns over PCIe, eBPF, Networking, and various memory fabrics within and across our hardware and software stacks.
  • Proven problem-solving skills with the ability to tackle complex technical challenges related to data flow efficiency and infrastructure optimization.
  • A track record of working on high-impact projects, demonstrating a passion for building robust, high-performance systems.
  • Excellent collaboration and communication skills, with a drive to work alongside top-tier engineers to push the boundaries of AI acceleration tooling.
  • Familiarity with performance optimization and debugging tools is a plus, with a desire to contribute to the development of such tools.
  • Familiarity with Linux kernel and eBPF tooling (e.g., BCC, bpftrace) is a plus, demonstrating an ability to utilize eBPF for real-time data analysis and system diagnostics.

Software Engineering IC3 - The typical base pay range for this role across Canada is CAD $83,600 - CAD $159,600 per year.

 

Microsoft will accept applications for the role until January 20, 2024.

 

Responsibilities

In this position, you'll be hands-on in developing and optimizing the infrastructure that enables our observability and debugging tools to function seamlessly across multi-chip, multi-server environments. Your work will directly contribute to how developers interact with, analyze, and optimize AI workloads on our accelerators, ensuring that data transfer and processing are handled with maximum efficiency.

 

Foster an Inclusive and Collaborative Environment:

Actively contribute to a culture of inclusivity by valuing diverse perspectives, mentoring peers, and promoting open communication. Support and uplift teammates to ensure everyone can contribute their best in a high-performing, collaborative environment.

 

Develop and Optimize Tooling Infrastructure:

Work on the core infrastructure that supports our observability tools, focusing on the data flows and the efficient management of information between host systems and MAIA accelerators.
Implement and refine data transfer mechanisms, ensuring they are optimized for speed, reliability, and scalability across a distributed system of accelerators.

 

Contribute to Data Flow Efficiency:

Collaborate with senior engineers to decompose and optimize the data flow architecture over our entire hardware stack, focusing on minimizing latency and maximizing throughput.
Engage in profiling and debugging the data flow paths to identify and resolve bottlenecks, contributing to the overall performance of the AI infrastructure.

 

Participate in Building Robust Systems:

Assist in building and maintaining the infrastructure that allows seamless interaction between the tooling stack and the MAIA chips, ensuring reliable data collection and analysis.
Contribute to the development of internal APIs and libraries that facilitate data transfer, processing, and storage, supporting a high-performance observability ecosystem.

 

Engage with High-Performance Systems Design:

Work alongside a team of talented, inclusive and diverse engineers, gaining experience in the design and implementation of high-performance systems that are at the forefront of AI acceleration technology.
Develop a deep understanding of system-level interactions and learn to build infrastructure that supports real-time data analysis and feedback.

 

Other 

  • Embody our culture and values

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
