Site Reliability Engineer

1 Month ago • 1-6 Years • Operations • DevOps • $98,300 PA - $208,800 PA

Job Summary

Job Description

Microsoft's Cloud+AI Silver Team seeks a Site Reliability Engineer to deploy and operate a Secure Work Area in an airgapped environment. The role involves working with engineers who enable Azure services for internal and external customers in highly secured and regulated industries, where systems must meet stringent security requirements. Responsibilities include on-call monitoring, automation development, ensuring security and compliance, and collaborating with cross-functional teams. The ideal candidate will have strong problem-solving skills, experience with large-scale distributed systems, and a commitment to production reliability.
Must have:
  • 4+ years of experience in software engineering, network engineering, or systems administration (or a qualifying degree with less experience)
  • 2 years of experience with large-scale distributed services and on-call responsibilities
  • Ability to meet Microsoft's security screening requirements
Good to have:
  • 2+ years of experience with PowerShell, C#, or C++
  • Experience building and influencing towards common goals
  • Ownership of end-to-end project lifecycle
  • Strong communication & project management skills

Job Details

Overview

Microsoft has an exciting opportunity for a Site Reliability Engineer in the Cloud+AI Silver Team. This team will be responsible for deploying and operating a Secure Work Area, including the infrastructure for collaboration within an airgapped environment. 


In this role, you will have the opportunity to work with engineers who enable a broad set of Azure services to be consumed by internal and external customers in highly secured and regulated industries. The systems and software you build will be required to meet the security policy and assurance requirements of both public and private sector customers.  
  
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. 

Qualifications

Required/Minimum Qualifications:

  • 4+ years of technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) of technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field.
  • 2 years of experience working on large-scale distributed services with on-call responsibilities.

Other Requirements:

  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • 2+ years of experience with PowerShell, C#, or C++. 
  • Ability to build and influence broadly towards common goals and priorities. 
  • Ownership of end-to-end project lifecycle with solid project management and communication skills. 

Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $98,300 - $193,200 per year. A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area, where the base pay range for this role is USD $127,200 - $208,800 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Microsoft will accept applications for the role until May 9, 2025.

#Silver

Responsibilities

The scale of our operations is enormous. Microsoft's products and services are overwhelmingly consumed online, and billions of people use them every day. We need people who enjoy analyzing complicated problems, coming up with creative solutions, and working in focused teams to build things no one has thought of before, all in the service of production reliability.

Acts as a Designated Responsible Individual (DRI) working on call to monitor the service for degradation, downtime, or interruptions. Alerts stakeholders to the status and, for simple problems, gains approval to restore the system, product, or service. Responds within the Service Level Agreement (SLA) timeframe. Escalates issues to the appropriate owners.

Contributes to the development of automation within production and deployment of a complex product feature. Runs code in simulated or other non-production environments, with little to no oversight, to confirm functionality and error-free runtime for products.

Contributes to efforts to ensure the correct processes are followed to achieve a high degree of security, privacy, safety, and accessibility. Checks for visible evidence of compliance in product areas. Develops an understanding of the implications of onboarding new technologies in line with Microsoft's compliance expectations.

Stays current by investing time and effort in following developments that will improve the availability, reliability, efficiency, observability, and performance of products, while also driving consistency in monitoring and operations at scale.

Applies well-established best practices to build code reliably. Follows best practices for product development and for scaling to customer requirements, meeting scaling needs and performance expectations.

Maintains communication with key partners across the Microsoft engineering ecosystem. Considers partners across teams and their end goals for products in order to achieve desirable user experiences and meet the dynamic needs of partners and customers through product development.

Maintains operations of the live service as issues arise, on a rotational on-call basis. Implements solutions and mitigations for more complex issues affecting the performance or functionality of the Live Site service, and escalates as necessary. Reviews and writes postmortems for issues and shares insights with the team.

Similar Jobs

Wind River - Product Manager - DevSecOps

Wind River

Bengaluru, Karnataka, India (Hybrid)
1 Month ago
ByteDance - Research Engineer / Scientist - AI for Databases

ByteDance

Seattle, Washington, United States (On-Site)
1 Month ago
Zscaler - Senior Backend Engineer, Data Fabric (Avalor)

Zscaler

Ramat Gan, Tel Aviv District, Israel (Hybrid)
1 Month ago
Falcon X - Senior Cloud Security Engineer

Falcon X

Bengaluru, Karnataka, India (On-Site)
1 Month ago
Mettler-Toledo International, Inc - Software Engineer Test – Senior

Mettler-Toledo International, Inc

Karnataka, India (Hybrid)
7 Months ago
Outscal - Product Operations (Gaming)

Outscal

Delhi, India (On-Site)
5 Months ago
Tencent - Product Operations Intern

Tencent

(On-Site)
2 Months ago
Google - Training Program Manager, Design and Delivery

Google

Bengaluru, Karnataka, India (On-Site)
1 Month ago
Tesla - Shift Supervisor Operations Paint

Tesla

Brandenburg, Germany (On-Site)
3 Months ago
NVIDIA - Senior Technical Program Manager - GPU Clusters

NVIDIA

California, United States (On-Site)
2 Months ago

Similar Skill Jobs

Dayforce - Test Automation Engineer Sr

Dayforce

Bengaluru, Karnataka, India (Remote)
10 Months ago
WebMD - Lead, Data Engineering

WebMD

Newark, New Jersey, United States (On-Site)
7 Months ago
Enverus - Senior Site Reliability Engineer

Enverus

Brno, South Moravian Region, Czechia (Hybrid)
1 Month ago
GoReel - DevOps Lead

GoReel

Romania (Remote)
2 Months ago
Sporty Group - LatAM Site Reliability Engineer

Sporty Group

(On-Site)
1 Year ago
Next Level Business Services - Java Script Developer (Sr UI Developer with very Strong Exp in Java Script )

Next Level Business Services

Dallas, Texas, United States (On-Site)
7 Months ago
PwC - Microsoft Senior M365 Consultant (m/f/d)

PwC

Luxembourg (On-Site)
7 Months ago
Saviynt - Sr. Principal Engineer, Software Engineering

Saviynt

Bengaluru, Karnataka, India (On-Site)
7 Months ago
CrowdStrike - Regional Sales Engineer

CrowdStrike

(Remote)
1 Month ago
Workato - Senior AI/ML Engineer

Workato

Hyderabad, Telangana, India (On-Site)
1 Month ago

Jobs in Redmond, Washington, United States

Nissan - Warehouse Operator - Memphis

Nissan

Memphis, Tennessee, United States (On-Site)
8 Months ago
Google - Data Analytics Sales Specialist III

Google

Washington, District Of Columbia, United States (On-Site)
1 Month ago
Epic Games - Senior Engine Programmer, Framework Architecture

Epic Games

Cary, North Carolina, United States (On-Site)
4 Months ago
Naughty Dog - Senior Gameplay Programmer

Naughty Dog

Santa Monica, California, United States (On-Site)
1 Month ago
ByteDance - Senior/Tech Lead Network Software Development Engineer, Switch - Seattle

ByteDance

Seattle, Washington, United States (On-Site)
6 Months ago
Kavalirio - Product Review Liaison Engineer IV

Kavalirio

San Antonio, Texas, United States (On-Site)
1 Month ago
Interactive Brokers - Security Engineer

Interactive Brokers

Greenwich, Connecticut, United States (Hybrid)
1 Month ago
NVIDIA - Senior Manager, Internal Audit and SOX

NVIDIA

Santa Clara, California, United States (On-Site)
2 Months ago
GameJobs - Senior Data Scientist (Full Stack)

GameJobs

Austin, Texas, United States (On-Site)
1 Year ago
ByteDance - Senior Site Reliability Engineer - Applied Machine Learning

ByteDance

San Jose, California, United States (On-Site)
2 Months ago

Operations Jobs

Rank group - Team Leader - Food & Beverage

Rank group

Dundee, Scotland, United Kingdom (On-Site)
5 Months ago
Next Level Business Services - Techno Functional OTBI

Next Level Business Services

Pleasanton, California, United States (On-Site)
7 Months ago
PlayStation Global - GSOC Manager

PlayStation Global

San Mateo, California, United States (On-Site)
2 Months ago
The Walt Disney Company - Media Delivery Ops Specialist

The Walt Disney Company

Amsterdam, North Holland, Netherlands (On-Site)
1 Month ago
Crunchyroll - Customer Experience Operations Analyst

Crunchyroll

Culver City, California, United States (On-Site)
4 Months ago
Tesla - Service Advisor

Tesla

Bavaria, Germany (On-Site)
3 Months ago
Tesla - Service Advisor

Tesla

North Brabant, Netherlands (On-Site)
3 Months ago
The Walt Disney Company - Costume Assistant - 12month contract (HKD$6000 Special Welcome Reward)

The Walt Disney Company

Hong Kong (On-Site)
7 Months ago
Eleven Labs - AI Safety Operations

Eleven Labs

United Kingdom (Remote)
2 Months ago
Sphere Entertainment Co - Guest Services Call Center Operator (Part-Time)

Sphere Entertainment Co

Las Vegas, Nevada, United States (On-Site)
2 Months ago

About The Company

Vancouver, British Columbia, Canada (On-Site)

Mountain View, California, United States (Hybrid)

Shenzhen, Guangdong Province, China (On-Site)

Noida, Uttar Pradesh, India (On-Site)

Redmond, Washington, United States (On-Site)

Paris, Île-de-France, France (On-Site)
