Site Reliability Engineer

3 Days ago • 1-6 Years • Operations • DevOps • $98,300 PA - $208,800 PA

Job Summary

Job Description

Microsoft's Cloud+AI Silver Team seeks a Site Reliability Engineer to deploy and operate a Secure Work Area in an airgapped environment. This role involves working with engineers enabling Azure services for internal/external customers in highly secured industries, meeting stringent security requirements. Responsibilities include on-call monitoring, automation development, ensuring security and compliance, and collaborating with cross-functional teams. The ideal candidate will possess strong problem-solving skills, experience with large-scale distributed systems, and a commitment to production reliability.
Must have:
  • 4+ years experience in software/network engineering or systems administration
  • 2 years experience with large-scale distributed services and on-call responsibilities
  • Ability to meet Microsoft's security screening requirements
Good to have:
  • 2+ years experience with PowerShell, C#, or C++
  • Experience building and influencing towards common goals
  • Ownership of end-to-end project lifecycle
  • Strong communication & project management skills

Job Details

Overview

Microsoft has an exciting opportunity for a Site Reliability Engineer in the Cloud+AI Silver Team. This team will be responsible for deploying and operating a Secure Work Area, including the infrastructure for collaboration within an airgapped environment. 


In this role, you will have the opportunity to work with engineers who enable a broad set of Azure services to be consumed by internal and external customers in highly secured and regulated industries. The systems and software you build will be required to meet the security policy and assurance requirements of both public and private sector customers.  
  
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. 

Qualifications

Required/Minimum Qualifications:

  • 4+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field.
  • 2 years of experience working on large-scale distributed services with on-call responsibilities.

Other Requirements:

  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position requires passing the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • 2+ years of experience with PowerShell, C#, or C++. 
  • Ability to build and influence broadly towards common goals and priorities. 
  • Ownership of end-to-end project lifecycle with solid project management and communication skills. 

Site Reliability Engineering IC3: The typical base pay range for this role across the U.S. is USD $98,300 - $193,200 per year. A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area, where the base pay range for this role is USD $127,200 - $208,800 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Microsoft will accept applications for the role until May 9, 2025.

 


Responsibilities

The scale of our operations is enormous. Microsoft's products and services are overwhelmingly consumed online, and billions of people use them every day. We need people who enjoy analyzing complicated problems, coming up with creative solutions, and working in focused teams to build things no one has thought of before, all in the service of production reliability.

Acts as a Designated Responsible Individual (DRI) working on call to monitor service for degradation, downtime, or interruptions. Alerts stakeholders as to the status and gains approval to restore system/product/service for simple problems. Responds within Service Level Agreement (SLA) timeframe. Escalates issues to appropriate owners.

Contributes to the development of automation within production and deployment of a complex product feature. Runs code in simulated or other non-production environments, with little to no oversight, to confirm functionality and error-free runtime for products.

Contributes to efforts to ensure the correct processes are followed to achieve a high degree of security, privacy, safety, and accessibility. Checks for visible evidence to demonstrate compliance for product areas. Develops and holds an understanding of the implications of onboarding new technologies following expectations of compliance at Microsoft.

Remains current in skills by investing time and effort into staying abreast of current developments that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale.

Applies best practices to reliably build code based on well-established methods. Follows best practices for product development, scaling to customer requirements, and meeting performance expectations.

Maintains communication with key partners across the Microsoft ecosystem of engineers. Considers partners across teams and their end goals for products to drive desirable user experiences and fit the dynamic needs of partners/customers through product development.

Maintains operations of live service as issues arise on a rotational, on-call basis. Implements solutions and mitigations to more complex issues impacting performance or functionality of Live Site service and escalates as necessary. Reviews and writes issue postmortems and shares insights with the team.

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
