Senior Site Reliability Engineering Manager

6-7 Years • Network Engineering

About the job

Job Description

The Senior Site Reliability Engineering Manager at Azure Storage will lead a team optimizing fleet availability and health for one of the world's largest storage services. Responsibilities include designing, developing, and improving automation and uptime; investigating complex issues at scale; and planning solutions to maximize efficiency. This role requires strong leadership in Agile/SCRUM, incident response, and cross-team collaboration. The role offers significant impact on cost reduction and visibility at the VP and CVP levels. The position involves developing, testing, and implementing code changes for scalability, troubleshooting hardware and system issues, and understanding long-term organizational goals. The role includes on-call rotations and post-mortem reporting.
Must have:
  • 6+ years of experience in a relevant field
  • 4+ years in Agile/SCRUM leadership
  • Expertise in distributed systems
  • Problem-solving and investigation skills
  • Develop, test, and implement code changes
  • Incident response and post-mortem reporting
Good to have:
  • Understanding of server architecture
  • Familiarity with server components, firmware, BIOS
  • Understanding of management techniques and scope control
Perks:
  • Industry-leading healthcare
  • Educational resources
  • Product and service discounts
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Overview

Are you passionate about hardware and enabling new technology? Do you enjoy complex problem solving and investigation? Azure has one of the largest storage services on the planet, holding exabytes of data and files not just for our third-party customers, but also for many of Microsoft's own services. This role will focus on managing an ever-growing and changing fleet at scale to maximize efficiency while providing a stable environment for our customers.

As a Senior Site Reliability Engineering Manager on the Azure Storage team, you will work with a team of engineers focused on optimizing fleet availability and health. You will lead engineers in designing, developing, and improving automation and uptime, and you will take the lead in planning, investigating complex issues, and designing solutions to solve problems at scale.

This opportunity will allow you to deepen your knowledge and experience with massive distributed systems. It also offers the opportunity to have a significant impact on reducing cost to the business, along with exposure and visibility at the VP and CVP levels. This position is located in Redmond and has a flexible work environment that supports working from home.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. 

Qualifications

Required Qualifications:

  • 6+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.
  • 4+ years of Agile/SCRUM planning and leading large cross-team efforts.


Other Requirements:

  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.


Preferred Qualifications:

  • 7+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration.
  • Understanding of server architecture and the ability to debug and troubleshoot issues impacting the fleet.
  • Understanding of server components, firmware, BIOS, and how they interact.
  • Understanding of management techniques and methods for ensuring scope control.
  • Familiarity with distributed systems. 


Site Reliability Engineering M4 - The typical base pay range for this role across the U.S. is USD $117,200 - $229,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $153,600 - $250,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:


Microsoft will accept applications for the role until September 9, 2024.


#azurecorejobs

Responsibilities

  • Develop, test, and implement changes to optimize code and improve scalability. You leverage end-to-end technical expertise and telemetry analysis to identify patterns and opportunities to implement configuration and automation improvements. You review the effect of changes to documents and share development insights within your team.
  • You drive Sprint planning, SCRUM stand-ups, and code/design reviews, and you host regular cross-team/org meetings.
  • Investigate hardware and system issues that are impacting available capacity and impacting customers.
  • Understand the long-term goals of the organization and identify the steps your team will need to take to achieve them.
  • You respond to incidents during regular on-call rotations and share details related to incidents and their resolution through post-mortem reports and regular review meetings. As a member of the team, you will be expected to help drive bridges for recovery during major outages.
  • Embody our culture and values.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
Industry leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect
$117.2K - $250.2K/yr (Outscal est.)
$183.7K/yr avg.
Redmond, Washington, United States


About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.

Washington, United States (Hybrid)

Redmond, Washington, United States (Hybrid)

London, England, United Kingdom (On-Site)

Mountain View, California, United States (On-Site)

São Paulo, State Of São Paulo, Brazil (On-Site)

Redmond, Washington, United States (On-Site)

Bengaluru, Karnataka, India (On-Site)

Redmond, Washington, United States (On-Site)

Hyderabad, Telangana, India (On-Site)

San José, San José Province, Costa Rica (Remote)


Similar Jobs


Similar Skill Jobs

Lulalend - Senior Mobile Engineer

Lulalend, South Africa (Remote)

RadiusAI - Lead Software Engineer - Devops

RadiusAI, India (On-Site)

Intel Corporation - Data Architect

Intel Corporation, United States (Hybrid)

Zones - Cloud Technical Specialist

Zones, Pakistan (On-Site)

PwC - Senior Back-End Developer (C#)

PwC, Czechia (On-Site)

Bluevine - Senior Product Manager

Bluevine, India (Hybrid)

PeopleFun - Senior Server Engineer II, Wordscapes

PeopleFun, United States (Remote)


Jobs in Redmond, Washington, United States

Rockstar Games - Senior DevOps Engineer

Rockstar Games, United States (On-Site)

Fanatics - Visual Designer, Global Brand Team

Fanatics, United States (Hybrid)

Mattel Inc. - Consumer Services Coordinator

Mattel Inc., United States (On-Site)

Mashgin - Senior Software Engineer, Infrastructure

Mashgin, United States (Hybrid)

Framestore - Freelance: Animator - New York

Framestore, United States (Hybrid)

The Walt Disney Company - Sr Software Engineer, iOS

The Walt Disney Company, United States (On-Site)

Netflix - Sr. Technical Game Designer, Games Studio

Netflix, United States (On-Site)

Sony Pictures Animation - Production Manager - Features

Sony Pictures Animation, United States (On-Site)


Network Engineering Jobs

Meta - Software Engineer - Datacenter networking

Meta, United States (On-Site)

Extreme Network - Systems Engineer-Scandinavia

Extreme Network, Sweden (Remote)

Google - Network Engineer, Public Sector

Google, United States (On-Site)

ByteDance - Experienced Software Engineer - Traffic Platform

ByteDance, United States (On-Site)

Bally's Interactive - Senior Network Engineer

Bally's Interactive, United States (On-Site)

PlayStation Global - Network Operations Engineer

PlayStation Global, Australia (On-Site)

Meta - Network Production Engineer

Meta, United States (On-Site)
