Senior Site Reliability Engineering Manager

1 Hour ago • 6-7 Years • Network Engineering • Research & Development • $117,200 PA - $250,200 PA

Job Summary

Job Description

The Senior Site Reliability Engineering Manager at Azure Storage will lead a team optimizing fleet availability and health for a massive, ever-evolving storage service. Responsibilities include designing, developing, and improving automation and uptime; planning and investigating complex issues; and designing scalable solutions. The role requires strong leadership in Agile/SCRUM, cross-team collaboration, and incident response. Significant impact on cost reduction and high-level visibility are key aspects. Deep understanding of distributed systems, server architecture, and troubleshooting is crucial. The position involves developing, testing, and implementing code changes to improve scalability and investigating hardware/system issues impacting capacity and customers.
Must have:
  • 6+ years technical experience
  • 4+ years Agile/SCRUM experience
  • Lead large cross-team efforts
  • Develop, test, and implement code changes
  • Investigate and solve hardware/system issues
  • Incident response and post-mortem reporting
Good to have:
  • Understanding of server architecture
  • Familiarity with distributed systems
  • Understanding of management techniques

Job Details

Overview

Are you passionate about hardware and enabling new technology? Do you enjoy complex problem solving and investigation? Azure has one of the largest storage services on the planet, holding exabytes of data and files not just for our third-party customers, but also for many of Microsoft’s own services. This role will focus on managing an ever-growing and changing fleet at scale to maximize efficiency while providing a stable environment for our customers.

As a Senior Site Reliability Engineering Manager on the Azure Storage team, you will lead a team of engineers focused on optimizing fleet availability and health, designing, developing, and improving automation and uptime. You will take the lead in planning, investigating complex issues, and designing solutions to solve problems at scale.

This opportunity will allow you to deepen your knowledge and experience with massive distributed systems. You will have opportunities to make a significant impact on reducing cost to the business, with exposure and visibility at VP and CVP levels. This position is located in Redmond and has a flexible work environment that supports working from home.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. 

Qualifications

Required Qualifications:

  • 6+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.
  • 4+ years of Agile/SCRUM planning and leading large cross-team efforts.

 

Other Requirements:

  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

 

Preferred Qualifications:

  • 7+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration.
  • Understanding of server architecture and the ability to debug and troubleshoot issues impacting the fleet.
  • Understanding of server components, firmware, and BIOS, and how they interact.
  • Understanding of management techniques and methods for ensuring scope control.
  • Familiarity with distributed systems. 

 

Site Reliability Engineering M4 - The typical base pay range for this role across the U.S. is USD $117,200 - $229,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $153,600 - $250,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:


Microsoft will accept applications for the role until September 9, 2024.

 

 


Responsibilities

  • Develop, test, and implement changes to optimize code and improve scalability. You leverage end-to-end technical expertise and telemetry analysis to identify patterns and opportunities to implement configuration and automation improvements. You review the effect of changes to documents and share development insights within your team.
  • You drive Sprint planning, SCRUM stand-ups, and code/design reviews, and host regular cross-team/org meetings.
  • Investigate hardware and system issues that impact available capacity and customers.
  • Understand the long-term goals of the organization and the steps your team will have to take to achieve them.
  • You respond to incidents during regular on-call rotations and share details related to incidents and their resolution through post-mortem reports and regular review meetings. As a member of the team, you will be expected to help drive bridges for recovery during major outages.
  • Embody our  and   

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
