Senior Site Reliability Engineering Manager

1 Month ago • 6-7 Years • Network Engineering • Research & Development • $117,200 PA - $250,200 PA

Job Summary

Job Description

The Senior Site Reliability Engineering Manager at Azure Storage will lead a team optimizing fleet availability and health for a massive, ever-evolving storage service. Responsibilities include designing, developing, and improving automation and uptime; planning and investigating complex issues; and designing scalable solutions. The role requires strong leadership in Agile/SCRUM, cross-team collaboration, and incident response. Significant impact on cost reduction and high-level visibility are key aspects. Deep understanding of distributed systems, server architecture, and troubleshooting is crucial. The position involves developing, testing, and implementing code changes to improve scalability and investigating hardware/system issues impacting capacity and customers.
Must have:
  • 6+ years technical experience
  • 4+ years Agile/SCRUM experience
  • Lead large cross-team efforts
  • Develop, test, and implement code changes
  • Investigate and solve hardware/system issues
  • Incident response and post-mortem reporting
Good to have:
  • Understanding of server architecture
  • Familiarity with distributed systems
  • Understanding of management techniques

Job Details

Overview

Are you passionate about hardware and enabling new technology? Do you enjoy complex problem solving and investigation? Azure has one of the largest storage services on the planet, holding exabytes of data and files not just for our third-party customers, but also for many of Microsoft’s own services. This role will focus on managing an ever-growing and changing fleet at scale to maximize efficiency while providing a stable environment for our customers.

As a Senior Site Reliability Engineering Manager on the Azure Storage team, you will lead a team of engineers focused on optimizing fleet availability and health, and on designing, developing, and improving automation and uptime. You will take the lead in planning, investigating complex issues, and designing solutions to solve problems at scale.

This opportunity will allow you to deepen your knowledge and experience with massive distributed systems. You will have opportunities to significantly reduce cost to the business, with exposure and visibility at the VP and CVP levels. This position is located in Redmond and has a flexible work environment that supports working from home.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. 

Qualifications

Required Qualifications:

  • 6+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.
  • 4+ years of Agile/SCRUM planning and leading large cross-team efforts.

 

Other Requirements:

  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

 

Preferred Qualifications:

  • 7+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
  • Understanding of server architecture and the ability to debug and troubleshoot issues impacting the fleet.
  • Understanding of server components, firmware, and BIOS, and how they interact.
  • Understanding of management techniques and methods for ensuring scope control.
  • Familiarity with distributed systems. 

 

Site Reliability Engineering M4 - The typical base pay range for this role across the U.S. is USD $117,200 - $229,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $153,600 - $250,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:


Microsoft will accept applications for the role until September 9, 2024.

 

 


Responsibilities

  • Develop, test, and implement changes to optimize code and improve scalability. You leverage end-to-end technical expertise and telemetry analysis to identify patterns and opportunities to implement configuration and automation improvements. You review the effect of changes to documents and share development insights within your team.
  • You drive Sprint planning, SCRUM stand-ups, and code/design reviews, and host regular cross-team and cross-org meetings.
  • Investigate hardware and system issues that are impacting available capacity and customers.
  • Understand the long-term goals of the organization and the steps your team will have to take to achieve them.
  • You respond to incidents during regular on-call rotations and share details related to incidents and their resolution through post-mortem reports and regular review meetings. As a member of the team, you will be expected to help drive bridges for recovery during major outages.
  • Embody our culture and values.

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
