Site Reliability Engineer

1 Hour ago • All levels • DevOps • Undisclosed

About the job

Job Description

The Site Reliability Engineer (SRE) at Microsoft's Azure Core team will maintain the world's computer, ensuring new servers come online efficiently at hyperscale. Responsibilities involve collaborating with various teams (developers, hardware engineers, datacenter technicians, etc.) to debug and resolve issues, drive continuous improvements, and prevent future problems. This role requires analyzing data to identify problem areas, automating mitigations, and participating in design reviews and problem management. The ideal candidate will have a foundational understanding of distributed systems and experience with programming languages (C, C++, C#, Java). The role involves working with large-scale server and network device management, investigation, and root cause analysis across multiple systems.
Must have:
  • Technical experience in software engineering, network engineering, or systems administration.
  • Distributed systems experience
  • Programming skills (C, C++, C#, Java)
  • Root cause analysis and problem resolution
  • Collaboration with multiple teams
Perks:
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Overview

Come build and maintain the world’s computer as a member of the Microsoft Capacity Infrastructure Services team in Azure Core. The team ensures new servers are brought online (capacity buildout) to enable Azure customers to leverage the latest offerings, see the illusion of infinite capacity, and grow the Azure business efficiently at hyperscale.

As a Site Reliability Engineer, you’ll work with a breadth of partners across Microsoft including developers in service teams, hardware engineers, datacenter technicians, supply chain managers, and business leaders to rapidly debug and resolve issues delaying this carefully orchestrated buildout sequence. You’ll drive continuous improvements with these teams to prevent repeats and address common classes of issues across the Azure software stack through design reviews and problem management.

This opportunity will enable you to gain unparalleled system-wide knowledge of how the Azure cloud is built and maintained. The contacts you make with experts will enable you to deep dive on services and new technologies and partner for improvements. You’ll be stretched to automate mitigations tactically and to analyze data strategically to identify problem areas and drive prioritization. This role requires flexibility to hold virtual meetings and collaborate with partners worldwide. It supports working from home up to 100% of the time.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications

Required Qualifications: 

  • Technical experience in software engineering, network engineering, or systems administration.
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field.
  • You must be legally authorised to work in Romania to be eligible for this role (legally authorised = has citizenship or has been granted a valid visa or work permit).

Relocation expenses are not provided as part of this role.

Other Requirements:

  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Additional / Preferred Qualifications: 

  • Distributed systems - developing, debugging, monitoring, and deploying.
  • Programming - C, C++, C#, Java.
  • Systems - hardware and software interface, host and networking, large-scale server and network device management, investigation and root cause analysis across multiple systems/services/teams.

 

#Azurecorejobs

Responsibilities

  • Develops a foundational understanding of distributed systems design, interactions between cloud technology layers and components, basic dependencies at scale, and the code that defines infrastructures. Can contribute to the code base that defines components or features of systems or cloud technologies to improve the reliability and operability of supported products, with direction from other engineers.
  • Supports ongoing engagements with product engineering teams by participating in code/design reviews, regular meetings, on-call rotations, and incident responses throughout product development and operations cycles; draws insights from engagements with product engineering teams and basic analyses of telemetry data to propose potential improvements to code and designs for a defined set of product components or features with guidance from other engineers.
  • Implements simple configuration and data changes across a predefined range of product components or features with guidance from other engineers to develop an understanding of how configurations, binaries, and data can be managed using code, tooling, and automation.
  • Develops an understanding of how to safely and reliably manage changes in production by using existing tools and automation to enable product engineering teams to implement changes across a defined range of components or features, with direction from other engineers.
  • Uses existing tools to troubleshoot problems or flaws affecting the availability, reliability, performance, and/or efficiency of components or features with guidance from other engineers. Suggests potential solutions to resolve and prevent recurring issues and brings them to the attention of other engineers or team leads.
  • Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting basic issues, and deploying appropriate fixes to resolve root cause(s); alerts product teams or owners to major customer-impacting issues and escalates the resolution of complex issues and/or those affecting multiple components or features to other engineers as needed. Shares details related to incidents and their resolution through post-mortem reports and during regular review meetings.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
