Associate Manager - SRE

10 Months ago • 6-9 Years

DevOps

Job Description

The Associate Manager - SRE is responsible for managing and monitoring MS Power BI and associated systems. Responsibilities include event management (setting up monitoring tools, optimizing dashboards, generating reports), incident management (responding to incidents, performing RCA, implementing solutions), collaboration (communicating with stakeholders, participating in on-call rotations), change & release management (executing service introduction, deploying products), knowledge management (documenting processes), and continual service improvement. The role requires experience with Azure, Power BI, Azure Data Factory, Databricks, UiPath, scripting languages (Python, PowerShell, Bash), and monitoring tools (Azure Monitor, Prometheus, Grafana).

Good To Have:

Azure certifications
CI/CD experience
ITIL familiarity

Must Have:

Experience with Power BI & Azure
Incident & Event Management
Scripting (Python, PowerShell)
Monitoring tools (Azure Monitor)
RCA & problem-solving
Collaboration & Communication

About the job

Overview

Event Management:

• Set up and manage monitoring tools to track the performance and health of MS Power BI and downstream applications.
• Monitor, maintain, and optimize Power BI dashboards to ensure they function correctly and efficiently.
• Generate reports and provide insights on performance, incidents, and improvements.
• Collaborate with cross-functional teams to implement preventive measures and address emerging concerns promptly.

Incident Management:

• Respond to and manage incidents related to Power BI and associated downstream systems; investigate and track incidents to resolution in a timely manner and within predefined SLAs.
• Perform Root Cause Analysis (RCA) to identify the underlying causes of issues, and implement long-term solutions to prevent recurrence of incidents.
• Support and maintain Azure Data Factory pipelines, ensuring data ingestion and transformation processes run smoothly.
• Monitor and troubleshoot Databricks environments, optimizing performance and resolving any issues that arise.
• Manage and maintain UiPath automation workflows, ensuring they operate reliably and efficiently.
• Prepare and document post-incident summaries, root cause analyses, and mitigation protocols to reduce the likelihood of repeat incidents.

Collaboration and Communication:

• Communicate incidents to relevant stakeholders, relaying information on business impact, risks, prioritization, mitigation, and estimated time to resolution.
• Participate in on-call rotations to provide timely responses to production incidents and contribute to swift issue resolution.

Change & Release Management:

• Execute the Service Introduction and Service Acceptance processes to validate and test the business applications (MS Power BI and digital products) prior to production deployment or redeployment.
• Deploy products and enhancements with minimal disruption to production systems.

Knowledge Management:

• Document the processes, procedures, standards, and SLAs of S&T BI & Reporting Services in the Service Knowledge Management System (SKMS).

Continual Service Improvement:

• Continually seek opportunities for improvement, automate repetitive tasks, and reduce manual intervention.

Responsibilities

Technical Skills:

• Candidate must have experience with monitoring and logging tools such as Azure Monitor, Prometheus, Grafana, or similar.
• Strong understanding of cloud platforms, particularly Microsoft Azure.
• Proficiency in scripting languages such as Python, PowerShell, or Bash.

Soft Skills:

• Ability to work in a fast-paced, agile environment with large cross-functional teams.
• Ability to manage multiple priorities at the same time.
• Strong problem-solving skills and the ability to work under pressure.
• Excellent interpersonal and communication skills, both written and verbal.
• Attention to detail and a proactive approach to identifying and resolving issues.

Qualifications

Qualification:

• Degree in Computer Science, Computer Engineering, or a related field preferred.

Experience:

• 6-9 years of experience, with at least 3 years in Site Reliability Engineering (SRE) or IT Application Support roles.
• Candidate must have a strong background in supporting and managing the MS Power BI application and hands-on experience with at least one of the specified technologies (Azure Data Factory, Databricks, UiPath).
• Candidate must have proven experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
• Experience in developing and implementing automation scripts and tools to improve application reliability and operational efficiency.
• Candidate must be willing to be an integral part of the Production Support team and to work UK and US shift hours and weekend shifts on rotation.
• Candidate must demonstrate a willingness to learn and adapt to new technologies as needed.

Preferred Qualifications:

• Certifications in Azure, Power BI, or related technologies.
• Experience with CI/CD pipelines and infrastructure as code (IaC) tools.
• Familiarity with ITIL practices and principles.