Overview
Event Management:
Set up and manage monitoring tools to track MS Power BI and downstream Application performance and
health.
Monitor, maintain, and optimize Power BI dashboards to ensure they are functioning correctly and efficiently.
Generate reports and provide insights on performance, incidents, and improvements.
Collaborate with cross-functional teams to implement preventive measures and address emerging concerns
promptly.
Incident Management:
Respond to and manage incidents related to Power BI and associated downstream systems. Investigate and
track incidents to resolution in a timely manner and within predefined SLAs.
Perform Root Cause Analysis (RCA) to identify the underlying causes of issues. Implement long-term solutions to
prevent recurrence of incidents.
Support and maintain Azure Data Factory pipelines, ensuring data ingestion and transformation processes run
smoothly.
Monitor and troubleshoot Databricks environments, optimizing performance and resolving any issues that
arise.
Manage and maintain UiPath automation workflows, ensuring they operate reliably and efficiently.
Execute and document post-incident summaries, root cause analysis and mitigation protocols to lessen the
likelihood of repeat incidents.
Collaboration and Communication:
Communicate incidents to relevant stakeholders, relaying information on business impact,
risks, prioritization, mitigation, and estimated time to resolution.
Participate in on-call rotations to provide timely responses to production incidents and contribute to swift
issue resolution.
Change & Release Management:
Execute the Service Introduction & Service Acceptance process to validate and test the business application (MS
Power BI & digital products) prior to production deployment or redeployment.
Deploy products and enhancements with minimal disruption to production systems.
Knowledge Management:
Document processes, procedures, standards, and SLAs of S&T BI & Reporting Services in the Service
Knowledge Management System (SKMS).
Continual Service Improvement:
Continually seek opportunities for improvement, automating repetitive tasks and reducing manual intervention.
Responsibilities
Technical Skills:
Candidate must have experience with monitoring and logging tools such as Azure Monitor, Prometheus,
Grafana, or similar.
Strong understanding of cloud platforms, particularly Microsoft Azure.
Proficiency in scripting languages such as Python, PowerShell, or Bash.
Soft Skills:
Ability to work in a fast-paced, agile environment with large cross-functional teams.
Ability to manage multiple priorities at the same time.
Strong problem-solving skills and the ability to work under pressure.
Excellent interpersonal and communication skills, both written and verbal.
Attention to detail and a proactive approach to identifying and resolving issues.
Qualifications
Qualification:
Degree in Computer Science, Computer Engineering, or a related field preferred.
Experience:
6-9 years of experience, with a minimum of 3 years in Site Reliability Engineering (SRE) or IT Application
Support roles.
Candidates must have a strong background in supporting and managing the MS Power BI application and hands-on
experience with at least one of the specified technologies (Azure Data Factory, Databricks, UiPath).
Candidate must have proven experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
Experience in developing and implementing automation scripts and tools to improve application reliability
and operational efficiency.
Candidate must be willing to be an integral part of the Production Support team and to work UK and US shift
hours and weekend shifts on rotation.
Candidate must demonstrate a willingness to learn and adapt to new technologies as needed.
Preferred Qualifications:
Certifications in Azure, Power BI, or related technologies.
Experience with CI/CD pipelines and infrastructure as code (IaC) tools.
Familiarity with ITIL practices and principles.