Senior Cloud Site Reliability Engineer

6+ Years • DevOps

Job Details

At NiCE, we don’t limit our challenges. We challenge our limits. Always. We’re ambitious. We’re game changers. And we play to win. We set the highest standards and execute beyond them. And if you’re like us, we can offer you the ultimate career opportunity that will light a fire within you.

So, what’s the role all about?

NICE Public Safety has expanded significantly and there is a need to automate, support and maintain our applications 24/7. As a result, we are expanding our Site Reliability team to ensure we continue to offer exemplary service to our customers.

Our Site Reliability team is responsible for reducing the number of issues and speeding up the time to detection/resolution of issues using automation, tooling, telemetry, and data.

This job description is not intended to be all-inclusive, and you will also perform other reasonable related business duties as assigned by your immediate supervisor and other management as required. We may revise or change job duties as the need arises. This job description does not constitute a written or implied contract of employment.

How will you make an impact?

  • Work as part of a team of SREs who act as the ‘gatekeepers’ of production, actively managing the work backlog and developing reliability improvements.
  • Lead root-cause investigations into outages, performance problems, and cost issues.
  • Lead initiatives to automate low-value tasks, balanced against project delivery demands.
  • Provide technical leadership to the wider Cloud Operations and Support teams, along with oversight of the products and services they support.
  • Collaborate with DevOps and engineering teams to establish and enforce SLOs, SLAs, and error budgets.
  • Develop and configure monitoring dashboards and alerts in tools such as Grafana and Azure Monitor.
  • Install and configure the observability platform, including tools such as Grafana, Prometheus, Azure Monitor, and OpenTelemetry.
  • Develop and deploy Bicep modules for monitoring infrastructure.
  • Optimize system performance, cost, and security through regular reviews and tuning.

Have you got what it takes?

  • Must have 6+ years of experience in Site Reliability Engineering
  • Excellent technical, analytical and troubleshooting skills
  • Experience with, and in-depth knowledge of, databases and data handling (MS SQL, Elasticsearch, YAML, JSON, XML)
  • Significant experience in programming or advanced scripting (e.g., Python, PowerShell, C#)
  • Experience with infrastructure/configuration as code and version control (ARM, Bicep, Git)
  • Strong experience managing monitoring, alerting, and dashboarding platforms (Azure Monitor, Prometheus, Grafana, Elasticsearch)
  • Demonstrable experience of supporting live cloud services and platforms
  • Expert in developing queries for dashboards and alerting for microservices.
  • Expertise in developing custom metrics for microservices
  • Experience collaborating with DevOps and engineering teams to establish and enforce SLOs, SLAs, and error budgets
  • Production experience with Kubernetes and containerization
  • Exposure to commercial cloud providers (ideally Azure; others considered)
  • Exposure to Azure DevOps (CI/CD) pipelines is desirable
  • Exposure to test frameworks is desirable (NUnit, Jasmine, Selenium)
  • Strong experience in infrastructure-as-code design and implementation strategies
  • Efficient, effective, and respectful communication skills, both with customers and within internal departments, including:
  • Good listener, able to identify and validate assumptions.
  • Able to use effective questioning to confirm understanding of a customer problem and then provide help to solve it.
  • Methodical troubleshooting, technical skill and attention to detail used in diagnosing problems and reproducing issues in a local environment.
  • Multitasking and time-management skills to prioritise and switch between varied tasks.

You will have an advantage if you also have:

  • Flexibility with working hours when needed to address critical or urgent matters.
  • Availability to provide on-call support from time to time as needed.

What’s in it for you?

  • Join an ever-growing, market-disrupting, global company where the teams, composed of the best of the best, work in a fast-paced, collaborative, and creative environment! As the market leader, every day at NiCE is a chance to learn and grow, and there are endless internal career opportunities across multiple roles, disciplines, domains, and locations. If you are passionate, innovative, and excited to constantly raise the bar, you may just be our next NiCEr!

Enjoy NiCE-FLEX!

At NiCE, we work according to the NiCE-FLEX hybrid model, which enables maximum flexibility: 2 days working from the office and 3 days of remote work, each week. Naturally, office days focus on face-to-face meetings, where teamwork and collaborative thinking generate innovation, new ideas, and a vibrant, interactive atmosphere.

Requisition ID: 8231

Reporting into: Technical Manager / Director of Engineering

Role Type: Individual Contributor

Location: Pune, Maharashtra, India (Hybrid)
