Cloud Operations Engineer – Monitoring Lead

1 Month ago • 8+ Years • DevOps • $120,000 PA - $140,000 PA

Job Summary

Job Description

Extreme is seeking a highly skilled and experienced Cloud Operations Engineer – Monitoring Lead to join their growing Cloud Operations team. This critical role involves designing, implementing, and optimizing a comprehensive monitoring and alerting strategy across cloud infrastructure and applications. The lead will drive proactive issue identification, ensure system health, and contribute to operational excellence and reliability. Responsibilities include leading the design and improvement of monitoring frameworks for cloud infrastructure (AWS, Azure, GCP), applications, and services, defining KPIs, SLIs, and SLOs, evaluating and integrating monitoring tools, and developing automation scripts. The role also requires building dashboards, analyzing data for performance bottlenecks, collaborating with engineering teams, and providing 24/7 support for Cloud services.
Must have:
  • Lead monitoring and alerting strategy
  • Define KPIs, SLIs, SLOs
  • Evaluate and integrate monitoring tools
  • Develop automation scripts
  • Build dashboards and alerts
  • Analyze monitoring data
  • Collaborate with engineering teams
  • Create documentation
  • BS technical degree
  • 8+ years in Cloud Ops/DevOps/SRE
  • Expertise in AWS, Azure, or GCP
  • Technical lead experience
  • Working knowledge of Docker, Kubernetes
  • Experience with Prometheus, Grafana, Datadog, Splunk
  • Problem-solving and analytical skills
Good to have:
  • Computer Science or Engineering background
  • Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Kafka, RabbitMQ
  • Comfortable working in distributed teams

Job Details

There has never been a better time to join Extreme. With several acquisitions extending our portfolio and go-to-market strategy, we have seen enormous opportunity and growth within the region.
Aside from being a Technology Leader in the Gartner Magic Quadrant, we also adamantly promote an internal culture that truly embraces diversity, inclusion, and equality in the workplace. Having Diversity and Inclusion as part of our core values and beliefs, we’re proud to foster an environment where every Extreme employee can thrive because of their differences, not despite them.
 
Cloud Operations Engineer – Monitoring Lead (Thornhill, Toronto - Hybrid)
 
We are seeking a highly skilled and experienced Cloud Operations Engineer – Monitoring Lead to join our growing Cloud Operations team. In this critical role, you will be responsible for designing, implementing, and optimizing our comprehensive monitoring and alerting strategy across our cloud infrastructure and applications. You will drive proactive identification of issues, ensure system health, and contribute significantly to our operational excellence and reliability goals. We're looking for the best and the brightest 'A' players who want to make a difference doing a job they love.

Responsibilities

    • Lead the design, implementation, and continuous improvement of our end-to-end monitoring and alerting framework for cloud infrastructure (AWS, Azure, GCP), applications, and services.
    • Define key performance indicators (KPIs), service level indicators (SLIs), and service level objectives (SLOs) for critical systems.
    • Evaluate, select, and integrate monitoring tools (e.g., Prometheus, Grafana, Datadog, Splunk, CloudWatch, Azure Monitor, GCP Operations Suite) to meet evolving needs.
    • Develop and implement automation scripts and tools (e.g., Python, Bash, PowerShell) to streamline monitoring deployment, configuration, and incident remediation.
    • Build and maintain dashboards, alerts, and reports that provide actionable insights into system performance, health, and availability.
    • Analyze monitoring data to identify performance bottlenecks, resource inefficiencies, and potential cost optimization opportunities.
    • Collaborate with engineering teams to implement performance improvements and cost-saving measures.
    • Create and maintain comprehensive documentation for monitoring systems, procedures, and best practices.
    • Proactively identify areas for improvement in our cloud operations and monitoring capabilities.
    • Provide 24/7 support for cloud services.
    • Participate in cloud security and compliance implementation.

Ideal Qualifications:

    • BS level technical degree required; Computer Science or Engineering background preferred.
    • 8+ years of progressive experience in Cloud Operations, DevOps, or Site Reliability Engineering roles, with a strong focus on monitoring.
    • Deep expertise with at least one major public cloud platform (AWS, Azure, or Google Cloud Platform).
    • Proven experience as a technical lead or senior contributor in a monitoring-focused role.
    • Working knowledge of container-based architecture and deployment (Docker, Kubernetes).
    • Extensive experience with various monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, ELK Stack, vendor-specific monitoring solutions).
    • Excellent problem-solving, analytical, and troubleshooting skills.
    • Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Kafka and RabbitMQ.
    • Comfortable working within a distributed team located in multiple time zones.
