Cloud Operations Engineer – Monitoring Lead

2 Months ago • 8 Years + • Devops • $120,000 PA - $140,000 PA

Job Summary

Job Description

Extreme is seeking a highly skilled and experienced Cloud Operations Engineer – Monitoring Lead to join their growing Cloud Operations team. This critical role involves designing, implementing, and optimizing a comprehensive monitoring and alerting strategy across cloud infrastructure and applications. The lead will drive proactive issue identification, ensure system health, and contribute to operational excellence and reliability. Responsibilities include leading the design and improvement of monitoring frameworks for cloud infrastructure (AWS, Azure, GCP), applications, and services, defining KPIs, SLIs, and SLOs, evaluating and integrating monitoring tools, and developing automation scripts. The role also requires building dashboards, analyzing data for performance bottlenecks, collaborating with engineering teams, and providing 24/7 support for Cloud services.
Must have:
  • Lead monitoring and alerting strategy
  • Define KPIs, SLIs, SLOs
  • Evaluate and integrate monitoring tools
  • Develop automation scripts
  • Build dashboards and alerts
  • Analyze monitoring data
  • Collaborate with engineering teams
  • Create documentation
  • BS technical degree
  • 8+ years in Cloud Ops/DevOps/SRE
  • Expertise in AWS, Azure, or GCP
  • Technical lead experience
  • Working knowledge of Docker, Kubernetes
  • Experience with Prometheus, Grafana, Datadog, Splunk
  • Problem-solving and analytical skills
Good to have:
  • Computer Science or Engineering background
  • Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Kafka, RabbitMQ
  • Comfortable working in distributed teams

Job Details

There has never been a better time to join Extreme. With several acquisitions extending our portfolio and go-to-market strategy, we have seen enormous opportunity and growth within the region.
Aside from being a Technology Leader in the Gartner Magic Quadrant, we also adamantly promote an internal culture that truly embraces diversity, inclusion, and equality in the workplace. Having Diversity and Inclusion as part of our core values and beliefs, we’re proud to foster an environment where every Extreme employee can thrive because of their differences, not despite them.
 
Cloud Operations Engineer – Monitoring Lead (Thornhill, Toronto - Hybrid)
 
We are seeking a highly skilled and experienced Cloud Operations Engineer – Monitoring Lead to join our growing Cloud Operations team. In this critical role, you will be responsible for designing, implementing, and optimizing our comprehensive monitoring and alerting strategy across our cloud infrastructure and applications. You will drive proactive identification of issues, ensure system health, and contribute significantly to our operational excellence and reliability goals. We're looking for the best and the brightest 'A' players who want to make a difference doing a job they love.

Responsibilities

    • Lead the design, implementation, and continuous improvement of our end-to-end monitoring and alerting framework for cloud infrastructure (AWS, Azure, GCP), applications, and services.
    • Define key performance indicators (KPIs), service level indicators (SLIs), and service level objectives (SLOs) for critical systems.
    • Evaluate, select, and integrate monitoring tools (e.g., Prometheus, Grafana, Datadog, Splunk, CloudWatch, Azure Monitor, GCP Operations Suite) to meet evolving needs.
    • Develop and implement automation scripts and tools (e.g., Python, Bash, PowerShell) to streamline monitoring deployment, configuration, and incident remediation.
    • Build and maintain dashboards, alerts, and reports that provide actionable insights into system performance, health, and availability.
    • Analyze monitoring data to identify performance bottlenecks, resource inefficiencies, and potential cost optimization opportunities.
    • Collaborate with engineering teams to implement performance improvements and cost-saving measures.
    • Create and maintain comprehensive documentation for monitoring systems, procedures, and best practices.
    • Proactively identify areas for improvement in our cloud operations and monitoring capabilities.
    • Provide 24/7 support for cloud services.
    • Participate in cloud security and compliance implementation.

Ideal Qualifications:

    • BS level technical degree required; Computer Science or Engineering background preferred.
    • 8+ years of progressive experience in Cloud Operations, DevOps, or Site Reliability Engineering roles, with a strong focus on monitoring.
    • Deep expertise with at least one major public cloud platform (AWS, Azure, or Google Cloud Platform).
    • Proven experience as a technical lead or senior contributor in a monitoring-focused role.
    • Working knowledge of container-based architecture and deployment (Docker, Kubernetes).
    • Extensive experience with various monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, ELK Stack, vendor-specific monitoring solutions).
    • Excellent problem-solving, analytical, and troubleshooting skills.
    • Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Kafka and RabbitMQ.
    • Comfortable working within a distributed team located in multiple time zones.




