Cloud Operations Lead Monitoring & AI Ops Engineer

2 Months ago • 7+ Years • DevOps

Job Summary

Job Description

The Cloud Operations Lead Monitoring & AI Ops Engineer will lead the strategy, implementation, and management of global network monitoring tools and AI Ops solutions. This role involves ensuring the reliability, performance, and security of cloud infrastructure through proactive monitoring, automation, and advanced analytics. The engineer will collaborate with engineering, operations, and security teams to enhance observability and incident response capabilities. This position is part of a fast-growing Global Tech Ops team, playing a key role in scaling and optimizing cloud operations. Key responsibilities include leading monitoring initiatives; developing strategies for anomaly detection; overseeing the deployment and management of monitoring tools such as Prometheus and Grafana; ensuring observability; collaborating with engineering teams; defining best practices for cloud environments; using AI Ops tools; analyzing monitoring data; guiding incident response; maintaining documentation; and staying current with industry trends.
Must have:
  • 7+ years of cloud operations experience with global monitoring and AI Ops tools
  • Expertise in cloud platforms (AWS, Azure, OCI) and their monitoring services
  • Strong knowledge of monitoring platforms, including Prometheus and Grafana
  • Experience designing AI-driven monitoring solutions for large-scale environments
  • Proficiency in automation and scripting (e.g., Python, Go, Bash)
  • Excellent leadership, collaboration, and communication skills
Good to have:
  • Relevant certifications (e.g., AWS Certified Solutions Architect)
  • Experience with ITIL processes and incident and problem management
  • Knowledge of cloud security monitoring and threat detection methodologies
  • Experience in designing and modernizing monitoring tools
Perks:
  • Retirement Plans
  • Medical, Dental and Vision Coverage
  • Paid Time Off
  • Paid Parental Leave
  • Support for Community Involvement

Job Details

Job Title:

Cloud Operations Lead Monitoring & AI Ops Engineer

About Skyhigh Security:

Skyhigh Security is a dynamic, fast-paced cloud company that is a leader in the security industry. Our mission is to protect the world’s data, and because of this, we live and breathe security. We value learning at our core, underpinned by openness and transparency.

Since 2011, organizations have trusted us to provide them with a complete, market-leading security platform built on a modern cloud stack. Our industry-leading suite of products radically simplifies data security through easy-to-use, cloud-based, Zero Trust solutions that are managed in a single dashboard, powered by hundreds of employees across the world. With offices in Santa Clara, Aylesbury, Paderborn, Bengaluru, Sydney, Tokyo and more, our employees are the heart and soul of our company. 

Skyhigh Security is more than a company; here, when you invest your career with us, we commit to investing in you. We embrace a hybrid work model, creating the flexibility and freedom you need from your work environment to reach your potential. From our employee recognition program to our ‘Blast Talks’ learning series and team celebrations (we love to have fun!), we strive to be an interactive and engaging place where you can be your authentic self.

Follow us on LinkedIn and Twitter: @SkyhighSecurity.

Role Overview:

The Cloud Operations Lead Monitoring & AI Ops Engineer at Skyhigh Security will be responsible for leading the strategy, implementation, and management of global network monitoring tools and AI Ops solutions. This role involves ensuring the reliability, performance, and security of our cloud infrastructure through proactive monitoring, automation, and advanced analytics. The successful candidate will collaborate with engineering, operations, and security teams to enhance observability and incident response capabilities. This position is part of a fast-growing Global Tech Ops team, playing a key role in scaling and optimizing our cloud operations.

Key Responsibilities:

  • Serve as the technical lead for global monitoring and AI Ops initiatives across the Skyhigh Security product portfolio.
  • Develop and implement strategies for proactive monitoring, anomaly detection, and automated incident resolution.
  • Oversee the deployment and management of monitoring/logging tools such as Prometheus, Grafana, OpenSearch, PagerDuty AI Ops, and Kentik.
  • Ensure comprehensive observability of cloud environments, network performance, and security metrics.
  • Collaborate with engineering teams to integrate monitoring principles into the software development lifecycle (SDLC), making observability an integral part of deployments rather than a post-deployment task.
  • Define and implement best practices for monitoring high-scale cloud environments across AWS, Azure, and OCI.
  • Utilize AI Ops tools to enhance event correlation, root cause analysis, and automated remediation.
  • Analyze monitoring data to identify trends, optimize system performance, and improve alerting mechanisms.
  • Provide guidance on incident response processes and drive continuous improvement in monitoring effectiveness.
  • Maintain documentation for monitoring frameworks, configurations, and operational procedures.
  • Stay updated with industry trends, emerging AI Ops technologies, and best practices in cloud monitoring.


Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 7+ years of experience in cloud operations with a strong focus on global monitoring and AI Ops tools.
  • Expertise in cloud platforms (AWS, Azure, OCI) and their monitoring services.
  • Deep understanding and hands-on experience with Jira Cloud, Confluence, and Atlassian Service Management.
  • Strong knowledge of monitoring and observability platforms, including Prometheus, Grafana, OpenSearch, PagerDuty AI Ops, and Kentik.
  • Experience designing and implementing AI-driven monitoring solutions for large-scale environments.
  • Proficiency in automation and scripting (e.g., Python, Go, Bash) to enhance monitoring capabilities.
  • Strong analytical and problem-solving skills with the ability to interpret complex monitoring data.
  • Excellent leadership, collaboration, and communication skills.
  • Ability to work in a fast-paced, dynamic environment.


Preferred Qualifications:

  • Relevant certifications (e.g., AWS Certified Solutions Architect, Azure Administrator).
  • Experience with ITIL processes and best practices in incident and problem management.
  • Knowledge of cloud security monitoring and threat detection methodologies.
  • Experience in designing and modernizing monitoring tools for cloud-native and hybrid environments.
  • Understanding of network performance monitoring and optimization strategies.

Company Benefits and Perks:

We believe that the best solutions are developed by teams who embrace each other's unique experiences, skills, and abilities. We work hard to create a dynamic workforce where we encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.

  • Retirement Plans
  • Medical, Dental and Vision Coverage
  • Paid Time Off
  • Paid Parental Leave
  • Support for Community Involvement

We're serious about our commitment to a workplace where everyone can thrive and contribute to our industry-leading products and customer support, which is why we prohibit discrimination and harassment based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation or any other legally protected status.

About The Company

Trellix is a global company redefining the future of cybersecurity. The company’s open and native extended detection and response (XDR) platform helps organizations confronted by today’s most advanced threats gain confidence in the protection and resilience of their operations. Trellix’s security experts, along with an extensive partner ecosystem, accelerate technology innovation through machine learning and automation to empower over 53,000 business and government customers. More at https://trellix.com.
