Senior IT Monitoring Engineer / Site Reliability Engineer

1 Year ago • 5+ Years • DevOps

Job Summary

Job Description

CrowdStrike is seeking a Sr. IT Monitoring Engineer/Site Reliability Engineer (SRE) to join its IT Operations team. The role involves designing, implementing, and maintaining monitoring solutions for critical IT infrastructure and applications, with a focus on reliability, availability, and performance. The engineer will work at the intersection of operations and development, applying software engineering principles to operations tasks and emphasizing system reliability and automation. The position requires a proactive approach to identifying and resolving issues before they impact business operations, as well as participation in on-call rotations. Responsibilities include designing monitoring solutions, configuring alerts, defining SLOs, creating dashboards, conducting reliability reviews, participating in incident response, conducting post-incident reviews, developing automation scripts, and collaborating with development, infrastructure, and security teams.
Must have:
  • 5+ years of experience with enterprise monitoring tools
  • Proficiency in scripting languages (Python, Bash, PowerShell)
  • Experience with log management platforms
  • Working knowledge of cloud services monitoring
  • Experience with APM, DEM, and infrastructure monitoring
  • Knowledge of SRE principles, SLOs, error budgets, and incident management
  • Experience with automated alerting and remediation
  • Strong incident triage and root cause analysis skills
  • Experience participating in on-call rotations
Good to have:
  • Familiarity with Infrastructure as Code
  • Familiarity with containerization
  • SRE, cloud platform, or monitoring tool certifications
  • ITIL Foundation certification
  • Bachelor's degree in Computer Science or related field
Perks:
  • Remote-friendly and flexible work culture
  • Market leader in compensation and equity awards
  • Comprehensive physical and mental wellness programs
  • Competitive vacation and holidays
  • Paid parental and adoption leaves
  • Professional development opportunities
  • Employee Networks and volunteer opportunities
  • Vibrant office culture with world class amenities

Job Details

As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn’t changed — we’re here to stop breaches, and we’ve redefined modern security with the world’s most advanced AI-native platform. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We’re also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We’re always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.

About the Role:
The CrowdStrike Information Technology team is looking for a skilled Sr. IT Monitoring Engineer/Site Reliability Engineer (SRE) to join our IT Operations team. In this role, you will be responsible for designing, implementing, and maintaining monitoring solutions that ensure the reliability, availability, and performance of our critical IT infrastructure and applications. You will work at the intersection of operations and development, applying software engineering principles to operations tasks while focusing on system reliability and automation. This position requires a proactive approach to identifying and resolving issues before they impact business operations, as well as participating in on-call rotations to address incidents when they occur.

What You’ll Need:

  • 5+ years of experience with enterprise monitoring tools (Prometheus, LogicMonitor, Datadog, ThousandEyes, Zscaler Digital Experience (ZDX))

  • Strong proficiency in scripting languages (Python, Bash, PowerShell) for automation

  • Experience with log management platforms (ELK stack, Splunk, LogScale)

  • Working knowledge of cloud services monitoring (AWS CloudWatch, GCP)

  • Experience with application performance monitoring (APM), digital experience monitoring (DEM), and infrastructure monitoring

  • Knowledge of SRE principles, SLOs, error budgets, and incident management

  • Experience with automated alerting, remediation workflows, and CI/CD pipeline monitoring

  • Familiarity with Infrastructure as Code (Terraform, Ansible) and containerization (Docker, Kubernetes)

  • Strong incident triage, root cause analysis, and documentation skills

  • Experience participating in on-call rotations and emergency response


What You'll Do:

Monitoring and Reliability

  • Design and maintain comprehensive monitoring solutions across infrastructure and applications

  • Configure appropriate alerting thresholds to ensure timely response to potential issues

  • Define and track SLOs and error budgets for critical services

  • Create and maintain dashboards providing real-time visibility into system health

  • Conduct regular reviews of system reliability and recommend improvements


Incident Management and Operations

  • Participate in the on-call rotation to respond to alerts and incidents

  • Lead incident response efforts and conduct thorough post-incident reviews

  • Document incidents, resolutions, and lessons learned

  • Develop and refine incident response procedures to improve MTTR

  • Implement proactive monitoring to detect potential issues before they impact users


Automation and Collaboration

  • Develop scripts and automation to streamline monitoring tasks and reduce manual effort

  • Create self-healing systems that can automatically remediate common issues

  • Integrate monitoring tools with other operational systems

  • Work closely with development, infrastructure, and security teams

  • Provide guidance on monitoring best practices and observability

  • Maintain comprehensive documentation for monitoring systems and procedures


Continuous Improvement

  • Stay current with industry trends in monitoring and site reliability engineering

  • Analyze monitoring data to identify patterns and improvement opportunities

  • Implement metrics to track the effectiveness of monitoring processes

  • Contribute to the evolution of the organization's monitoring strategy


Bonus Points:

  • SRE, cloud platform, or monitoring tool certifications

  • ITIL Foundation certification

  • Bachelor's degree in Computer Science, Information Technology, or related field


Shift Timings: 12PM - 9PM IST

Benefits of Working at CrowdStrike:

  • Remote-friendly and flexible work culture

  • Market leader in compensation and equity awards

  • Comprehensive physical and mental wellness programs

  • Competitive vacation and holidays to recharge

  • Paid parental and adoption leaves

  • Professional development opportunities for all employees regardless of level or role

  • Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections

  • Vibrant office culture with world class amenities

  • Great Place to Work Certified™ across the globe

CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program.

CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions--including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs--on valid job requirements.

If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at recruiting@crowdstrike.com for further assistance.

About The Company

CrowdStrike was founded in 2011 to fix a fundamental problem: The sophisticated attacks that were forcing the world’s leading businesses into the headlines could not be solved with existing malware-based defenses. Founder George Kurtz realized that a brand new approach was needed — one that combines the most advanced endpoint protection with expert intelligence to pinpoint the adversaries perpetrating the attacks, not just the malware. There’s much more to the story of how Falcon has redefined endpoint protection but there’s only one thing to remember about CrowdStrike: We stop breaches.