Senior Site Reliability Engineer

1 Month ago • 10-15 Years

DevOps

Job Description

We are looking for a Senior Site Reliability Engineer (SRE) with deep technical expertise across Azure administration, SCVMM/Hyper-V, Windows and Linux systems, and application support. You will drive reliability, scalability, and performance improvements across both our public and private cloud infrastructure. This role is ideal for someone who is passionate about infrastructure as code, automation, and building resilient, observable systems, and who wants to own end-to-end performance monitoring, incident response, and root cause analysis.

Good To Have:

Knowledge of containerization and Kubernetes
Strong interpersonal, oral, and written communication and collaboration skills with all levels of management
Strong organizational skills including the ability to adapt to shifting priorities and meet frequent deadlines
Demonstrated proactive approach to problem-solving with strong judgment and decision-making capability
Highly resourceful and collaborative team player who can also work effectively independently, showing initiative and a sense of urgency
Exemplifies our customer-focused, action-oriented, results-driven culture
Forward-looking thinker who actively seeks opportunities, has a desire for continuous learning, and proposes solutions
Ability to act with discretion and maintain complete confidentiality
Dedicated to the firm’s values of non-negotiable integrity, valuing our people, exceeding client expectations, and embracing intellectual curiosity and rigor

Must Have:

Design, operate, and scale hybrid cloud environments using Azure and SCVMM/Hyper-V
Handle escalations from teams across Global Technology
Participate in Problem Management and develop permanent solutions to recurring issues
Use Infrastructure as Code to create repeatable service offerings
Own end-to-end performance monitoring, incident response, and root cause analysis (RCA)
Collaborate with development and operations teams to define SLIs/SLOs and improve system reliability
Drive automation of manual processes, including environment provisioning, configuration management, and deployments
Maintain detailed, high-quality documentation for systems, environments, and workflows
Expert in Microsoft Azure (IaaS, PaaS, identity, networking, monitoring)
Strong experience with System Center Virtual Machine Manager (SCVMM) and Hyper-V
Advanced Windows and Linux (Ubuntu) system administration
Experience with Splunk, AppDynamics, SolarWinds for monitoring
Proficiency in CI/CD systems such as GitLab CI and Octopus Deploy
Solid understanding of DNS, firewalls, certificates, identity/access controls
Experience supporting production environments, distributed systems, microservices, and REST APIs

Years of Experience

10-15+ years of progressive experience in system administration, cloud operations, or infrastructure engineering

Special Requirements

Weekend break/fix or maintenance work may occasionally be required

Responsibilities

Design, operate, and scale hybrid cloud environments using Azure and SCVMM/Hyper-V
Handle escalations from teams across Global Technology
Participate in Problem Management and develop permanent solutions to recurring issues
Use Infrastructure as Code to create repeatable service offerings
Own end-to-end performance monitoring, incident response, and root cause analysis (RCA)
Collaborate with development and operations teams to define SLIs/SLOs and improve system reliability
Drive automation of manual processes, including environment provisioning, configuration management, and deployments
Maintain detailed, high-quality documentation for systems, environments, and workflows

Candidate Requirements

Cloud Infrastructure: Expert in Microsoft Azure (IaaS, PaaS, identity, networking, monitoring)
Virtualization: Strong experience with System Center Virtual Machine Manager (SCVMM) and Hyper-V
Operating Systems: Advanced Windows and Linux (Ubuntu) system administration
Knowledge of containerization and Kubernetes
Monitoring & Observability: Splunk, AppDynamics, SolarWinds
CI/CD Systems: GitLab CI, Octopus Deploy
Networking & Security: Solid understanding of DNS, firewalls, certificates, identity/access controls
Application Support: Experience supporting production environments, distributed systems, microservices, REST APIs

Core Values

Strong interpersonal, oral, and written communication and collaboration skills with all levels of management.
Strong organizational skills, including the ability to adapt to shifting priorities and meet frequent deadlines.
Demonstrated proactive approach to problem-solving with strong judgment and decision-making capability.
Highly resourceful and collaborative team player who can also work effectively independently, showing initiative and a sense of urgency.
Exemplifies our customer-focused, action-oriented, results-driven culture.
Forward-looking thinker who actively seeks opportunities, has a desire for continuous learning, and proposes solutions.
Ability to act with discretion and maintain complete confidentiality.
Dedicated to the firm’s values of non-negotiable integrity, valuing our people, exceeding client expectations, and embracing intellectual curiosity and rigor.
