Senior Site Reliability Engineer (Senior SRE)

4 Months ago • 14 Years +
Devops

Job Description

As a Senior Site Reliability Engineer (SRE) at Wind River, you will be responsible for deploying, managing, and scaling highly available, secure, and resilient software services across multi-cloud (AWS, Azure, GCP) and on-premises environments. You will collaborate with developers, architects, and operations teams to improve system reliability, automation, security, and platform performance. Responsibilities include managing Kubernetes clusters, cloud infrastructure, monitoring, CI/CD pipelines, security, compliance, and cost optimization, as well as communicating complex infrastructure concepts and strategies to diverse stakeholders.
Good To Have:
  • Certifications in Kubernetes (CKA/CKAD/CKS), AWS, Azure, or GCP.
  • Familiarity with multi-cloud management tools and strategies.
  • Background in software development or software infrastructure management.
Must Have:
  • Extensive experience in Kubernetes and container orchestration.
  • Hands-on experience with cloud platforms (AWS, Azure, GCP).
  • Proficiency in scripting and automation languages (Python, Bash, Go).
  • Strong knowledge of CI/CD tools and pipeline design.
  • 14+ years of experience as a Site Reliability Engineer, DevOps Engineer, Platform Engineer, or a similar role.

Description

Position at Wind River

Job Title: Senior Site Reliability Engineer (Senior SRE) 
ABOUT WIND RIVER
Wind River is a global leader delivering software for mission-critical intelligent systems. For over four decades, Wind River has powered billions of systems requiring the highest levels of security, safety, and reliability. Our software supports groundbreaking NASA missions such as Artemis I, the James Webb Space Telescope, multiple Mars rovers, and pioneering 5G initiatives.
ABOUT THE OPPORTUNITY
Wind River Systems is seeking a Senior Site Reliability Engineer (SRE) experienced in deploying, managing, and scaling highly available, secure, and resilient software services across multi-cloud (AWS, Azure, GCP) and on-premises environments. You will collaborate closely with developers, architects, and operations teams to enhance system reliability, automation, security, and overall platform performance.
RESPONSIBILITIES 
Kubernetes and Container Orchestration: 
  • Deploy, manage, optimize, and troubleshoot large-scale Kubernetes clusters in multi-cloud (AWS, Azure, GCP) and hybrid environments (OpenStack, VMware vSphere). 
  • Implement cluster autoscaling and resource management strategies with tools such as Karpenter. 
Cloud and Hybrid Infrastructure Management: 
  • Architect, implement, and manage infrastructure in multi-cloud (AWS, GCP, Azure) and hybrid environments. 
  • Optimize cloud resource usage with AWS Cost Explorer, Savings Plans, and the equivalent tools from other cloud providers.
Monitoring, Observability, and Reliability: 
  • Develop and maintain comprehensive monitoring, logging, tracing, and alerting solutions using Prometheus, Grafana, CloudWatch, Datadog, or similar tools. 
  • Conduct root cause analysis (RCA) and implement proactive improvements to maximize system uptime, reliability, and performance. 
CI/CD Pipelines and Automation: 
  • Design, implement, and maintain robust CI/CD pipelines using Jenkins, GitLab CI/CD, GitHub Actions, or Tekton. 
  • Promote and implement DevSecOps best practices across teams to automate testing, security scanning, and deployments. 
Security, Compliance, and Governance: 
  • Integrate comprehensive security practices throughout the software lifecycle (DevSecOps), including vulnerability scanning and secure coding practices. 
  • Manage secrets securely using Vault, AWS Secrets Manager, Azure Key Vault, or similar tools. 
  • Ensure adherence to compliance standards and regulatory requirements. 
Cost Optimization and Efficiency: 
  • Implement and enforce governance policies and frameworks to optimize infrastructure usage, reduce costs, and enhance operational efficiency. 
  • Regularly review and optimize cloud expenditure, performance, and scaling strategies. 
Collaboration and Communication: 
  • Collaborate closely with architects, developers, QA, product teams, and management stakeholders. 
  • Clearly communicate complex infrastructure concepts and strategies to diverse stakeholders. 
QUALIFICATIONS 
  • Bachelor's degree in Computer Science, Information Technology, or related technical discipline (Master’s preferred). 
  • 14+ years of experience as a Site Reliability Engineer, DevOps Engineer, Platform Engineer, or a similar role.
  • Extensive expertise in Kubernetes, container orchestration, and the related ecosystem.
  • Hands-on experience with cloud platforms (AWS, Azure, GCP), OpenStack, VMware vSphere, and hybrid environments. 
  • Proficiency in scripting and automation languages (Python, Bash, Go, or similar). 
  • Solid experience with infrastructure as code (Terraform, CloudFormation, Pulumi). 
  • Strong knowledge of CI/CD tools and pipeline design (Jenkins, GitLab CI/CD, GitHub Actions, Tekton). 
  • Exceptional troubleshooting and problem-solving skills, coupled with a proactive and continuous learning mindset. 
PREFERRED QUALIFICATIONS 
  • Certifications in Kubernetes (CKA/CKAD/CKS), AWS (Solutions Architect, DevOps Engineer), Azure, or GCP. 
  • Familiarity with multi-cloud management tools and strategies. 
  • Background in software development or software infrastructure management. 
Join our team at Wind River, contribute to building highly reliable, secure, and innovative software systems, and help shape the future of software-defined environments! 
  
