Site Reliability Engineer - Ceph Storage Engineer

6 Minutes ago • 2 Years +

DevOps

Job Description

GoDaddy is seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join their dynamic team. This remote role focuses on automating and maintaining storage infrastructure, specifically Ceph, to ensure system reliability, scalability, and performance. Responsibilities include automating daily storage operations, developing tools and scripts, monitoring system performance, participating in agile processes, and continuously improving system reliability through proactive optimization.

Good To Have:

Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
Exposure to and experience with compute platforms (e.g., OpenStack, AWS).
Familiarity with and ability to contribute to CI/CD pipelines and automation workflows.

Must Have:

Automate and maintain day-to-day storage system operations.
Develop and maintain tools and automation scripts.
Monitor system performance and implement solutions.
Participate in agile practices and processes.
Continuously improve system reliability and performance.
2+ years professional experience with Ceph in production.
2+ years experience in site reliability engineering.
Experience with Ceph deployment, configuration, and management.
Proficiency in Linux/Unix systems, automation, and operating at scale.
Proficiency in Python or Bash scripting.
Experience with Ansible, Terraform, or SaltStack.
Experience with Nagios-based monitoring tools (e.g., Icinga2).
Experience with observability tooling (Prometheus, Grafana, Mimir, Loki).
Solid understanding of core networking concepts and protocols.

Perks:

Paid time off
Retirement savings (e.g., 401k, pension schemes)
Bonus/incentive eligibility
Equity grants
Participation in employee stock purchase plan
Competitive health benefits
Family-friendly benefits including parental leave
Employee Resource Groups
Support for entrepreneurs/side hustles
Diverse and inclusive culture

Location Details: At GoDaddy the future of work looks different for each team. Some teams work in the office full-time, others have a hybrid arrangement (working remotely some days and in the office on others), and some work entirely remotely. This is a remote position, so you’ll be working from home. You may occasionally visit a GoDaddy office to meet with your team for events or meetings.

Join Our Team

GoDaddy is seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic team. This role will focus on automating and maintaining our storage infrastructure, with an emphasis on Ceph, ensuring the reliability, scalability, and performance of our systems.

What you'll get to do...

* Automate and maintain day-to-day operations of storage systems to support application demands.
* Develop and maintain tools and automation scripts to streamline storage operations and improve efficiency.
* Monitor system performance, identify issues, and implement solutions to ensure high availability and reliability.
* Participate in agile practices such as daily stand-up meetings, task tracking boards, design and code reviews, automated testing, continuous integration, and deployment.
* Continuously improve system reliability, performance, and capacity through proactive monitoring, automation, and optimization.

Your experience should include...

* 2+ years of professional experience with Ceph in a production environment, including deployment, configuration, and management of Ceph clusters and systems.
* 2+ years of experience in site reliability engineering or a similar role.
* Experience working on Linux/Unix systems, with a focus on automation and operating at scale.
* Proficiency in Python or Bash.
* Experience with Ansible, Terraform, or SaltStack.
* Experience with Nagios-based monitoring tools, such as Icinga2.
* Experience with observability tooling, such as Prometheus, Grafana, Mimir, and Loki.
* Solid understanding of core networking concepts and protocols, particularly in relation to Linux/Unix systems.

You might also have...

* Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
* Exposure to and experience working with compute platforms (e.g., OpenStack, AWS).
* Familiarity with and ability to contribute to CI/CD pipelines and automation workflows.
