Sr Systems Reliability Engineer

1 Day ago • 5 Years + • $152,100 PA - $203,900 PA

System Design

Job Description

The Skywalker Sound Development Group is seeking a skilled Sr Systems Reliability Engineer to join our team. The Skysound Development Group is developing a set of next-generation tools for audio soundtracks and media distribution. By combining institutional wisdom in creative, high-quality audio with cutting-edge software engineering, we aim to bridge the divide between content creation and audience experience.

As a Sr Systems Reliability Engineer within the Group, you will play a key role in conceiving and developing tools that help usher in a new era of post-production audio content creation, working in areas such as application development, cloud computing, databases, and state-of-the-art security implementations. The Development Group works closely with master audio content creators to produce novel technology for immediate use. Your expertise in modern development tools, cloud infrastructure, and security practices will ensure the delivery of reliable, high-quality solutions that serve the needs of creative teams and global audiences.

This role is considered hybrid: the employee will work onsite at our office 2-3 days a week and from home on other days.

What You'll Do

Design, manage and maintain critical infrastructure for both software development and deployed global production resources.
Collaborate on the provisioning of cloud infrastructure in AWS using Terraform to ensure consistency and scalability.
Maintain and manage multiple Kubernetes clusters across both cloud and on-premises environments.
Implement and enforce best practices for secure software development and deployment in alignment with industry standards.
Monitor, troubleshoot, and optimize build and deployment processes to maximize efficiency and minimize downtime.
Collaborate with cross-functional teams, including developers and security experts, to ensure systems meet operational requirements.
Develop, maintain, and enhance CI/CD pipelines using GitLab to support build automation, unit testing, and integration testing.
Continuously evaluate and implement tools and technologies to improve workflows and platform reliability.

What We’re Looking For

BS degree in Computer Science.
5+ years of experience in DevOps, Site Reliability Engineering, or a related field.
Extensive AWS knowledge: EC2, ECS/EKS, Lambda, ELB, ASGs, Route 53, KMS, SSM, IAM, S3, ACM, VPC, RDS, ElastiCache.
Proficiency with modern observability practices: application monitoring, tracing, and profiling tools (e.g., Datadog, New Relic, OpenTelemetry, Splunk).
Proficiency with GitLab CI, Terraform, Helm, and Packer.
Demonstrated experience designing and managing CI/CD pipelines for complex software platforms.
In-depth knowledge of containers and container orchestration technologies such as Docker and Kubernetes.
Experience with Terraform or other infrastructure-as-code tooling.
Strong scripting skills in Python, Bash, or similar languages.
Familiarity with modern security practices for protecting sensitive assets in distributed systems.
Exceptional problem-solving skills, with a proactive and collaborative mindset.

Preferred Qualifications

Experience working with media and entertainment pipelines or pre-release content workflows.
Proficiency with Golang, Python, or C++.
Experience with modern AI/ML frameworks (e.g., TensorFlow, PyTorch, Hugging Face) and their integration into operational workflows.
Knowledge of container security tools and systems, such as Falco or Aqua Security.
Experience with emerging deployment systems like Argo CD or Flux for GitOps workflows.
Familiarity with serverless computing paradigms and technologies such as AWS Lambda, Google Cloud Run, or Google Cloud Functions.
Understanding of high-performance computing systems in cloud environments.
Experience administering VMware vSphere clusters.
