SRE II
Electronic Arts
Job Summary
As a Software Engineer II on the Site Reliability Engineering (SRE) team at Electronic Arts, you will contribute to the design, automation, and operation of large-scale, cloud-based systems that power EA’s global gaming platform. This role involves enhancing service reliability, scalability, and performance across multiple game studios. You will build and operate scalable systems, develop automation, manage monitoring and incident response, contribute to CI/CD pipelines, and collaborate on reliability engineering to ensure stable production services.
Must Have
- Build and operate scalable, cloud-based infrastructure (AWS/GCP/Azure, Kubernetes, Terraform, Docker).
- Develop automation scripts and workflows for platform operations.
- Create and maintain monitoring and alerting; participate in on-call rotations and incident response.
- Contribute to CI/CD pipeline design and maintenance.
- Collaborate on reliability and performance engineering (SLIs/SLOs/SLAs).
- Participate in post-incident reviews and documentation.
- 3–5 years of experience in Cloud Computing, Virtualization, and Containerization (Kubernetes, Docker, VMware).
- Experience supporting production-grade, high-availability systems with SLIs/SLOs.
- Strong Linux/Unix administration and networking fundamentals.
- Hands-on experience with Infrastructure as Code and automation tools (Terraform, Helm, Ansible, Chef).
- Proficiency in Python, Golang, Bash, or Java for scripting and automation.
- Familiarity with monitoring and observability tools (Prometheus, Grafana, Loki, Datadog).
- Exposure to distributed systems, SQL/NoSQL databases, and CI/CD pipelines.
- Strong problem-solving, troubleshooting, and collaboration skills.
Perks & Benefits
- Healthcare coverage
- Mental well-being support
- Retirement savings
- Paid time off
- Family leaves
- Complimentary games
Job Description
General Information
Role ID: 211517
Worker Type: Regular Employee
Studio/Department: CT - IT
Work Model: Hybrid
Description & Requirements
Software Engineer II / Site Reliability Engineer
As a Software Engineer II on the Site Reliability Engineering (SRE) team, you will contribute to the design, automation, and operation of large-scale, cloud-based systems that power EA’s global gaming platform. You will work closely with senior engineers to enhance service reliability, scalability, and performance across multiple game studios and services.
Responsibilities:
- Build and Operate Scalable Systems: Support the development, deployment, and maintenance of distributed, cloud-based infrastructure leveraging modern open-source technologies (AWS/GCP/Azure, Kubernetes, Terraform, Docker, etc.).
- Platform Operations and Automation: Develop automation scripts, tools, and workflows to reduce manual effort, improve system reliability, and optimize infrastructure operations (reducing MTTD and MTTR).
- Monitoring, Alerting & Incident Response: Create and maintain dashboards, alerts, and metrics to improve system visibility and proactively identify issues. Participate in on-call rotations and assist in incident response and root cause analysis.
- Continuous Integration / Continuous Deployment (CI/CD): Contribute to the design, implementation, and maintenance of CI/CD pipelines to ensure consistent, repeatable, and reliable deployments.
- Reliability and Performance Engineering: Collaborate with cross-functional teams to identify reliability bottlenecks, define SLIs/SLOs/SLAs, and implement improvements that enhance the stability and performance of production services.
- Post-Incident Reviews & Documentation: Participate in root cause analyses, document learnings, and contribute to preventive measures to avoid recurrence of production issues. Maintain detailed operational documentation and runbooks.
- Collaboration & Mentorship: Work closely with senior SREs and software engineers to gain exposure to large-scale systems, adopt best practices, and gradually take ownership of more complex systems and initiatives.
- Modernization & Continuous Improvement: Contribute to ongoing modernization efforts by identifying areas for improvement in automation, monitoring, and reliability.
Qualifications – Software Engineer II (Site Reliability Engineer)
- 3–5 years of experience in Cloud Computing (AWS preferred), Virtualization, and Containerization using Kubernetes, Docker, or VMware, including extensive hands-on experience with container orchestration technologies such as EKS, Kubernetes, and Docker.
- Experience supporting production-grade, high-availability systems with defined SLIs/SLOs.
- Strong Linux/Unix administration and networking fundamentals (protocols, load balancing, DNS, firewalls).
- Hands-on experience with Infrastructure as Code and automation tools such as Terraform, Helm, Ansible, or Chef.
- Proficiency in Python, Golang, Bash, or Java for scripting and automation.
- Familiarity with monitoring and observability tools such as Prometheus, Grafana, Loki, or Datadog.
- Exposure to distributed systems, SQL/NoSQL databases, and CI/CD pipelines.
- Strong problem-solving, troubleshooting, and collaboration skills in cross-functional environments.
We adopt a holistic approach to our benefits programs, emphasizing physical, emotional, financial, career, and community wellness to support a balanced life. Our packages are tailored to meet local needs and may include healthcare coverage, mental well-being support, retirement savings, paid time off, family leaves, complimentary games, and more. We nurture environments where our teams can always bring their best to what they do.
Electronic Arts is an equal opportunity employer. All employment decisions are made without regard to race, color, national origin, ancestry, sex, gender, gender identity or expression, sexual orientation, age, genetic information, religion, disability, medical condition, pregnancy, marital status, family status, veteran status, or any other characteristic protected by law. We will also consider qualified applicants with criminal records for employment in accordance with applicable law. EA also makes workplace accommodations for qualified individuals with disabilities as required by applicable law.