Site Reliability Engineer

Experience: 3+ years
Category: DevOps

Job Description

Electronic Arts is seeking a Site Reliability Engineer (SRE) for the GameKit Operations team in Bucharest, Romania. This hybrid role involves shaping EA's development platforms and services, focusing on automation, observability, and improving service reliability at scale. The SRE will assess existing monitoring, implement observability roadmaps, contribute to incident response, and lead long-term strategies for operational excellence, mentoring engineers and championing scalable engineering practices.
Must Have:
  • Build scalable monitoring and observability systems using Prometheus/Grafana, Datadog, ELK, or similar.
  • Build infrastructure and tooling using technologies like Terraform, Ansible, AWS CloudFormation, and CI/CD pipelines (GitLab CI/CD).
  • Automate operational processes using Python and Bash.
  • Operate and improve containerized applications using Kubernetes platforms (EKS, AKS, GKE).
  • Contribute to incident response processes and post-mortems.
  • Experience operating cloud platforms, especially AWS and Azure.
  • Expertise in monitoring, observability, and incident response at scale.
  • Hands-on experience with Infrastructure-as-Code and automation.
  • 3+ years of experience building SRE practices from the ground up.
  • Led on-call rotations or reliability-focused projects.
  • Mentored junior engineers and influenced engineering culture through documentation and collaboration.
Perks:
  • Healthcare coverage
  • Mental well-being support
  • Retirement savings
  • Paid time off
  • Family leaves
  • Complimentary games

General Information

Role ID: 211245

Worker Type: Regular Employee

Studio/Department: CT - IT

Work Model: Hybrid

Description & Requirements

Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity thrives, new perspectives are invited, and ideas matter. A team where everyone makes play happen.

Electronic Arts (EA) is looking for a Site Reliability Engineer (SRE) to join our GameKit Operations team. You will be part of a newly formed SRE function and help shape the future of how EA builds and operates its development platforms and services. If you're passionate about automation, observability, and improving service reliability at scale, we'd love to hear from you. You will report to a Senior Manager.

This is a hybrid role, with 3 days per week worked from our office.

Job Requirements/Role

What You'll Do

  • In your first 60 days, gain an understanding of the GameKit environment and assess existing monitoring and observability systems.
  • By 90 days, begin implementing the observability roadmap, contribute to incident response, and identify opportunities to improve automation and reliability.
  • By 120 days, take ownership of main SRE plans, guide cross-team collaboration, and influence EA's approach to operational excellence.
  • Beyond 180 days, lead long-term strategies to improve reliability, mentor engineers, and champion sustainable and scalable engineering practices.

Main Responsibilities

  • Build scalable monitoring and observability systems using Prometheus/Grafana, Datadog, ELK, or similar.
  • Build infrastructure and tooling using technologies like Terraform, Ansible, AWS CloudFormation, and CI/CD pipelines (GitLab CI/CD).
  • Automate operational processes using Python and Bash to reduce manual toil and improve deployment reliability.
  • Operate and improve containerized applications using Kubernetes platforms (EKS, AKS, GKE).
  • Contribute to incident response processes and post-mortems, helping teams learn and improve from every incident.

What We're Looking For

  • Experience operating cloud platforms, especially AWS and Azure.
  • Expertise in monitoring, observability, and incident response at scale.
  • Hands-on experience with Infrastructure-as-Code and automation.
  • A desire to improve processes and team capabilities.
  • Comfortable working in dynamic environments and solving problems collaboratively.
  • 3+ years of experience building SRE practices from the ground up.
  • Led on-call rotations or reliability-focused projects.
  • Mentored junior engineers and influenced engineering culture through documentation and collaboration.

About Electronic Arts

We’re proud to have an extensive portfolio of games and experiences, locations around the world, and opportunities across EA. We value adaptability, resilience, creativity, and curiosity. From leadership that brings out your potential, to creating space for learning and experimenting, we empower you to do great work and pursue opportunities for growth.

We adopt a holistic approach to our benefits programs, emphasizing physical, emotional, financial, career, and community wellness to support a balanced life. Our packages are tailored to meet local needs and may include healthcare coverage, mental well-being support, retirement savings, paid time off, family leaves, complimentary games, and more. We nurture environments where our teams can always bring their best to what they do.

Electronic Arts is an equal opportunity employer. All employment decisions are made without regard to race, color, national origin, ancestry, sex, gender, gender identity or expression, sexual orientation, age, genetic information, religion, disability, medical condition, pregnancy, marital status, family status, veteran status, or any other characteristic protected by law. We will also consider for employment qualified applicants with criminal records in accordance with applicable law. EA also makes workplace accommodations for qualified individuals with disabilities as required by applicable law.
