Site Reliability Engineer

4+ years • DevOps

Job Description

This role involves proactive monitoring, developing high-availability solutions, collaborative problem-solving, improving monitoring tools, and incident management.

Skills:

team-management
problem-solving
communication
prometheus
grafana
elk
kubernetes
python
java

At eBay, we're more than a global ecommerce leader — we’re changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We’re committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.

Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work — every day. We're in this together, sustaining the future of our customers, our company, and our planet.

Join a team of passionate thinkers, innovators, and dreamers — and help us connect people and build communities to create economic opportunity for all.

About the team and the role:

At eBay, the Site Reliability Engineering (SRE) team bridges the gap between software development and operations. Our mission is to build systems, tools, and platforms that keep eBay services fast, available, and reliable—at global scale. We work closely with product engineering teams to design, build, and operate resilient applications that power the commerce experiences of millions.

We’re looking for a Software Engineer with a passion for reliability, scalability, and performance—someone who brings both a developer’s mindset and a systems-thinking approach.

What you will accomplish:

  • Proactive Monitoring: Continuously monitor the health of eBay's critical services to identify and address potential issues before they escalate.
  • Solution Development: Collaborate with Architecture, Engineering, and Operations teams to develop solutions that ensure high site availability, reliability and performance.
  • Collaborative Problem Solving: Work closely with partner teams to resolve recurring technical issues, onboard new alerts, and develop high-quality Standard Operating Procedures (SOPs).
  • Enhance Monitoring Tools: Build and improve tools for monitoring and mitigating site incidents, and conduct reliability audits and tests to strengthen eBay’s reliability and incident management capabilities.
  • Incident Management: Act as Incident Commander to drive resolution of major incidents, manage alarms, and ensure effective communication with leadership and partner teams.

What you will bring:

  • 4+ years of professional experience in software engineering, ideally in backend or platform teams
  • Proficiency in one or more programming languages (e.g., Java, Go, Python)
  • Strong incident management and leadership skills, with excellent technical triage and troubleshooting abilities, especially during crises.
  • Familiarity with cloud platforms, container orchestration (e.g., Kubernetes), and infrastructure-as-code tools
  • Experience with observability stacks (e.g., Prometheus, Grafana, ELK, OpenTelemetry)
  • Strong interpersonal and communication skills to thrive in fast-paced, dynamic environments.
