Site Reliability Engineer

eBay

4+ Years | Bangalore, Karnataka, India (Hybrid) | Full Time | 2 months ago

Job Summary

At eBay, the Site Reliability Engineering (SRE) team bridges the gap between software development and operations. Our mission is to build systems, tools, and platforms that keep eBay services fast, available, and reliable—at global scale. We work closely with product engineering teams to design, build, and operate resilient applications that power the commerce experiences of millions. We’re looking for a Software Engineer with a passion for reliability, scalability, and performance—someone who brings both a developer’s mindset and a systems-thinking approach. This role involves proactive monitoring, solution development for high availability, collaborative problem-solving, enhancing monitoring tools, and incident management.

Must Have

4+ years of professional experience in software engineering, ideally in backend or platform teams.
Proficiency in one or more programming languages (e.g., Java, Go, Python).
Strong incident management and leadership skills, with excellent technical triage and troubleshooting abilities, especially during crises.
Familiarity with cloud platforms, container orchestration (e.g., Kubernetes), and infrastructure-as-code tools.
Experience with observability stacks (e.g., Prometheus, Grafana, ELK, OpenTelemetry).
Strong interpersonal and communication skills to thrive in fast-paced, dynamic environments.

Job Description

At eBay, we're more than a global ecommerce leader — we’re changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We’re committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.

Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work — every day. We're in this together, sustaining the future of our customers, our company, and our planet.

Join a team of passionate thinkers, innovators, and dreamers — and help us connect people and build communities to create economic opportunity for all.

About the team and the role:

We’re looking for a Software Engineer with a passion for reliability, scalability, and performance—someone who brings both a developer’s mindset and a systems-thinking approach.

What you will accomplish:

Proactive Monitoring: Continuously monitor the health of eBay's critical services to identify and address potential issues before they escalate.
Solution Development: Collaborate with Architecture, Engineering, and Operations teams to develop solutions that ensure high site availability, reliability, and performance.
Collaborative Problem Solving: Work closely with partner teams to resolve recurring technical issues, onboard new alerts, and develop high-quality Standard Operating Procedures (SOPs).
Enhance Monitoring Tools: Build and improve tools for monitoring and mitigating site incidents, and conduct reliability audits and tests to strengthen eBay’s reliability and incident management capabilities.
Incident Management: Act as Incident Commander to drive resolution of major incidents, manage alarms, and ensure effective communication with leadership and partner teams.
