At eBay, we're more than a global ecommerce leader — we’re changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We’re committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.
Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work — every day. We're in this together, sustaining the future of our customers, our company, and our planet.
Join a team of passionate thinkers, innovators, and dreamers — and help us connect people and build communities to create economic opportunity for all.
About the team and role:
We are the Reliability Engineering team, focused on eBay.com availability, reliability, and performance through observability. We are seeking a highly motivated and experienced Reliability Engineering Manager to lead a team responsible for the development, operations, and governance of one of the largest Kubernetes-based observability control planes in the world.
You will guide and mentor a team of hardworking engineers in building and maintaining robust, secure, and scalable telemetry offerings that empower eBay’s core product engineering organizations and initiatives.
We require a unique blend of Reliability Engineering expertise, software development management experience, and strong leadership skills. The ideal candidate will have a deep understanding of Observability, experience managing and developing software engineering teams, passion for building scalable and reliable systems, and excellent people skills to cultivate a high-performing team.
What you will accomplish:
Leadership & People Management
- Lead and mentor a team of reliability engineers, fostering a strong culture of collaboration and continuous improvement.
- Conduct regular one-on-one meetings with team members, providing guidance, feedback, and support for their career development.
- Manage performance evaluations, provide constructive feedback, and actively participate in all phases of growing the engineering organization, including recruiting and team building.
Reliability Engineering, Operations & Governance
- Lead and coordinate engineering activities to successfully plan, communicate, and deliver on product features on time while designing for quality, observability, and scalability.
- Ensure full software lifecycle instrumentation from requirement ideation to software development to deployment.
- Drive the adoption of cloud-native technologies and standard processes, such as containerization, service mesh, microservices, etc.
Collaboration with internal partners and team members:
- Reliability engineering and operations teams, product, and PMO on engineering resource allocation and project schedules in accordance with our strategic organizational priorities.
- SRE team to champion automation to enhance efficiency and reliability.
- Operations teams on maintaining a highly available telemetry and command/control infrastructure to ensure eBay’s products and services are available to our customers.
- Fleet management team on capacity planning, resource allocation, and cost optimization for the telemetry control plane.
- Information security teams to ensure integrity and compliance of the telemetry infrastructure by implementing appropriate security controls and monitoring.
What you will bring:
- 12-15 years of proven experience working in infrastructure and software development engineering organizations, with 5 years’ experience managing and leading both reliability engineering teams and software development teams.
- Excellent at communicating critical updates, including AI-driven reliability trends and insights, to organizational leaders and executives.
- Experience supporting medium or large tech organizations with many different internal customers and partners.
- Experience working collaboratively in large distributed global teams.
- Demonstrated ability to adopt and operationalize emerging AI tools, ensuring the team remains at the forefront of reliability engineering practices.
- Knowledge of software development, networking, security, and storage technologies in a cloud environment, and proven understanding of cloud-native architectures, microservices, and DevOps and SRE principles.
- Passion for staying ahead of the curve in AI/ML innovation applied to observability, monitoring, and system reliability.