We are seeking an experienced Site Reliability Engineer (SRE) to join our dynamic Cloud Infrastructure team at eBay in Dublin, Ireland. This role demands a deep understanding of cloud-native technologies, particularly containers and Kubernetes, along with strong programming skills in languages such as Go and Python. The ideal candidate will have a proven track record of at least 3 years in the field, focused on the design, development, deployment, and operation of reliable, scalable self-service platforms that manage the lifecycle of applications supporting eBay's products and services.
Responsibilities
- Collaborate with internal customers and partners to deliver key business outcomes.
- Ensure that cloud products are reliable, scalable, efficient, and compliant with eBay's security and operational standards.
- Enhance observability practices to ensure comprehensive monitoring and alerting across cloud services.
- Respond to cloud incidents, perform root cause analysis, and implement corrective actions to prevent future occurrences. Develop and maintain incident response plans.
- Analyze system performance metrics and make recommendations for improvements. Implement changes to optimize resource utilization and improve application performance.
- Drive improvements in CI/CD processes to increase deployment velocity and reliability.
- Develop and maintain automation to streamline operations, reduce manual work, and enhance system reliability.
Requirements
- 3+ years of programming experience with Go or Python.
- 5+ years of experience implementing large-scale, distributed, high-availability, fault-tolerant systems and infrastructure in production environments.
- Proficiency in delivering products within a multi-functional team environment.
- Demonstrated expertise in observability tools and practices, ensuring system reliability and performance.
- Extensive experience as an SRE working with Kubernetes or related cloud infrastructure and cloud-native technologies. Experience developing with Kubernetes and/or building Kubernetes controllers is highly desirable.
- Deep understanding of API design and RESTful principles, with experience in building web services at scale.
Preferred Skills
- Certifications in Kubernetes, lifecycle management, or related fields.
- Understanding of application lifecycle management and CI/CD.
- Experience in a high-traffic, large-scale environment.
- Familiarity with additional programming languages or frameworks.
- Proficiency in Agile development methodologies.
- Experience participating in open-source standards and contributing to open-source projects.