About the job
Job Summary:
- We are looking for an experienced Site Reliability Engineer (SRE) to join our growing team. This role focuses on enhancing the reliability, efficiency, and performance of our infrastructure and hosted applications. You will collaborate with other SRE team members, applying your expertise in coding, algorithms, complexity analysis, and large-scale system design to tackle complex challenges. The ideal candidate will have hands-on experience deploying and managing Kubernetes clusters, using Grafana and Prometheus for monitoring, and supporting Microsoft SQL Server and VMware virtualization. Expertise in cloud deployments, Windows operating systems, and network technologies is a plus. Prior experience in DevOps, including CI/CD pipeline management and automation, is highly desirable. We emphasize self-directed, impactful work while providing the necessary support and mentorship for professional growth.
Essential Responsibilities:
- Deploy, manage, and scale Kubernetes clusters to support containerized applications.
- Design, implement, and optimize continuous integration and continuous deployment (CI/CD) pipelines to streamline code deployment and enhance delivery efficiency.
- Implement and maintain monitoring solutions with Grafana and Prometheus to ensure system reliability.
- Drive efficiency by automating manual and repetitive tasks to reduce operational costs.
- Assist in managing and enhancing the entire production stack to deliver reliable systems for a diverse range of internal and external customers.
- Assist our internal Database teams in optimizing and supporting Microsoft SQL Server databases, including performance tuning and maintenance.
- Assist our internal Virtualization/Server teams in ensuring high availability and efficient resource use across our technology stack.
- Deploy and manage cloud resources on Azure or AWS, focusing on cost efficiency and performance optimization.
- Support and troubleshoot Windows operating systems and network technologies.
Required Skills:
- Demonstrated ability to automate processes and improve efficiency.
- Strong focus on managing and optimizing the full production stack for a wide user base.
- Proven experience as a DevOps Engineer, with a strong foundation in Kubernetes, Grafana, Prometheus, Microsoft SQL Server, and VMware.
- Experience in designing, implementing, and managing CI/CD pipelines using tools like Jenkins, GitHub, or similar.
- Experience with Windows operating systems and network technologies.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 4+ years of relevant experience as a Site Reliability Engineer or in a similar role, with a background in DevOps and CI/CD pipeline management preferred.