Lead Site Reliability Engineer

All levels

Job Summary

Job Description

The Site Reliability Engineer Lead will oversee support operations and site reliability engineering tasks, ensuring that systems and applications function effectively. Key responsibilities include managing a team, monitoring system performance, collaborating with cross-functional teams, developing incident response procedures, conducting audits, leading the implementation of automation, and providing technical guidance. The role focuses on improving system performance, availability, and resiliency. The candidate should have experience with monitoring tools and containerization technologies, as well as strong project management skills.

This role may also be eligible for performance-based bonuses, subject to company policies. In addition, this role is eligible for the following benefits, subject to company policies: medical, dental, vision, pharmacy, life, accidental death & dismemberment, and disability insurance; an employee assistance program; a 401(k) retirement plan; 10 days of paid time off per year (some positions are eligible for need-based leave with no designated number of leave days per year); and 10 paid holidays per year.
Must have:
  • Proficiency in site reliability engineering (SRE) principles and practices.
  • Strong background in system administration, networking, and cloud computing.
  • Experience with monitoring tools such as Prometheus, Grafana, and the ELK stack.
  • Knowledge of containerization technologies like Docker and Kubernetes.
  • Ability to troubleshoot complex technical issues and perform root cause analysis.
  • Excellent communication skills and ability to work collaboratively in a team environment.
  • Strong project management and leadership skills to drive initiatives efficiently.
Good to have:
  • Certifications in relevant areas such as AWS Certified DevOps Engineer or Google Professional Cloud DevOps Engineer are a plus.
Perks:
  • Medical, dental, vision, pharmacy, life, accidental death & dismemberment, and disability insurance
  • Employee assistance program
  • 401(k) retirement plan
  • 10 days of paid time off per year
  • 10 paid holidays per year

Job Details

Job description:

About HCLTech
HCLTech is a global technology company, spread across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. We're powered by our people: global, diverse, multi-generational talent representing 161 nationalities, whose unique spark, perspective, and boundless passion drive our culture of proactive value creation and problem-solving.
Our purpose is to bring together the best of technology and our people to supercharge progress for everyone, everywhere: our clients, partners, their stakeholders, communities, and the planet. As a company, we are deeply focused on accelerating our ESG agenda. We are also creating technology-enabled sustainable solutions with and for our clients and partners. We embed ESG imperatives into every aspect of our business and ensure that the progress we supercharge is responsible, inclusive, and beneficial to all our stakeholders in the long term. We have committed to achieving net zero by 2040.

To learn more about how we can supercharge progress for you, visit www.hcltech.com.

Site Reliability Engineer Lead

Job Summary
The Support Lead (SRE) is responsible for overseeing the support operations and site reliability engineering tasks, ensuring the effective functioning of systems and applications. The primary goal is to enhance system performance, availability, and resiliency.

    Key Responsibilities
    1. Manage a team of support engineers and SREs to provide technical support and address system issues promptly.
    2. Monitor system performance and reliability metrics, identifying areas for improvement and implementing solutions.
    3. Collaborate with cross-functional teams to optimize application performance and enhance system reliability.
    4. Develop and maintain incident response procedures and protocols to minimize system downtime.
    5. Conduct regular audits and assessments to ensure compliance with industry standards and best practices.
    6. Lead the implementation of automation tools and processes to streamline support operations and improve efficiency.
    7. Provide technical expertise and guidance to team members, promoting a culture of continuous learning and development.

    Skill Requirements
    1. Proficiency in site reliability engineering (SRE) principles and practices.
    2. Strong background in system administration, networking, and cloud computing.
    3. Experience with monitoring tools such as Prometheus, Grafana, and the ELK stack.
    4. Knowledge of containerization technologies like Docker and Kubernetes.
    5. Ability to troubleshoot complex technical issues and perform root cause analysis.
    6. Excellent communication skills and ability to work collaboratively in a team environment.
    7. Strong project management and leadership skills to drive initiatives and deliver results efficiently.
    8. Certifications in relevant areas, such as AWS Certified DevOps Engineer or Google Professional Cloud DevOps Engineer, are a plus.
