Lead SRE/DevOps Engineer

Synechron

7+ Years | Pittsburgh, PA, United States (On Site) | Full Time | 1 day ago

Job Summary

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. We are seeking a highly skilled Lead Site Reliability Engineer (SRE) / DevOps Engineer to drive the reliability, observability, and operational excellence of our platforms. This role will lead major initiatives around monitoring, automation, incident response, and performance optimization, using enterprise tools such as Dynatrace, BigPanda, and LogScale/MonPro. The candidate will partner closely with engineering, operations, and product teams to build robust systems, improve service availability, and ensure a seamless user experience through proactive observability and best-in-class SRE practices.

Must Have

Implement and enhance proactive observability frameworks.
Optimize experience monitoring and user interaction metrics.
Manage and improve the event catalog.
Build and maintain dashboards, alerts, and health reporting using Dynatrace, BigPanda, MonPro, and LogScale.
Perform service tuning to improve system performance.
Establish and maintain observability standards and best practices.
Conduct chaos testing and resilience validation.
Lead anomaly detection practices.
Ensure platform stability, performance, and reliability through proven reliability engineering principles.
Drive SRE initiatives, including continuous improvement projects.
Develop, maintain, and scale automated orchestration pipelines.
Create, maintain, and enforce SRE standards, including SLIs, SLOs, and operational playbooks.
Lead and conduct root cause analysis for critical incidents.
Own the problem management lifecycle.
Collaborate with cross-functional teams to address systemic issues.
7+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
Hands-on expertise with observability/monitoring tools such as Dynatrace, BigPanda, LogScale / MonPro / LogicMonitor.
Solid experience with cloud platforms (AWS, Azure, or GCP).
Strong proficiency in automation & orchestration (Terraform, Ansible, Jenkins, GitHub Actions, etc.).
Proven track record in incident management, RCA, and implementing reliable SRE practices.
Experience with CI/CD pipelines, infrastructure as code, and configuration management.
Deep understanding of Linux systems, networking fundamentals, and distributed system design.
Strong scripting abilities (Python, Bash, PowerShell, or equivalent).
Excellent communication, leadership, and cross-team collaboration skills.

Good to Have

Experience leading SRE or DevOps teams.
Knowledge of chaos engineering, advanced anomaly detection, and proactive alerting strategies.
Experience implementing SLI/SLO frameworks and performance optimization programs.
Familiarity with containerization (Docker, Kubernetes) and service meshes.

Perks & Benefits

A highly competitive compensation and benefits package.
A multinational organization with 58 offices in 21 countries and the possibility to work abroad.
10 days of paid annual leave (plus sick leave and national holidays).
Maternity & paternity leave plans.
A comprehensive insurance plan including medical, dental, vision, life insurance, and long-/short-term disability.
Retirement savings plans.
A higher education certification policy.
Commuter benefits.
Extensive training opportunities, focused on skills, substantive knowledge, and personal development.
On-demand Udemy for Business for all Synechron employees with free access to more than 5000 curated courses.
Coaching opportunities with experienced colleagues from our Financial Innovation Labs (FinLabs) and Centers of Excellence (CoE) groups.
Cutting-edge projects at the world’s leading tier-one banks, financial institutions, and insurance firms.
A flat and approachable organization.
A truly diverse, fun-loving, and global work culture.

Job Description

We are

At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Synechron’s progressive technologies and optimization strategies span end-to-end Artificial Intelligence, Consulting, Digital, Cloud & DevOps, Data, and Software Engineering, servicing an array of noteworthy financial services and technology firms. Through research and development initiatives in our FinLabs we develop solutions for modernization, from Artificial Intelligence and Blockchain to Data Science models, Digital Underwriting, mobile-first applications and more. Over the last 20+ years, our company has been honored with multiple employer awards, recognizing our commitment to our talented teams. With top clients to boast about, Synechron has a global workforce of 14,500+, and has 58 offices in 21 countries within key global markets.

Our challenge

We are seeking a highly skilled Lead Site Reliability Engineer (SRE) / DevOps Engineer to drive the reliability, observability, and operational excellence of our platforms. This role will lead major initiatives around monitoring, automation, incident response, and performance optimization, using enterprise tools such as Dynatrace, BigPanda, and LogScale/MonPro. The candidate will partner closely with engineering, operations, and product teams to build robust systems, improve service availability, and ensure a seamless user experience through proactive observability and best-in-class SRE practices.

Additional Information

The base salary for this position will vary based on geography and other factors. In accordance with law, the base salary for this role, if filled in Pittsburgh, PA or Dallas, TX, is $125k to $135k per year plus benefits (see below).

The Role

Responsibilities:

Observability & Monitoring

Implement and enhance proactive observability frameworks to anticipate and mitigate issues before they occur.
Optimize experience monitoring and user interaction metrics across applications and services.
Manage and improve the event catalog, ensuring all system events are structured and actionable.
Build and maintain dashboards, alerts, and health reporting using tools like Dynatrace, BigPanda, MonPro, and LogScale.
Perform service tuning to improve system performance based on real-time metrics and data analysis.
Establish and maintain observability standards and best practices across teams.
Conduct chaos testing and resilience validation to ensure high system availability.
Lead anomaly detection practices to quickly identify and respond to unusual system behavior.

SRE Practices

Ensure platform stability, performance, and reliability through proven reliability engineering principles.
Drive SRE initiatives, including continuous improvement projects within the Site Reliability Center.
Develop, maintain, and scale automated orchestration pipelines to streamline operations and improve efficiency.
Create, maintain, and enforce SRE standards, including SLIs, SLOs, and operational playbooks.
Lead and conduct root cause analysis for critical incidents and drive long-term remediation improvements.

Problem Management

Own the problem management lifecycle: identifying, tracking, and resolving underlying issues to prevent recurring incidents.
Collaborate with cross-functional teams to address systemic issues and drive operational resilience.

Requirements:

7+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
Hands-on expertise with observability/monitoring tools such as:
Dynatrace (APM, RUM, dashboards, alerting)
BigPanda (event correlation, incident response)
LogScale / MonPro / LogicMonitor or similar log and metrics platforms
Solid experience with cloud platforms (AWS, Azure, or GCP).
Strong proficiency in automation & orchestration (Terraform, Ansible, Jenkins, GitHub Actions, etc.).
Proven track record in incident management, RCA, and implementing reliable SRE practices.
Experience with CI/CD pipelines, infrastructure as code, and configuration management.
Deep understanding of Linux systems, networking fundamentals, and distributed system design.
Strong scripting abilities (Python, Bash, PowerShell, or equivalent).
Excellent communication, leadership, and cross-team collaboration skills.

Preferred, but not required:

Experience leading SRE or DevOps teams.
Knowledge of chaos engineering, advanced anomaly detection, and proactive alerting strategies.
Experience implementing SLI/SLO frameworks and performance optimization programs.
Familiarity with containerization (Docker, Kubernetes) and service meshes.

We offer:

A highly competitive compensation and benefits package.
A multinational organization with 58 offices in 21 countries and the possibility to work abroad.
10 days of paid annual leave (plus sick leave and national holidays).
Maternity & paternity leave plans.
A comprehensive insurance plan including medical, dental, vision, life insurance, and long-/short-term disability (plans vary by region).
Retirement savings plans.
A higher education certification policy.
Commuter benefits (varies by region).
Extensive training opportunities, focused on skills, substantive knowledge, and personal development.
On-demand Udemy for Business for all Synechron employees with free access to more than 5000 curated courses.
Coaching opportunities with experienced colleagues from our Financial Innovation Labs (FinLabs) and Centers of Excellence (CoE) groups.
Cutting-edge projects at the world’s leading tier-one banks, financial institutions, and insurance firms.
A flat and approachable organization.
A truly diverse, fun-loving, and global work culture.

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

About Us

For more information on the company, please visit our website or LinkedIn community.

Sustainability and Health Safety Commitment

At Synechron, we are committed to integrating sustainability into our business strategy, ensuring responsible growth while minimizing environmental impact. Employees play a key role in driving our sustainability initiatives, from reducing our carbon footprint to fostering ethical and sustainable business practices across global operations. All positions are required to adhere to our Sustainability and Health Safety standards, demonstrating a commitment to environmental stewardship, workplace safety, and sustainable practices.
