Staff Engineer (SRE + AWS)

Synechron

Job Summary

Synechron is seeking a Staff Engineer specializing in SRE and AWS to lead the design, deployment, and management of resilient, scalable, and secure enterprise systems. This role involves overseeing end-to-end system health, automating operational processes, and implementing best practices in cloud architecture and security. The engineer will drive innovation, ensure service continuity, and mentor teams to achieve high standards of operational excellence aligned with business and compliance requirements.

Must Have

  • Lead end-to-end management of enterprise systems for high availability, scalability, and security.
  • Architect and implement cloud solutions leveraging AWS services and best practices.
  • Drive automation initiatives to improve operational efficiency and incident response.
  • Manage system health, conduct proactive monitoring, and perform capacity planning and upgrades.
  • Oversee incident response, root cause analysis, and problem resolution.
  • Develop and implement security controls, vulnerability assessments, and compliance procedures.
  • Mentor technical teams on cloud technologies, SRE practices, and automation strategies.
  • Collaborate with business and technology teams to plan future system enhancements.
  • Maintain comprehensive documentation of architecture, configurations, and operational procedures.
  • Lead service continuity testing, penetration testing, and vulnerability management programs.
  • Extensive experience in cloud architecture and management with AWS.
  • Strong expertise in Site Reliability Engineering (SRE) principles.
  • Proficiency in scripting and automation using Python and Shell scripting.
  • Experience with infrastructure-as-code (IaC) tools such as CloudFormation and Terraform.
  • Knowledge of containerization (Docker) and orchestration (Kubernetes).
  • Familiarity with monitoring, logging, and alerting tools like CloudWatch, Prometheus, Grafana, ELK Stack, or Splunk.
  • Strong understanding of system security best practices, threat detection, vulnerability management, and compliance standards.

Good to Have

  • Experience supporting automation/configuration management tools such as Ansible, Chef, or Puppet.
  • Knowledge of CI/CD pipelines and tools such as Jenkins, GitLab CI, or Azure DevOps.
  • Experience with microservices architecture and cloud-native design patterns.
  • Familiarity with compliance standards such as ISO, SOC 2, or GDPR.
  • Multi-cloud experience (Azure, GCP), serverless architectures, and advanced cloud security implementation.
  • Experience with Helm charts, service mesh tools like Istio, and advanced deployment strategies.
  • Experience with security audits, compliance frameworks, and encryption standards.
  • Industry domain background in finance, banking, or fintech.

Job Description

Job Summary

Synechron is seeking an experienced Staff Engineer specializing in Site Reliability Engineering (SRE) and Cloud Infrastructure (AWS) to lead the design, deployment, and management of resilient, scalable, and secure enterprise systems. You will oversee end-to-end system health, automate operational processes, and implement best practices in cloud architecture and security. Playing a pivotal role within our technical leadership team, you will drive innovation, ensure service continuity, and mentor teams to achieve high standards of operational excellence aligned with business and compliance requirements.

Software Requirements

Required Skills:

  • Extensive experience in cloud architecture and management with AWS (including services such as EC2, S3, RDS, Lambda, CloudFormation)
  • Strong expertise in Site Reliability Engineering (SRE) principles, including automation, observability, and incident management
  • Proficiency in scripting and automation using Python, Shell scripting, or similar tools
  • Experience with infrastructure-as-code (IaC) tools such as CloudFormation, Terraform, or similar
  • Knowledge of containerization (Docker) and orchestration (Kubernetes)
  • Familiarity with monitoring, logging, and alerting tools such as CloudWatch, Prometheus, Grafana, ELK Stack, or Splunk
  • Strong understanding of system security best practices, threat detection, vulnerability management, and compliance standards

Preferred Skills:

  • Experience supporting automation/configuration management tools such as Ansible, Chef, or Puppet
  • Knowledge of CI/CD pipelines and tools such as Jenkins, GitLab CI, or Azure DevOps
  • Experience with microservices architecture and cloud-native design patterns
  • Familiarity with compliance standards such as ISO, SOC 2, or GDPR

Overall Responsibilities

  • Lead the end-to-end management of enterprise systems environments, ensuring high availability, scalability, and security
  • Architect and implement cloud-based solutions, leveraging AWS services and best practices in cloud security and cost optimization
  • Drive automation initiatives to improve operational efficiency, incident response, and system reliability
  • Manage system health, conduct proactive monitoring, and perform capacity planning and upgrades
  • Oversee incident response, root cause analysis, and problem resolution to ensure continuous service delivery
  • Develop and implement security controls, vulnerability assessments, and compliance procedures
  • Mentor and develop technical teams, sharing knowledge on cloud technologies, SRE practices, and automation strategies
  • Collaborate with business and technology teams to plan future system enhancements and migrations
  • Maintain comprehensive documentation of architecture, configurations, runbooks, and operational procedures
  • Lead service continuity testing, penetration testing, and vulnerability management programs to meet regulatory and security standards

Technical Skills (By Category)

Cloud Architecture & Services:

  • Required: Deep expertise in AWS core services (EC2, S3, RDS, Lambda, CloudFormation)
  • Preferred: Multi-cloud experience (Azure, GCP), serverless architectures, and advanced cloud security implementation

SRE & Automation:

  • Required: Automation of deployment, scaling, and incident response processes
  • Preferred: Monitoring with Prometheus, Grafana, ELK Stack, or Splunk; scripting using Python and Shell

Containerization & Orchestration:

  • Required: Docker containerization; Kubernetes for orchestration and managing microservices
  • Preferred: Helm charts, service mesh tools like Istio, and advanced deployment strategies

Security & Compliance:

  • Required: Implementation of security best practices, vulnerability management, and threat detection
  • Preferred: Experience with security audits, compliance frameworks, and encryption standards

Experience Requirements

  • 10-12 years of proven experience in cloud infrastructure, site reliability, or enterprise systems management
  • Extensive experience designing, deploying, and managing AWS cloud architectures at scale
  • Strong background in SRE principles, automation, and incident management
  • Demonstrated leadership in managing cross-functional teams and guiding best practices in cloud operations
  • Experience with compliance, security, and vulnerability management, including ISO, SOC 2, and GDPR practices
  • Industry domain background in finance, banking, or fintech is highly beneficial but not mandatory

Day-to-Day Activities

  • Architect, deploy, and manage cloud-based enterprise systems ensuring their high availability and resilience
  • Automate system provisioning, scaling, and incident responses to improve SLAs and reduce manual intervention
  • Monitor system health metrics, conduct capacity planning, and optimize resource utilization
  • Lead root cause analysis efforts, manage incident responses, and implement preventive measures
  • Conduct vulnerability assessments, coordinate penetration testing, and implement security controls
  • Collaborate with development teams to incorporate security and reliability best practices into deployment pipelines
  • Lead service continuity testing, disaster recovery planning, and compliance audits
  • Develop and maintain operational documentation, runbooks, and automation scripts
  • Mentor technical staff, promote a culture of continuous improvement, and share knowledge industry-wide

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
  • AWS certifications (e.g., AWS Certified Solutions Architect, AWS Certified DevOps Engineer) and security certifications (e.g., CISSP, CISA) are preferred
  • Extensive hands-on experience in cloud architecture, SRE practices, automation, and security for large enterprise systems

Professional Competencies

  • Strong analytical and problem-solving skills
  • Excellent leadership and mentorship abilities
  • Effective communication with both technical and non-technical stakeholders
  • Strategic thinking with a focus on operational excellence, security, and cost efficiency
  • Adaptability to evolving technology stacks, industry standards, and regulatory requirements
  • Proactive and solution-oriented mindset, with a focus on continuous improvement

