Manager Site Reliability Engineer

9 Months ago • 10-15 Years • Devops

Job Summary

Job Description

Zeta is a Next-Gen Banking Tech company empowering banks and fintechs. This role involves bridging the gap between development and operations to build and maintain reliable, scalable, and efficient systems, ensuring a seamless user experience. Responsibilities include system reliability, automation of operational tasks, incident response, capacity planning, performance optimization, implementing Infrastructure as Code, monitoring, logging, security best practices, disaster recovery planning, continuous improvement, team leadership, and mentorship.
Must have:
  • Proficiency in Python, Go, Shell, or Bash
  • Strong automation skills (Ansible, Puppet, Chef)
  • Experience with Docker and Kubernetes
  • Proficiency in AWS, Azure, or GCP
  • Familiarity with monitoring tools (Prometheus, Grafana)
  • Understanding of networking concepts
  • Knowledge of security best practices
  • Understanding of CI/CD pipelines
  • Experience with Git
  • 10-15 years of SRE experience
  • B.Tech/M.Tech in computer science or related field
  • Ability to lead and mentor a team
Good to have:
  • Experience working for a product organization
  • Cloud certifications (AWS, GCP, Azure)

Job Details

About Zeta

Zeta is a Next-Gen Banking Tech company that empowers banks and fintechs to launch banking products for the future. It was founded by Bhavin Turakhia and Ramki Gaddipati in 2015.
Our flagship processing platform - Zeta Tachyon - is the industry’s first modern, cloud-native, and fully API-enabled stack that brings together issuance, processing, lending, core banking, fraud & risk, and many more capabilities as a single-vendor stack. 20M+ cards have been issued on our platform globally.
Zeta is actively working with the largest Banks and Fintechs in multiple global markets, transforming customer experience for multi-million card portfolios.
Zeta has 1,700+ employees, with over 70% of roles in R&D, across locations in the US, EMEA, and Asia. We raised $280 million at a $1.5 billion valuation from Softbank, Mastercard, and other investors in 2021.

About the Role:
The role of a Site Reliability Engineer is to bridge the gap between development and operations, focusing on building and maintaining reliable, scalable, and efficient systems. The ultimate goal is to ensure a seamless and reliable user experience while promoting a culture of automation, collaboration, and continuous improvement within the organization.

Responsibilities:
  • System Reliability: Ensuring the reliability of software systems by designing, implementing, and maintaining scalable and reliable infrastructure.
  • Automation: Developing automation tools and scripts to streamline operational tasks, reduce manual intervention, and improve overall system efficiency.
  • Incident Response and Resolution: Monitoring system performance and responding to incidents promptly to minimize downtime and ensure high availability.
  • Capacity Planning: Analyzing system usage patterns and forecasting future capacity needs to ensure that the infrastructure can handle current and future demands.
  • Performance Optimization: Identifying and addressing performance bottlenecks in software systems through optimization and tuning.
  • Infrastructure as Code (IaC): Implementing infrastructure as code practices, using tools like Terraform or Ansible, to define and manage infrastructure in a version-controlled and automated manner.
  • Monitoring and Logging: Implementing and maintaining monitoring and logging solutions to gain insights into system behavior, troubleshoot issues, and proactively address potential problems.
  • Security: Collaborating with security teams to implement and maintain security best practices in infrastructure and applications.
  • Disaster Recovery Planning: Developing and maintaining disaster recovery plans to ensure that systems can quickly recover from major outages or failures.
  • Continuous Improvement: Continuously analyzing system performance, reliability, and incidents to identify areas for improvement and implementing changes to enhance overall system resilience.
  • Team Leadership: Ability to lead and motivate a team of SREs, providing guidance and support.
  • Mentorship and Coaching: Providing mentorship and coaching to team members to foster their professional development.
  • Conflict Resolution: Skill in resolving conflicts and addressing challenges within the team.

Skills:
  • Programming Languages: Proficiency in one or more programming or scripting languages, such as Python, Go, or shell scripting (Bash).
  • Automation and Scripting: Strong automation skills using tools like Ansible, Puppet, Chef, or custom scripts. Knowledge of Infrastructure as Code (IaC) tools like Terraform.
  • Containerization and Orchestration: Experience with containerization technologies like Docker and container orchestration platforms like Kubernetes.
  • Cloud Computing: Proficiency in at least one cloud platform, such as AWS, Azure, or Google Cloud Platform, and knowledge of managing infrastructure in the cloud.
  • Monitoring and Logging: Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK stack) and logging frameworks to track system performance and troubleshoot issues.
  • Networking: Understanding of networking concepts, protocols, and troubleshooting skills.
  • Security: Knowledge of security best practices, including encryption, access controls, and vulnerability management.
  • Continuous Integration/Continuous Deployment (CI/CD): Understanding and implementation of CI/CD pipelines for automated testing and deployment.
  • Incident Response: Experience in incident response, troubleshooting, and resolution.
  • Version Control: Proficient use of version control systems like Git.

Experience & Qualifications:
  • 10 - 15 years of experience in site reliability engineering.
  • B.Tech/M.Tech in computer science, information technology or a related field.
  • Having experience working for a product organization is a plus.
  • Certifications from cloud service providers like AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or Microsoft Certified are a plus.

Zeta is an equal opportunity employer.
At Zeta, we are committed to equal employment opportunities regardless of job history, disability, gender identity, religion, race, marital/parental status, or any other special status. We are proud to be an equitable workplace that welcomes individuals from all walks of life, provided they fit the role and its responsibilities.
