Site Reliability Engineer - Big Data (7 to 11 years)

PhonePe

7-11 Years | Bangalore, Karnataka, India (On Site) | Full Time | 2 months ago

Job Summary

This role is responsible for managing and maintaining complex, distributed big data ecosystems and for ensuring the reliability, scalability, and security of large-scale production infrastructure across multiple business verticals. Key responsibilities include leading on-call rotations, designing automation, optimizing workflows, troubleshooting production issues, and driving improvements to system availability and performance.

Job Description

About PhonePe Limited:

PhonePe is headquartered in India. Its flagship product, the PhonePe digital payments app, was launched in Aug 2016. As of April 2025, PhonePe has over 60 Crore (600 Million) registered users and a digital payments acceptance network of over 4 Crore (40+ Million) merchants. PhonePe also processes over 33 Crore (330+ Million) transactions daily, with an Annualized Total Payment Value (TPV) of over INR 150 lakh crore.

PhonePe’s portfolio of businesses includes the distribution of financial products (Insurance, Lending, and Wealth) as well as new consumer tech businesses in India (Pincode for hyperlocal e-commerce and Indus AppStore, a localized app store for the Android ecosystem). These are aligned with the company’s vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture:

At PhonePe, we go the extra mile to make sure you can bring your best self to work every day! And that starts with creating the right environment for you. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. PhonePe-rs solve complex problems and execute quickly, often building frameworks from scratch. If you’re excited by the idea of building platforms that touch millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!

About the Role:

Roles and Responsibilities:

Manage, maintain, and support incremental changes to Linux/Unix environments.
Lead on-call rotations and incident responses, conducting root cause analysis and driving postmortem processes.
Design and implement automation systems for managing big data infrastructure, including provisioning, scaling, upgrades, and patching clusters.
Troubleshoot and resolve complex production issues while identifying root causes and implementing mitigation strategies.
Design and review scalable and reliable system architectures.
Collaborate with teams to optimize overall system performance.
Enforce security standards across systems and infrastructure.
Set technical direction, drive standardization, and operate independently.
Ensure availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
Respond to, analyze, and resolve system outages and disruptions, and implement measures to prevent similar incidents from recurring.
Develop tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
Monitor and optimize system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning.
Collaborate with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle.
Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities.
Develop and enforce SRE best practices and principles.
Align across functional teams on priorities and deliverables.
Drive automation to enhance operational efficiency.

Skills Required:

Over 7 years of experience managing and maintaining distributed big data ecosystems.
Strong expertise in Linux, including IP, iptables, and IPsec.
Proficiency in scripting/programming with languages like Perl, Golang, or Python.
Hands-on experience with the Hadoop stack (HDFS, HBase, Airflow, YARN, Ranger, Kafka, Pinot).
Familiarity with open-source configuration management and deployment tools such as Puppet, Salt, Chef, or Ansible.
Solid understanding of networking, open-source technologies, and related tools.
Excellent communication and collaboration skills.
Experience with DevOps tools: SaltStack, Ansible, Docker, Git.
Experience with SRE logging and monitoring tools: ELK stack, Grafana, Prometheus, OpenTSDB, OpenTelemetry.

Good to Have:

Experience managing infrastructure on public cloud platforms (AWS, Azure, GCP).
Experience in designing and reviewing system architectures for scalability and reliability.
Experience with observability tools to visualize and alert on system performance.

PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles)

Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
Mobility Benefits - Relocation benefits, Transfer Support Policy, Travel Policy
Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy

Our inclusive culture promotes individual expression, creativity, innovation, and achievement, and in turn helps us better understand and serve our customers. We see ourselves as a place for intellectual curiosity, ideas, and debates, where diverse perspectives lead to deeper understanding and better-quality results. PhonePe is an equal opportunity employer and is committed to treating all its employees and job applicants equally, regardless of gender, sexual preference, religion, race, color, or disability. If you have a disability or special need that requires assistance or reasonable accommodation during the application and hiring process, including support for the interview or onboarding process, please fill out this form.

Read more about PhonePe on our blog.

Life at PhonePe

PhonePe in the news
