Site Reliability Engineer - Big Data

8 Hours ago • 5-7 Years • Data Analysis

Job Summary

Job Description

PhonePe is seeking a Site Reliability Engineer with 5 to 7 years of experience in Big Data to ensure the stability, scalability, and performance of distributed systems. Responsibilities include managing and automating the Hadoop ecosystem (HDFS, HBase, Hive, Airflow, YARN, Ranger, Kafka, Pinot, Druid) and performing capacity planning, system tuning, and optimization. The role involves handling incidents, conducting root cause analysis, and implementing mitigation strategies. You will also apply system updates and patches, build observability tools, and participate in Kerberos and LDAP administration, collaborating with various teams to ensure data availability and quality.
Must have:
  • Ensure stability, scalability, and performance of Hadoop ecosystem
  • Manage Hadoop infrastructure (HDFS, HBase, Hive, etc.)
  • Automate operations via scripting
  • Perform capacity planning and system tuning
  • Configure and manage Nginx
  • Troubleshoot Linux and Big Data systems
  • Handle on-call responsibilities and incident management
  • Collaborate with infrastructure teams
  • Build tools for observability
  • Participate in Kerberos and LDAP administration
  • Experience with Linux system administration (min 1 year)
  • Hands-on Hadoop administration (over 4 years)
  • Proficient in scripting (Perl, Golang, or Python)
  • Strong operational knowledge of systems
  • Excellent communication skills
Good to have:
  • Design and maintain Airflow DAGs
  • ELK stack administration
  • Familiarity with monitoring tools (Grafana, Prometheus)
  • Exposure to security protocols (Kerberos, LDAP)
  • Familiarity with distributed systems (Elasticsearch)
Perks:
  • Medical Insurance
  • Critical Illness Insurance
  • Accidental Insurance
  • Life Insurance
  • Employee Assistance Program
  • Onsite Medical Center
  • Emergency Support System
  • Maternity Benefit
  • Paternity Benefit Program
  • Adoption Assistance Program
  • Day-care Support Program
  • Relocation benefits
  • Transfer Support Policy
  • Travel Policy
  • Employee PF Contribution
  • Flexible PF Contribution
  • Gratuity
  • NPS
  • Leave Encashment
  • Higher Education Assistance
  • Car Lease
  • Salary Advance Policy

Job Details

About PhonePe Group: 

PhonePe is India’s leading digital payments company with 50 crore (500 Million) registered users and 3.7 crore (37 Million) merchants covering over 99% of the postal codes across India. On the back of its leadership in digital payments, PhonePe has expanded into financial services (Insurance, Mutual Funds, Stock Broking, and Lending) as well as adjacent tech-enabled businesses such as Pincode for hyperlocal shopping and Indus App Store, which is India's first localized App Store. The PhonePe Group is a portfolio of businesses aligned with the company's vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture

At PhonePe, we take extra care to make sure you give your best at work, every day! Creating the right environment for you is just one of the things we do. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. Being enthusiastic about tech is a big part of being at PhonePe. If you like building technology that impacts millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!

About the Role

As a Site Reliability Engineer (Big Data, 5 to 7 years of experience) at PhonePe, you will be responsible for ensuring the stability, scalability, and performance of distributed systems operating at scale. You will collaborate with development, infrastructure, and data teams to automate operations, reduce manual effort, handle incidents, and continuously improve system reliability. This role requires strong problem-solving skills, operational ownership, and a proactive approach to mentoring and driving engineering excellence.

Roles and Responsibilities

  • Ensure the ongoing stability, scalability, and performance of PhonePe’s Hadoop ecosystem and associated services.
  • Manage and administer Hadoop infrastructure including HDFS, HBase, Hive, Pig, Airflow, YARN, Ranger, Kafka, Pinot, and Druid.
  • Automate BAU operations through scripting and tool development.
  • Perform capacity planning, system tuning, and performance optimization.
  • Set up, configure, and manage Nginx in high-traffic environments.
  • Administer and troubleshoot Linux and Big Data systems, including networking (IP, iptables, IPsec).
  • Handle on-call responsibilities, investigate incidents, perform root cause analysis, and implement mitigation strategies.
  • Collaborate with infrastructure, network, database, and BI teams to ensure data availability and quality.
  • Apply system updates, patches, and manage version upgrades in coordination with security teams.
  • Build tools and services to improve observability, debuggability, and supportability.
  • Participate in Kerberos and LDAP administration.
  • Carry out capacity planning and performance tuning of Hadoop clusters.
  • Work with configuration management and deployment tools like Puppet, Chef, Salt, or Ansible.

Skills Required

  • Minimum 1 year of Linux/Unix system administration experience.
  • Over 4 years of hands-on experience in Hadoop administration.
  • Minimum 1 year of experience managing infrastructure on public cloud platforms like AWS, Azure, or GCP (optional).
  • Strong understanding of networking, open-source tools, and IT operations.
  • Proficient in scripting and programming (Perl, Golang, or Python).
  • Hands-on experience maintaining and managing Hadoop ecosystem components such as HDFS, YARN, HBase, and Kafka.
  • Strong operational knowledge of systems (CPU, memory, storage, OS-level troubleshooting).
  • Experience in administering and tuning relational and NoSQL databases.
  • Experience in configuring and managing Nginx in production environments.
  • Excellent communication and collaboration skills.

Good to Have

  • Experience designing and maintaining Airflow DAGs to automate scalable and efficient workflows.
  • Experience in ELK stack administration.
  • Familiarity with monitoring tools like Grafana, Loki, Prometheus, and OpenTSDB.
  • Exposure to security protocols and tools (Kerberos, LDAP).
  • Familiarity with distributed systems like Elasticsearch or similar high-scale environments.

 

PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles)

  • Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
  • Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
  • Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
  • Mobility Benefits - Relocation benefits, Transfer Support Policy, Travel Policy
  • Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment 
  • Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy

Working at PhonePe is a rewarding experience! Great people, a work environment that thrives on creativity, and the opportunity to take on roles beyond a defined job description are just some of the reasons you should work with us. Read more about PhonePe on our blog.



About The Company

PhonePe was founded in December 2015 and has emerged as India’s largest payments app, enabling digital inclusion for consumers and merchants alike. With 48 crore (480 Million) registered users, one in four Indians is now on PhonePe. The company has also successfully digitized 3.6 crore (36 Million) offline merchants spread across Tier 2, 3, 4 and beyond, covering 99% of the postal codes across India. PhonePe is also the leader in Bharat Bill Pay System (BBPS), processing over 45% of the transactions on the BBPS platform. PhonePe forayed into financial services in 2017, providing users with safe and convenient investing options on its platform. Since then, the company has introduced several Mutual Funds and Insurance products that offer every Indian an equal opportunity to unlock the flow of money and access to services. PhonePe was recently recognized as the Most Trusted Brand for Digital Payments as per the Brand Trust Report 2023 by Trust Research Advisory (TRA).



Bengaluru, Karnataka, India (On-Site)

