SRE - 2 (Big Data)

2 Months ago • 3-5 Years • Devops

Job Summary

Job Description

As a Site Reliability Engineer (SRE) specializing in DataPlatform OnPremise, you will be responsible for ensuring the reliability, scalability, and performance of the Cloudera Data Platform (CDP) infrastructure. You will collaborate with cross-functional teams to design, implement, and maintain robust systems supporting data-driven initiatives. Your responsibilities include managing the Cloudera-based infrastructure, ensuring optimal performance, high availability, and scalability, troubleshooting system issues, creating runbooks and automating them, implementing data security best practices, optimizing infrastructure for performance, planning capacity, collaborating with various teams, implementing backup and disaster recovery strategies, developing tools for debugging, applying patches and upgrades, documenting configurations, processes, and procedures, and communicating project updates. You will work to ensure the smooth functioning, operation, performance, and security of large, high-density, Cloudera-based infrastructure. This role involves a pivotal contribution to the data platform's operational success.
Must have:
  • Proficiency in Linux system administration, shell scripting, and networking.
  • 3-5 years of experience in managing large-scale Hadoop clusters.
  • Experience with Hadoop ecosystem technologies (HDFS, MapReduce, YARN, Hive, Spark, etc.).
  • Strong scripting skills (e.g., Perl, Python, Bash) for automation and troubleshooting.
  • Excellent communication skills and ability to collaborate effectively.
Good to have:
  • Cloudera Certified Administrator (CCA) or Cloudera Certified Professional (CCP) certification preferred.
  • Experience in managing medium/large Hadoop-based environments.
  • Familiarity with Open Data Lake components such as Ozone, Iceberg, Spark, Flink, etc.
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes, OpenShift).
Perks:
  • Medical, Critical Illness, Accidental, and Life Insurance.
  • Employee Assistance Program, Onsite Medical Center, Emergency Support System.
  • Maternity, Paternity, Adoption Assistance, and Day-care Support Programs.
  • Relocation benefits, Transfer Support Policy, and Travel Policy.
  • Employee PF Contribution, Gratuity, NPS, and Leave Encashment.
  • Higher Education Assistance, Car Lease, and Salary Advance Policy.

Job Details

About PhonePe Group: 

PhonePe is India’s leading digital payments company with 50 crore (500 Million) registered users and 3.7 crore (37 Million) merchants covering over 99% of the postal codes across India. On the back of its leadership in digital payments, PhonePe has expanded into financial services (Insurance, Mutual Funds, Stock Broking, and Lending) as well as adjacent tech-enabled businesses such as Pincode for hyperlocal shopping and Indus App Store which is India's first localized App Store. The PhonePe Group is a portfolio of businesses aligned with the company's vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture

At PhonePe, we take extra care to make sure you give your best at work, every day! Creating the right environment for you is just one of the things we do. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. Being enthusiastic about tech is a big part of being at PhonePe. If you like building technology that impacts millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!

Job Overview:

As a Site Reliability Engineer (SRE) specializing in DataPlatform OnPremise, you will play a critical role in deployment, ensuring the reliability, scalability, and performance of our Cloudera Data Platform (CDP) infrastructure. You will collaborate closely with cross-functional teams to design, implement, and maintain robust systems that support our data-driven initiatives. The ideal candidate will have a deep understanding of the data platform, strong troubleshooting skills, and a proactive mindset towards automation and optimization. You will play a pivotal role in ensuring the smooth functioning, operation, performance, and security of large, high-density, Cloudera-based infrastructure.

 

Roles and Responsibilities:

  1. Work on tasks related to the on-premises implementation of Cloudera Data Platform, and take part in planning, installation, configuration, and integration with existing systems.
  2. Infrastructure Management: Manage and maintain the Cloudera-based infrastructure, ensuring optimal performance, high availability, and scalability. This includes monitoring system health and performing routine maintenance tasks.
  3. Strong troubleshooting skills and operational expertise in areas such as system capacity, bottlenecks, memory, CPU, OS, storage, and networking.
  4. Create runbooks and automate them using scripting tools such as shell scripts and Python.
  5. Working knowledge of any of the configuration management tools such as Terraform, Ansible, or Salt.
  6. Data Security and Compliance: Implement and enforce security best practices to safeguard data integrity and confidentiality within the Cloudera environment. Ensure compliance with relevant regulations and standards (e.g., GDPR, HIPAA, DPR).
  7. Performance Optimization: Continuously optimize the Cloudera infrastructure to enhance performance, efficiency, and cost-effectiveness. Identify and resolve bottlenecks, tune configurations, and implement best practices for resource utilization.
  8. Capacity Planning: Plan and tune the performance of Hadoop clusters, monitor resource utilization trends, and plan for future capacity needs. Proactively identify potential capacity constraints and propose solutions to address them.
  9. Collaborate effectively with infrastructure, network, database, application, and business intelligence teams to ensure high data quality and availability.
  10. Work closely with teams to optimize the overall performance of the PhonePe Hadoop ecosystem.
  11. Backup and Disaster Recovery: Implement robust backup and disaster recovery strategies to ensure data protection and business continuity. Test and maintain backup and recovery procedures regularly.
  12. Develop tools and services to enhance debuggability and supportability.
  13. Patches & Upgrades: Routinely apply recommended patches and perform rolling upgrades of the platform in accordance with the advisory from Cloudera, InfoSec and Compliance.
  14. Documentation and Knowledge Sharing: Create comprehensive documentation for configurations, processes, and procedures related to the Cloudera Data Platform. Share knowledge and best practices with team members to foster continuous learning and improvement.
  15. Collaboration and Communication: Collaborate effectively with cross-functional teams including data engineers, developers, and IT operations personnel. Communicate project status, issues, and resolutions clearly and promptly.


Skills Required:

  1. Bachelor's degree in Computer Science, Engineering, or related field.
  2. Proficiency in Linux system administration, shell scripting, and networking concepts, including iptables and IPsec.
  3. Strong understanding of networking, open-source technologies, and tools.
  4. 3-5 years of experience in the design, set up, and management of large-scale Hadoop clusters, ensuring high availability, fault tolerance, and performance optimization.
  5. Strong understanding of distributed computing principles and experience with Hadoop ecosystem technologies (HDFS, MapReduce, YARN, Hive, Spark, etc.).
  6. Experience with Kerberos and LDAP.
  7. Strong knowledge of databases such as MySQL, NoSQL stores, and SQL Server.
  8. Hands-on experience with configuration management tools (e.g., Salt, Ansible, Puppet, Chef).
  9. Strong scripting skills (e.g., Perl, Python, Bash) for automation and troubleshooting.
  10. Experience with monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack).
  11. Knowledge of networking principles and protocols (TCP/IP, UDP, DNS, DHCP, etc.).
  12. Experience managing *nix-based machines (e.g., Ubuntu, Fedora, Red Hat) and strong working knowledge of essential Unix programs and tools.
  13. Excellent communication skills and the ability to collaborate effectively with cross-functional teams.
  14. Excellent analytical, problem-solving, and troubleshooting skills.
  15. Proven ability to work well under pressure and manage multiple priorities simultaneously.

Good To Have:

  1. Cloudera Certified Administrator (CCA) or Cloudera Certified Professional (CCP) certification preferred.
  2. Minimum 2 years of experience managing and administering medium/large Hadoop-based environments (>100 machines); Cloudera Data Platform (CDP) experience is highly desirable.
  3. Familiarity with Open Data Lake components such as Ozone, Iceberg, Spark, Flink, etc.
  4. Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes, OpenShift) is a plus.
  5. Design, develop, and maintain Airflow DAGs and tasks to automate BAU processes, ensuring they are robust, scalable, and efficient.

 

PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles)

  • Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
  • Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
  • Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
  • Mobility Benefits - Relocation benefits, Transfer Support Policy, Travel Policy
  • Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment 
  • Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy

Working at PhonePe is a rewarding experience! Great people, a work environment that thrives on creativity, and the opportunity to take on roles beyond a defined job description are just some of the reasons you should work with us. Read more about PhonePe on our blog.

About The Company

PhonePe was founded in December 2015 and has emerged as India’s largest payments app, enabling digital inclusion for consumers and merchants alike. With 48 crore (480 Million) registered users, one in four Indians are now on PhonePe. The company has also successfully digitized 3.6 crore (36 Million) offline merchants spread across Tier 2,3,4 and beyond, covering 99% of the postal codes across India. PhonePe is also the leader in Bharat Bill Pay System (BBPS), processing over 45% of the transactions on the BBPS platform. PhonePe forayed into financial services in 2017, providing users with safe and convenient investing options on its platform. Since then, the company has introduced several Mutual Funds and Insurance products that offer every Indian an equal opportunity to unlock the flow of money and access to services. PhonePe was recently recognized as the Most Trusted Brand for Digital Payments as per the Brand Trust Report 2023 by Trust Research Advisory (TRA).


