Data Engineer – AWS + Hadoop
Synechron
Job Summary
Synechron is seeking a seasoned Data Engineer with 7+ years of experience in AWS data services and the Hadoop ecosystem. This role involves designing, building, and optimizing scalable ETL/ELT batch and streaming data pipelines using technologies such as Kafka/Kinesis and Spark. Key responsibilities include developing data lakes and warehouses on AWS, managing Hadoop components, ensuring data quality, and implementing governance and security. The engineer will also set up orchestration and CI/CD for data jobs, monitor pipelines, and collaborate with analytics teams to provide high-quality datasets and APIs.
Job Description
About the Role
We’re looking for a seasoned Data Engineer with hands-on expertise in AWS data services and the Hadoop ecosystem. You will design, build, and optimize batch/streaming data pipelines, enable reliable data ingestion/processing, and support analytics, ML, and BI use cases at scale.
Key Responsibilities
- Design and implement scalable ETL/ELT pipelines for batch and streaming workloads.
- Build data ingestion frameworks using Kafka/Kinesis and process data with Spark (PySpark/Scala); a streaming sketch follows this list.
- Develop and optimize data lakes and data warehouses on AWS (S3, Glue, EMR, Athena, Redshift).
- Manage and tune Hadoop ecosystem components (HDFS, Hive, Spark, Oozie/Airflow, Sqoop).
- Model data (star/snowflake), manage schemas, partitioning, and metadata; ensure data quality (DQ checks).
- Implement data governance, security, and access controls (IAM, Lake Formation, encryption, key management); a Lake Formation grant sketch follows this list.
- Set up orchestration and CI/CD for data jobs (Airflow/AWS Step Functions, Jenkins/GitHub Actions); an orchestration-with-DQ-gate sketch follows this list.
- Monitor pipelines and optimize cost, performance, and reliability (CloudWatch, logs, metrics).
- Collaborate with Analytics/ML/BI teams; provide high-quality curated datasets and APIs/views.
- Document solutions, conduct code reviews, and enforce engineering best practices.
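To make the ingestion and data-lake responsibilities concrete, here is a minimal PySpark Structured Streaming sketch that reads events from Kafka and writes partitioned Parquet to S3. It assumes the spark-sql-kafka connector is on the classpath; the broker, topic, schema, and bucket paths are hypothetical placeholders, not part of this role's actual stack.

```python
# Minimal sketch: Kafka -> Spark Structured Streaming -> partitioned Parquet on S3.
# Broker, topic, schema, and bucket below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, to_date
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

# Assumed JSON payload of each Kafka message.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder brokers
    .option("subscribe", "events")                     # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Parse the JSON value and derive a date column for partitioning.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("event_date", to_date(col("event_ts")))
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-data-lake/raw/events/")         # placeholder bucket
    .option("checkpointLocation", "s3://example-data-lake/_chk/events/")
    .partitionBy("event_date")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```

Partitioning by date keeps downstream Athena/Glue scans cheap, and the checkpoint location lets the file sink recover exactly-once semantics across restarts.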
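The governance bullet can likewise be made concrete. Below is a minimal boto3 sketch granting a role SELECT on a curated Glue table through Lake Formation; the role ARN, database, and table names are hypothetical.

```python
# Minimal sketch: grant a role SELECT on a curated Glue table via Lake Formation.
# The role ARN, database, and table names are hypothetical placeholders.
import boto3

lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={
        # Placeholder analyst role; in practice this comes from IAM.
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst"
    },
    Resource={
        "Table": {
            "DatabaseName": "curated",   # placeholder Glue database
            "Name": "events_daily",      # placeholder table
        }
    },
    Permissions=["SELECT"],
)
```

Granting through Lake Formation rather than raw S3 bucket policies keeps access auditable at the table level and pairs naturally with PII handling.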
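Finally, a hedged sketch of the orchestration, data-quality, and monitoring bullets together: an Airflow DAG (assuming Airflow 2.x) that runs a batch job, then a Python data-quality gate that publishes a row-count metric to CloudWatch. The DAG id, job script, metric namespace, and row-count stub are all illustrative assumptions.

```python
# Minimal Airflow 2.x sketch: run a batch job, then gate on a data-quality
# check that also emits a CloudWatch metric. All names are illustrative.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def fetch_row_count() -> int:
    # Stub standing in for a real Athena/Glue count query.
    return 42


def publish_and_gate_row_count(**context):
    """Hypothetical DQ gate: fetch yesterday's row count, publish it as a
    CloudWatch metric, and fail the task if nothing landed."""
    row_count = fetch_row_count()
    boto3.client("cloudwatch").put_metric_data(
        Namespace="DataPipelines/Events",  # illustrative namespace
        MetricData=[{
            "MetricName": "DailyRowCount",
            "Value": float(row_count),
            "Unit": "Count",
        }],
    )
    if row_count == 0:
        raise ValueError("DQ gate failed: no rows landed for yesterday")


with DAG(
    dag_id="events_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_batch = BashOperator(
        task_id="spark_batch",
        # Illustrative spark-submit; the job script path is a placeholder.
        bash_command="spark-submit s3://example-data-lake/jobs/daily_rollup.py",
    )
    dq_gate = PythonOperator(
        task_id="dq_gate",
        python_callable=publish_and_gate_row_count,
    )
    run_batch >> dq_gate
```

Gating the DAG on the check keeps bad days from propagating downstream, and the same CloudWatch namespace can back alarms for the monitoring bullet above.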
Required Skills & Qualifications
- 7+ years in Data Engineering with large-scale distributed data systems.
- Strong experience with AWS data stack: S3, Glue, EMR, Athena, Lambda, Redshift, IAM, CloudWatch.
- Hands-on with Hadoop ecosystem: HDFS, Hive, Spark (PySpark/Scala), Kafka, Oozie/Airflow.
- Expertise in SQL (complex queries, performance tuning) and data modeling.
- Practical knowledge of streaming (Kafka/Kinesis, Spark Streaming/Structured Streaming).
- Experience with Python or Scala for data pipelines; shell scripting.
- Familiarity with orchestration (Airflow/AWS Step Functions) and CI/CD for data jobs.
- Strong understanding of security & governance (encryption, PII handling, RBAC, Lake Formation).
- Proficient with version control (Git) and containers (Docker) for reproducible jobs.
- Excellent problem-solving, communication, and collaboration skills.
Synechron’s Diversity & Inclusion Statement
Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture, promoting equality, diversity, and an environment that is respectful of all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and abilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.
All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.
About Us
At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Synechron’s progressive technologies and optimization strategies span end-to-end Artificial Intelligence, Consulting, Digital, Cloud & DevOps, Data, and Software Engineering, serving an array of noteworthy financial services and technology firms. Through research and development initiatives in our FinLabs, we develop solutions for modernization, from Artificial Intelligence and Blockchain to Data Science models, Digital Underwriting, mobile-first applications, and more.
Over the last 20+ years, our company has been honored with multiple employer awards recognizing our commitment to our talented teams. With top clients to boast about, Synechron has a global workforce of 14,500+ across 58 offices in 21 countries within key global markets.
For more information on the company, please visit our website or LinkedIn community.
Sustainability and Health Safety Commitment
At Synechron, we are committed to integrating sustainability into our business strategy, ensuring responsible growth while minimizing environmental impact. Employees play a key role in driving our sustainability initiatives, from reducing our carbon footprint to fostering ethical and sustainable business practices across global operations. All positions are required to adhere to our Sustainability and Health Safety standards, demonstrating a commitment to environmental stewardship, workplace safety, and sustainable practices.