PySpark Engineer

2 weeks ago • All levels • Data Analyst

About the job

Summary

This role involves designing, developing, and maintaining ETL pipelines using PySpark, optimizing for performance and scalability. The PySpark Engineer will work with large structured and unstructured datasets, transforming data to meet business needs and integrating data from multiple sources. Collaboration with cross-functional teams is key to understanding data requirements and translating them into efficient workflows. Responsibilities include implementing data governance best practices, debugging pipelines, improving performance and reliability, and providing documentation and training. The project focuses on a high-impact data engineering initiative, delivering data-driven insights for business decisions.
Must have:
  • Proficiency in PySpark
  • Strong SQL knowledge
  • Data Warehousing Concepts
  • Cloud Platform experience (AWS, GCP, Azure)
  • Big Data Technologies (Hadoop, Spark)
  • Data Modeling experience
  • Strong Python skills
Good to have:
  • Airflow or other orchestration tools
  • Apache Kafka knowledge
  • Data visualization tools (Tableau, Power BI)
  • Machine learning familiarity
  • Agile methodology experience
  • Data governance and compliance knowledge
Project description

We are looking for skilled PySpark Engineers to join our team, working on a high-impact data engineering project. The project involves processing large datasets, optimizing ETL pipelines, and building scalable solutions to manage complex data workflows. The ideal candidate will collaborate closely with data scientists, data analysts, and software engineers to drive robust, data-driven insights for business decisions.

Responsibilities

Design, develop, and maintain ETL pipelines using PySpark, optimizing for performance and scalability.

Work with large volumes of structured and unstructured data, transforming data to meet business needs.

Integrate data from multiple sources into the data platform, ensuring data integrity and quality.

Collaborate with cross-functional teams to understand data requirements and translate them into efficient data workflows.

Implement best practices for data governance, monitoring, and data security.

Debug and troubleshoot issues across ETL pipelines and data workflows.

Continuously improve performance, scalability, and reliability of existing data pipelines.

Provide documentation and training for data workflows and processes.

Skills

Must have

Proficiency in PySpark: In-depth experience with PySpark for data processing and transformation tasks.

SQL Knowledge: Strong command of SQL for querying and processing data.
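For illustration, the kind of aggregate query this skill covers can be sketched with Python's standard-library `sqlite3`; the `sales` table and its columns are invented for the example:

```python
# Hedged sketch of a routine warehouse-style aggregation.
# Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.5)],
)

# GROUP BY with a HAVING filter on the aggregate.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) > 1"
).fetchall())
conn.close()
```

The same query shape carries over directly to Spark SQL or a cloud warehouse engine.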

Data Warehousing Concepts: Familiarity with data warehousing, data lakes, and data integration principles.

Cloud Platforms: Experience with cloud environments like AWS, GCP, or Azure for data storage and processing.

Big Data Technologies: Hands-on experience with Hadoop and Spark ecosystem (Spark SQL, Spark Streaming).

Data Modeling: Experience in designing and implementing efficient data models.
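As a minimal sketch of dimensional modeling (again using stdlib `sqlite3`, with invented `fact_sales`/`dim_product` tables), a star schema resolves fact rows to readable attributes through a dimension join:

```python
# Hypothetical star schema: one fact table keyed to one dimension table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE fact_sales (product_id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(1, 3), (1, 2), (2, 4)])

# Join facts to the dimension, then aggregate by the descriptive attribute.
by_name = dict(conn.execute(
    "SELECT d.name, SUM(f.qty) FROM fact_sales f "
    "JOIN dim_product d ON d.product_id = f.product_id "
    "GROUP BY d.name"
).fetchall())
conn.close()
```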

Python Programming: Strong Python skills, particularly in data manipulation and analysis.

Nice to have

Experience with Airflow or Other Orchestration Tools: Knowledge of workflow orchestration tools for scheduling and monitoring data pipelines.

Knowledge of Apache Kafka: Understanding of Kafka for real-time data streaming and integration.

Familiarity with Data Visualization Tools: Knowledge of visualization tools like Tableau, Power BI, or similar.

Machine Learning Exposure: Familiarity with machine learning concepts, particularly with integrating ML models in data workflows.

Agile Methodology: Experience working in Agile/Scrum environments.

Data Governance and Compliance Knowledge: Understanding of data governance frameworks and compliance standards, such as GDPR.

Other

Languages

English: C1 Advanced

Seniority

Senior


About The Company

Luxoft, a DXC Technology Company (NYSE: DXC), is a digital strategy and software engineering firm providing bespoke technology solutions that drive business change for customers the world over. Acquired by U.S. company DXC Technology in 2019, Luxoft is a global operation in 44 cities and 21 countries with an international, agile workforce of nearly 18,000 people. It combines a unique blend of engineering excellence and deep industry expertise, helping over 425 global clients innovate in the areas of automotive, financial services, travel and hospitality, healthcare, life sciences, media and telecommunications.

DXC Technology is a leading Fortune 500 IT services company that helps global companies run their mission-critical systems. Together, DXC and Luxoft offer a differentiated customer-value proposition for digital transformation by combining Luxoft’s front-end digital capabilities with DXC’s expertise in IT modernization and integration. Follow our profile for regular updates and insights into technology and business needs.

