As a Data Engineer, you will design, develop, and maintain data pipelines using Python and Databricks to process large-scale datasets. You will collaborate with data scientists, analysts, and business stakeholders to gather data requirements and build efficient, scalable solutions that enable advanced analytics and reporting.
Data Pipeline Development: Design, develop, and implement scalable data pipelines using Python and Databricks for batch and real-time data processing.
ETL Processes: Build and maintain ETL (Extract, Transform, Load) processes to gather, transform, and store data from multiple sources.
Data Integration: Integrate structured and unstructured data from various internal and external sources into data lakes or warehouses, ensuring data accuracy and quality.
Collaboration: Work closely with data scientists, analysts, and business teams to understand data needs and deliver efficient solutions.
Performance Optimization: Optimize the performance of data pipelines and workflows to ensure efficient processing of large datasets.
Data Validation: Implement data validation and monitoring mechanisms to ensure data quality, consistency, and reliability.
Cloud Integration: Work with cloud platforms like AWS, Azure, or Google Cloud to build and maintain data storage and processing infrastructure.
Automation & Scheduling: Automate data pipelines and implement scheduling mechanisms to ensure timely and reliable data delivery.
Documentation: Maintain comprehensive documentation for data pipelines, processes, and best practices.
Must have
5+ years of experience as a Data Engineer with strong expertise in Python.
Bachelor's degree in Computer Science, Data Engineering, or a related field (or equivalent experience).
Hands-on experience with Databricks or similar big data platforms.
Strong understanding of data pipelines, ETL processes, and data integration techniques.
Experience with cloud-based platforms such as AWS, Azure, or Google Cloud, particularly with storage services such as data lakes, Amazon S3, or Azure Blob Storage.
Proficiency in SQL and experience with relational and non-relational databases.
Familiarity with big data technologies like Apache Spark, Kafka, or Hadoop.
Strong understanding of data modeling, data warehousing, and database design principles.
Ability to work with large, complex datasets, ensuring data integrity and performance optimization.
Experience with version control tools like Git and CI/CD pipelines for data engineering.
Excellent problem-solving skills, attention to detail, and the ability to work in a collaborative environment.
Nice to have
Experience with Delta Lake, Lakehouse architecture, or other modern data storage solutions.
Familiarity with machine learning and data science workflows.
Experience with DevOps or DataOps practices.
Knowledge of Terraform, Docker, or Kubernetes for cloud infrastructure automation.
Familiarity with data governance, data privacy regulations (e.g., GDPR, CCPA), and data security best practices.
English: B2 Upper Intermediate
Regular
Luxoft, a DXC Technology Company (NYSE: DXC), is a digital strategy and software engineering firm providing bespoke technology solutions that drive business change for customers the world over. Acquired by U.S. company DXC Technology in 2019, Luxoft operates globally across 44 cities in 21 countries with an international, agile workforce of nearly 18,000 people. It combines a unique blend of engineering excellence and deep industry expertise, helping over 425 global clients innovate in the areas of automotive, financial services, travel and hospitality, healthcare, life sciences, media and telecommunications.
DXC Technology is a leading Fortune 500 IT services company that helps global companies run their mission-critical systems. Together, DXC and Luxoft offer a differentiated customer-value proposition for digital transformation by combining Luxoft's front-end digital capabilities with DXC's expertise in IT modernization and integration. Follow our profile for regular updates and insights into technology and business needs.