As a Data Engineer, you will design, develop, and maintain data pipelines using Python and Databricks to process large-scale datasets. You will collaborate with data scientists, analysts, and business stakeholders to gather data requirements and build efficient, scalable solutions that enable advanced analytics and reporting.
Data Pipeline Development: Design, develop, and implement scalable data pipelines using Python and Databricks for batch and real-time data processing.
ETL Processes: Build and maintain ETL (Extract, Transform, Load) processes to gather, transform, and store data from multiple sources.
Data Integration: Integrate structured and unstructured data from various internal and external sources into data lakes or warehouses, ensuring data accuracy and quality.
Collaboration: Work closely with data scientists, analysts, and business teams to understand data needs and deliver efficient solutions.
Performance Optimization: Optimize the performance of data pipelines and workflows to ensure efficient processing of large datasets.
Data Validation: Implement data validation and monitoring mechanisms to ensure data quality, consistency, and reliability.
Cloud Integration: Work with cloud platforms like AWS, Azure, or Google Cloud to build and maintain data storage and processing infrastructure.
Automation & Scheduling: Automate data pipelines and implement scheduling mechanisms to ensure timely and reliable data delivery.
Documentation: Maintain comprehensive documentation for data pipelines, processes, and best practices.
Must have
5+ years of experience as a Data Engineer with strong expertise in Python.
Bachelor's degree in Computer Science, Data Engineering, or a related field (or equivalent experience).
Hands-on experience with Databricks or similar big data platforms.
Strong understanding of data pipelines, ETL processes, and data integration techniques.
Experience with cloud-based platforms such as AWS, Azure, or Google Cloud, particularly with storage services such as data lakes, Amazon S3, or Azure Blob Storage.
Proficiency in SQL and experience with relational and non-relational databases.
Familiarity with big data technologies like Apache Spark, Kafka, or Hadoop.
Strong understanding of data modeling, data warehousing, and database design principles.
Ability to work with large, complex datasets, ensuring data integrity and performance optimization.
Experience with version control tools like Git and CI/CD pipelines for data engineering.
Excellent problem-solving skills, attention to detail, and the ability to work in a collaborative environment.
Nice to have
Experience with Delta Lake, Lakehouse architecture, or other modern data storage solutions.
Familiarity with machine learning and data science workflows.
Experience with DevOps or DataOps practices.
Knowledge of Terraform, Docker, or Kubernetes for cloud infrastructure automation.
Familiarity with data governance, data privacy regulations (e.g., GDPR, CCPA), and data security best practices.
English: B2 Upper Intermediate
Regular
Luxoft, a DXC Technology Company (NYSE: DXC), is a digital strategy and software engineering firm providing bespoke technology solutions that drive business change for customers the world over. Acquired by U.S. company DXC Technology in 2019, Luxoft operates globally across 44 cities in 21 countries with an international, agile workforce of nearly 18,000 people. It combines a unique blend of engineering excellence and deep industry expertise, helping over 425 global clients innovate in the areas of automotive, financial services, travel and hospitality, healthcare, life sciences, media and telecommunications.
DXC Technology is a leading Fortune 500 IT services company that helps global companies run their mission-critical systems. Together, DXC and Luxoft offer a differentiated customer-value proposition for digital transformation by combining Luxoft's front-end digital capabilities with DXC's expertise in IT modernization and integration. Follow our profile for regular updates and insights into technology and business needs.