As a Data Engineer, you will be part of the Engineering team, supporting the development and maintenance of data pipelines for scientific processing and quality assurance. You will participate in designing, optimizing, and maintaining ETL/ELT pipelines using Airflow, working within established frameworks to ensure reliability, scalability, and compliance with data governance standards.
Your primary responsibilities will include organizing and structuring data systems, ensuring accurate reporting of pipeline performance, and contributing to scientific and healthcare data processing workflows. The role requires attention to detail, the ability to manage multiple priorities, and strong collaboration skills to work effectively with engineers, data scientists, and researchers.
You will focus on streamlining production workflows, ensuring proper monitoring and operational efficiency, and implementing best practices for data governance and security.
- Operate and optimize ETL/ELT pipelines using Airflow (a minimal example of this kind of DAG follows this list).
- Support the structuring and organization of data systems in alignment with predefined architectures.
- Ensure timely and accurate reporting of data pipeline performance and operational issues.
- Follow data governance, security, and compliance standards in all data processing activities.
- Work, under supervision, on containerized data infrastructure using Docker and Kubernetes.
- Contribute to operational tasks related to scientific data processing and quality control.
- Implement optimizations in Python and SQL-based workflows following team guidelines.
- Work within established frameworks for data lake and data warehouse maintenance.
- Collaborate with engineers and researchers to define data processing requirements.
- Contribute to the standardization and monitoring of production data workflows.
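To give candidates a concrete feel for this work, here is a minimal sketch of a daily ETL pipeline in Airflow (assuming Airflow 2.4+). The DAG name, tasks, and logic are illustrative placeholders, not one of our production pipelines:

```python
# Hypothetical sketch of a daily ETL DAG (Airflow 2.4+ assumed); the DAG id,
# task names, and bodies are illustrative placeholders only.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract() -> None:
    print("pull raw scientific records from the source system")  # placeholder


def transform() -> None:
    print("clean, validate, and reshape the records")  # placeholder


def load() -> None:
    print("write curated records to the warehouse")  # placeholder


with DAG(
    dag_id="example_scientific_etl",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    # Classic extract >> transform >> load dependency chain.
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```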
In particular, you will:
- Support the design and optimization of data pipelines using Airflow.
- Develop and operate Python and SQL-based solutions for data processing, such as the kind of quality gate sketched after this list.
- Contribute to the development of scalable ETL/ELT pipelines to process and transform datasets.
- Work closely with data scientists, business developers, software engineers, and biomedical researchers to deliver high-quality data solutions.
- Contribute to the management and monitoring of containerized data infrastructure using Docker, Kubernetes, and cloud platforms.
- Follow best practices for data governance, security, and compliance in all workflows.
- Work with data architectures including data lakes, data warehouses, and analytics platforms.
- Contribute to the productionization of data processing pipelines, ensuring efficiency and scalability in scientific data workflows.
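As one illustration of the Python and SQL work mentioned above, here is a minimal sketch of a completeness check of the kind a productionized pipeline might run before publishing data. The table, column, and threshold are hypothetical, and sqlite3 stands in for an actual warehouse driver:

```python
# Illustrative SQL-based quality gate; the table, column, and 50% threshold
# are hypothetical, and sqlite3 is a stand-in for the real warehouse client.
import sqlite3


def null_rate(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Fraction of NULLs in `column` of `table`; a basic completeness check."""
    query = (
        f"SELECT CAST(SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS REAL)"
        f" / COUNT(*) FROM {table}"
    )
    (rate,) = conn.execute(query).fetchone()
    return rate or 0.0  # empty table yields NULL -> treat as 0.0


if __name__ == "__main__":
    # In-memory toy data: one NULL out of three rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (assay_value REAL)")
    conn.executemany("INSERT INTO samples VALUES (?)", [(1.2,), (None,), (3.4,)])

    rate = null_rate(conn, "samples", "assay_value")
    # Fail the pipeline run if the NULL rate breaches the threshold.
    assert rate <= 0.5, f"null rate {rate:.0%} exceeds threshold"
    print(f"completeness check passed: {rate:.0%} NULLs")
```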
The position is based in our Paris office, with the option to work remotely from France.