##### Project description
You will join the team behind an internal AI platform for processing and interacting with unstructured data. The group is more than 30 people strong and is organized into self-sufficient agile teams, each of which takes a feature from the idea stage through analysis, implementation, testing, production deployment, and maintenance. The team is international and distributed across Krakow, Wroclaw, London, and New York.
##### Responsibilities
- Design, build, and maintain scalable data pipelines using Python and Azure Data Factory
- Work with Azure SQL and PostgreSQL to ingest, transform, and store structured and unstructured data
- Develop and optimize ETL/ELT processes for high-volume data workflows
- Use Databricks to process large datasets and build data models for downstream AI/ML components (a minimal sketch of this kind of work follows this list)
- Collaborate with data scientists, backend engineers, and product teams to understand data requirements
- Ensure data quality, integrity, and security across all stages of the data lifecycle
- Manage infrastructure as code using Terraform for provisioning and maintaining cloud resources
- Contribute to CI/CD practices using Azure DevOps for data pipeline deployments and versioning
- Support analytics and reporting teams by enabling data access via Power BI or similar tools
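For a flavor of the day-to-day pipeline work described above, here is a minimal PySpark sketch of a Databricks-style batch transform. It is an illustration only, not project code: the paths, schema fields, and table names are hypothetical placeholders, and it assumes a Databricks environment where Delta Lake is available.

```python
# Minimal sketch of a Databricks-style batch transform (PySpark).
# All paths, fields, and table names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("docs-ingest").getOrCreate()

# Ingest semi-structured JSON documents from a landing zone.
raw = spark.read.json("/mnt/landing/documents/")

# Flatten the fields that downstream AI/ML components consume
# and stamp each record with its ingestion time.
curated = (
    raw.select(
        F.col("id").alias("document_id"),
        F.col("metadata.source").alias("source"),
        F.col("body"),
    )
    .withColumn("ingested_at", F.current_timestamp())
    .dropDuplicates(["document_id"])
)

# Persist as a Delta table for analytics and ML consumers
# (assumes Delta Lake, as on Databricks).
curated.write.format("delta").mode("append").saveAsTable("curated.documents")
```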
##### Skills
Must have
- Strong programming skills in Python for data processing and scripting
- Experience with Azure Data Factory (ADF) for building and orchestrating data pipelines
- Proficiency in working with Azure SQL and PostgreSQL databases
- Hands-on experience with Databricks for big data processing and transformation
- Solid understanding of data engineering concepts: ETL/ELT, data modeling, data quality
- Familiarity with infrastructure as code using Terraform
- Experience with Azure DevOps for CI/CD pipelines and version control
- Ability to work with unstructured data and integrate it into structured models
- Experience in agile development environments and cross-functional teams
- Good communication skills and ability to work in an international, distributed team
Nice to have
- Experience with Power BI or other BI tools for data visualization and reporting
- Knowledge of Spark and distributed data processing concepts
- Familiarity with Delta Lake or similar data lakehouse architectures
- Understanding of data governance, lineage, and cataloging tools (e.g. Azure Purview)
- Basic knowledge of machine learning workflows or support for data science teams
- Experience working with APIs for data ingestion or integration (see the sketch after this list)
- Familiarity with containerization tools like Docker or Kubernetes
- Exposure to monitoring and alerting tools for data pipeline health (e.g. Azure Monitor, Grafana)
- Knowledge of data security best practices and compliance (e.g. GDPR, data encryption)
- Prior experience working on AI-related or unstructured data projects
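As an illustration of the API-ingestion experience listed above, work in this area often looks like the sketch below. It is a generic example under stated assumptions: the endpoint, the page-based pagination scheme, and the response shape (a JSON array per page) are all hypothetical.

```python
# Hypothetical paginated API ingestion; the endpoint and pagination
# scheme are placeholder assumptions, not a real internal API.
import requests

def fetch_records(base_url: str, page_size: int = 100):
    """Yield records from a paginated JSON API, page by page."""
    page = 1
    while True:
        resp = requests.get(
            f"{base_url}/records",
            params={"page": page, "per_page": page_size},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()  # assumed to be a JSON array of records
        if not batch:  # an empty page signals the end of the data
            break
        yield from batch
        page += 1
```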
##### Other
Languages
English: C1 Advanced
Seniority
Senior