Data Engineer (SA25)

BBD

Job Summary

BBD is seeking a skilled Data Engineer to design, build, and maintain scalable data pipelines and architectures. This role is crucial for enabling data-driven decision-making by ensuring a robust, secure, and efficient data infrastructure. The engineer will utilize modern tools and cloud platforms like AWS, Azure, and Databricks to transform raw data into actionable insights, supporting both traditional analytics and emerging AI/ML workloads. Responsibilities include pipeline development, architecture and modeling, cloud infrastructure management, data governance, and collaboration with data scientists and AI engineers.

Must Have

  • Design, build and maintain efficient, reliable and scalable ETL/ELT pipelines using Python, SQL, and Spark.
  • Implement modern data architectures (e.g., Data Lakehouse, Medallion Architecture) and data models.
  • Manage and optimise cloud-based data infrastructure on AWS and Azure.
  • Implement data governance, security and quality standards (e.g., using Great Expectations, Unity Catalog).
  • Work closely with Data Scientists, AI Engineers and Business Analysts to understand data requirements.
  • Collaborate on MLOps practices, supporting model deployment and monitoring.
  • Monitor pipeline performance, troubleshoot issues, and drive automation using CI/CD practices.
  • Minimum of 5 years of professional experience.
  • At least 2 years of experience with Databricks.
  • Strong proficiency in Python for data manipulation and scripting.
  • Extensive experience with Apache Spark (PySpark) for batch and streaming data processing.
  • Proficiency with Apache Airflow or similar tools for scheduling and managing complex workflows.
  • Proficiency in modern cloud data warehouses such as Snowflake.
  • Expert SQL skills for analysis and transformation.
  • Deep understanding of Big Data file formats (Parquet, Avro, Delta Lake).
  • Experience designing Data Lakes and implementing patterns like the Medallion Architecture.
  • Experience with real-time data processing using Kafka or similar streaming platforms.
  • Proficiency with Git for version control.
  • Experience implementing CI/CD pipelines for data infrastructure.
  • Familiarity with data quality frameworks like Great Expectations or Soda.
  • Understanding of data governance principles, security, and lineage.
  • Experience serving data to BI tools like Power BI, Tableau, or Looker.
  • Deep knowledge of Amazon S3 for data lake storage.
  • Hands-on experience with AWS Glue for serverless data integration.
  • Experience with AWS Lake Formation for centrally managing security and access controls.
  • Proficiency with Amazon Kinesis for collecting and processing real-time data.
  • Solid understanding of core AWS services (IAM, Lambda, EC2, CloudWatch).
  • Deep knowledge of Azure Data Lake Storage (ADLS) Gen2 and Blob Storage.
  • Experience with Azure Data Factory (ADF) or Azure Synapse Analytics pipelines.
  • Familiarity with Microsoft Purview for unified data governance and Microsoft Entra ID.
  • Proficiency with Azure Event Hubs or Azure Stream Analytics for real-time data ingestion.
  • Understanding of core Azure services (Resource Groups, VNets, Azure Monitor).
  • Experience managing Databricks Workspaces, clusters, and compute resources.
  • Proficiency with Unity Catalog for centralised access control, auditing, and data lineage.
  • Building and orchestrating Databricks Jobs and Delta Live Tables (DLT) pipelines.
  • Deep knowledge of Delta Lake features (time travel, schema enforcement, optimisation).
  • AWS Certified Data Engineer – Associate (DEA-C01).
  • AWS Certified Solutions Architect – Associate.
  • Microsoft Certified: Azure Data Engineer Associate (DP-203).
  • Microsoft Certified: Azure Solutions Architect Expert.
  • Databricks Certified Data Engineer Professional.
  • Databricks Certified Data Engineer Associate.

Good to Have

  • Experience with Scala or Java.
  • Exposure to Generative AI concepts (LLMs, RAG, Vector Search) and how data engineering supports them.
  • Experience with MLflow for experiment tracking and model registry.
  • Exposure to Mosaic AI features (Model Serving, Vector Search, AI Gateway) and managing LLM workloads on Databricks.

Perks & Benefits

  • Flexible, hybrid working environment
  • Snacks, great coffee and catered lunches
  • Social, sport and cultural gatherings
  • Awards Nominations and shoutouts
  • Exceptional bonuses for exceptional performance
  • Support, encouragement and guidance for career growth
  • Space and opportunity to continue learning, growing and expanding skillsets

Job Description

The Company

BBD is an international custom software solutions company that solves real-world problems with innovative thinking and modern technology stacks. With extensive experience across various sectors and a wide array of technologies, BBD’s core services encompass digital enablement, software engineering and solutions support, which includes cloud engineering, data science, product design and managed services.

Over the past 40 years, we have built a reputation for hiring the best talent and collaborating with client teams to deliver exceptional value through software. As the company has grown, this unwavering commitment to quality and continuous innovation has ensured clients get the full benefit of software that fits their unique environment.

The culture

BBD’s culture is one that encourages collaboration, innovation and inclusion. Our relaxed yet professional work environment extends into a flat management structure. At BBD, you are not just a number, but a valuable member of the team, working with like-minded, passionate individuals on challenging projects in interesting spaces. We deeply believe in the importance of each individual taking control of their career growth, with the support, encouragement and guidance of the company. We do this for every BBDer, creating the space and opportunity to continue learning, growing and expanding their skillsets. We also proudly support and ensure diverse project teams, as varied perspectives will always make for stronger solutions.

With hubs in 7 cities, we have mastered distributed development and support a flexible, hybrid working environment. Our hubs are also a great place to get to know people, share knowledge, and enjoy snacks, great coffee and catered lunches as well as social, sport and cultural gatherings.

Lastly, recognition is deeply ingrained in the BBD culture and we use every appropriate opportunity to show this through our Awards Nominations, shoutouts and of course the exceptional bonuses that come from exceptional performance.

The role

BBD is looking for a skilled Data Engineer to design, build and maintain scalable data pipelines and architectures. You will play a pivotal role in enabling data-driven decision-making by ensuring our data infrastructure is robust, secure and efficient. You will work with modern tools and cloud platforms (AWS, Azure, Databricks) to transform raw data into actionable insights, supporting both traditional analytics and emerging AI/ML workloads.

Responsibilities

  • Pipeline development: Design, build and maintain efficient, reliable and scalable ETL/ELT pipelines using Python, SQL, and Spark (a minimal sketch follows this list)
  • Architecture & modelling: Implement modern data architectures (e.g., Data Lakehouse, Medallion Architecture) and data models to support business reporting and advanced analytics
  • Cloud infrastructure: Manage and optimise cloud-based data infrastructure on AWS and Azure, ensuring cost-effectiveness and performance
  • Data governance: Implement data governance, security and quality standards (e.g., using Great Expectations, Unity Catalog) to ensure data integrity and compliance
  • Collaboration: Work closely with Data Scientists, AI Engineers and Business Analysts to understand data requirements and deliver high-quality datasets
  • MLOps support: Collaborate on MLOps practices, supporting model deployment and monitoring through robust data foundations
  • Continuous improvement: Monitor pipeline performance, troubleshoot issues, and drive automation using CI/CD practices
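
To make the pipeline-development and Medallion-style responsibilities above a little more concrete, here is a minimal, illustrative PySpark sketch that moves a raw (bronze) dataset into a cleaned silver Delta table. The paths, column names and format choices are hypothetical placeholders, not a description of BBD's actual pipelines.

    # Illustrative only: the paths and columns below are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-bronze-to-silver").getOrCreate()

    # Bronze: land the raw JSON as-is so the source payload can always be replayed.
    raw = spark.read.json("s3://example-lake/bronze/orders/")

    # Silver: deduplicate, enforce types and drop obviously bad records.
    silver = (
        raw.dropDuplicates(["order_id"])
           .withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
           .filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))
    )

    # Write to Delta so downstream layers get ACID guarantees and schema enforcement.
    (silver.write.format("delta")
           .mode("overwrite")
           .save("s3://example-lake/silver/orders/"))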

Requirements

  • A minimum of 5 years of professional experience, with at least 2 years of experience with Databricks

Skills and Experience

Core data engineering skills:

  • Programming & scripting: Strong proficiency in Python for data manipulation and scripting. Experience with Scala or Java is a plus
  • Big Data processing: Extensive experience with Apache Spark (PySpark) for batch and streaming data processing
  • Workflow orchestration: Proficiency with Apache Airflow or similar tools (e.g., Prefect, Dagster, Azure Data Factory) for scheduling and managing complex workflows (see the orchestration sketch after this list)
  • Data warehousing: Proficiency in modern cloud data warehouses such as Snowflake, including designing, modelling and optimising analytical data structures to support reporting, BI and downstream analytics
  • Data modelling & storage:
      • Expert SQL skills for analysis and transformation
      • Deep understanding of Big Data file formats (Parquet, Avro, Delta Lake)
      • Experience designing Data Lakes and implementing patterns like the Medallion Architecture (Bronze/Silver/Gold layers)
  • Streaming: Experience with real-time data processing using Kafka or similar streaming platforms
  • DevOps & CI/CD:
      • Proficiency with Git for version control
      • Experience implementing CI/CD pipelines for data infrastructure (e.g., GitHub Actions, GitLab CI, Azure DevOps)
  • Data quality & governance:
      • Familiarity with data quality frameworks like Great Expectations or Soda
      • Understanding of data governance principles, security, and lineage
  • Reporting & visualisation: Experience serving data to BI tools like Power BI, Tableau, or Looker
  • AI/ML familiarity: Exposure to Generative AI concepts (LLMs, RAG, Vector Search) and how data engineering supports them
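
As a rough illustration of the workflow orchestration item above, the sketch below wires Medallion-style steps into an Apache Airflow DAG (assuming a recent Airflow 2.x installation). The DAG id, schedule and task bodies are hypothetical placeholders.

    # Minimal sketch: three placeholder tasks chained bronze -> silver -> gold.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest_bronze(**context):
        print("land raw files into the bronze layer")      # placeholder logic

    def refine_silver(**context):
        print("clean and conform bronze data into silver tables")

    def publish_gold(**context):
        print("build gold aggregates for BI and ML consumers")

    with DAG(
        dag_id="orders_medallion",                          # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        bronze = PythonOperator(task_id="ingest_bronze", python_callable=ingest_bronze)
        silver = PythonOperator(task_id="refine_silver", python_callable=refine_silver)
        gold = PythonOperator(task_id="publish_gold", python_callable=publish_gold)

        bronze >> silver >> gold                            # enforce layer ordering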

AWS data engineering skills:

  • Storage: Deep knowledge of Amazon S3 for data lake storage, including lifecycle policies and security configurations
  • ETL & orchestration: Hands-on experience with AWS Glue (Crawlers, Jobs, Workflows, Data Catalog) for serverless data integration
  • Governance: Experience with AWS Lake Formation for centrally managing security and access controls
  • Streaming: Proficiency with Amazon Kinesis (Data Streams, Firehose) for collecting and processing real-time data (see the sketch after this list)
  • Core services: Solid understanding of core AWS services (IAM, Lambda, EC2, CloudWatch) relevant to data engineering
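
To illustrate the streaming and ETL items above, here is a small boto3 sketch that pushes one event onto a Kinesis Data Stream and then starts a Glue job. The stream and job names are hypothetical, and real producers would batch, retry and handle errors.

    import json
    import boto3

    kinesis = boto3.client("kinesis")
    glue = boto3.client("glue")

    # Push a single event onto a Kinesis Data Stream (hypothetical stream name).
    event = {"order_id": "A-1001", "amount": 42.50}
    kinesis.put_record(
        StreamName="orders-stream",
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["order_id"],
    )

    # Start a Glue job (hypothetical name) that loads the landed data into the lake.
    run = glue.start_job_run(JobName="orders-bronze-load")
    print("started Glue run:", run["JobRunId"])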

Azure data engineering skills:

  • Storage: Deep knowledge of Azure Data Lake Storage (ADLS) Gen2 and Blob Storage
  • ETL & orchestration: Experience with Azure Data Factory (ADF) or Azure Synapse Analytics pipelines for data integration and orchestration
  • Governance: Familiarity with Microsoft Purview for unified data governance and Microsoft Entra ID (formerly Azure AD) for access management
  • Streaming: Proficiency with Azure Event Hubs or Azure Stream Analytics for real-time data ingestion (see the sketch after this list)
  • Core services: Understanding of core Azure services (Resource Groups, VNets, Azure Monitor) relevant to data solutions
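
As a sketch of the Event Hubs ingestion item above, the snippet below sends one batched event using the azure-eventhub SDK. The connection string, hub name and payload are hypothetical placeholders.

    import json
    from azure.eventhub import EventHubProducerClient, EventData

    producer = EventHubProducerClient.from_connection_string(
        conn_str="Endpoint=sb://example.servicebus.windows.net/;...",   # hypothetical
        eventhub_name="orders",                                         # hypothetical
    )

    event = {"order_id": "A-1001", "amount": 42.50}

    # Events are sent in batches; add() raises if the batch size limit is exceeded.
    with producer:
        batch = producer.create_batch()
        batch.add(EventData(json.dumps(event)))
        producer.send_batch(batch)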

Databricks skills:

  • Platform management: Experience managing Databricks Workspaces, clusters, and compute resources
  • Governance: Proficiency with Unity Catalog for centralised access control, auditing, and data lineage
  • Development:
      • Building and orchestrating Databricks Jobs and Delta Live Tables (DLT) pipelines
      • Deep knowledge of Delta Lake features (time travel, schema enforcement, optimisation; see the sketch after this list)
  • AI & ML integration:
      • Experience with MLflow for experiment tracking and model registry
      • Exposure to Mosaic AI features (Model Serving, Vector Search, AI Gateway) and managing LLM workloads on Databricks
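
The snippet below sketches the Delta Lake features named above (upserts with schema enforcement, time travel and optimisation) against a hypothetical table path, assuming a Databricks or Delta-enabled Spark session named spark is already available.

    from delta.tables import DeltaTable

    path = "/mnt/lake/silver/orders"                              # hypothetical location
    updates = spark.read.json("/mnt/landing/orders/2024-06-01/")  # hypothetical input

    # Upsert via the DeltaTable API; schema enforcement rejects writes whose columns
    # do not match the target table unless schema evolution is explicitly enabled.
    tbl = DeltaTable.forPath(spark, path)
    (tbl.alias("t")
        .merge(updates.alias("u"), "t.order_id = u.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

    # Time travel: read the table as it looked at an earlier version.
    previous = spark.read.format("delta").option("versionAsOf", 0).load(path)

    tbl.history().show()                                          # audit trail of table versions
    spark.sql(f"OPTIMIZE delta.`{path}` ZORDER BY (order_id)")    # compaction and clustering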

Required certifications

  • AWS:
      • AWS Certified Data Engineer – Associate (DEA-C01)
      • AWS Certified Solutions Architect – Associate
  • Azure:
      • Microsoft Certified: Azure Data Engineer Associate (DP-203)
      • Microsoft Certified: Azure Solutions Architect Expert
  • Databricks:
      • Databricks Certified Data Engineer Professional
      • Databricks Certified Data Engineer Associate

Internal candidate profile

We are open to training internal candidates who demonstrate strong engineering fundamentals and a passion for data. Ideal internal candidates might currently be in the following roles:

  • Python Back-end Engineer: Strong coding skills (Python) and experience with APIs / back-end systems, looking to specialise in big data processing and distributed systems
  • DevOps Engineer: Coding background with strong infrastructure-as-code and CI/CD skills, interested in applying those practices specifically to data pipelines and MLOps

BBD is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, family, gender identity or expression, genetic information, marital status, political affiliation, race, religion or any other characteristic protected by applicable laws, regulations or ordinances.
