Lead Data Engineer - Databricks


Job Details

As a Lead Data Engineer specialising in Databricks, you will design, build, and optimise data pipeline solutions on Databricks and related cloud platforms. Working closely with data scientists, analysts, and engineers, you will ensure our data infrastructure supports advanced analytics and business insights across industries (including energy, resources, and mining). You will join a collaborative, agile team where continuous improvement, innovation, and knowledge sharing are part of the culture.

  • Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver effective pipeline solutions.
  • Contribute to Data Architecture and Solution Design, helping to build Proof of Concepts.
  • Design, develop, and maintain robust ETL/ELT pipelines using Databricks along with AWS / Azure / GCP tools and services to ingest, process, and transform large datasets.
  • Implement data validation, cleansing, and governance procedures to guarantee data quality, integrity, and security. This includes enforcing data standards and addressing data quality issues proactively.
  • Continuously improve the scalability, efficiency, and cost-effectiveness of data pipelines. Identify opportunities to enhance performance, reliability, and cost-efficiency across our data systems.
  • Monitor data pipeline performance and promptly troubleshoot any issues or failures to ensure high data availability and consistency. Leverage observability tools and best practices to maintain reliable pipelines.
  • Develop streaming or event-driven data processes as needed for real-time analytics, leveraging frameworks like Apache Kafka and Spark Structured Streaming.
  • Maintain clear documentation of data pipelines, data models, and processes for transparency and team knowledge sharing. Follow best practices in coding, testing, and version control to ensure maintainable and auditable workflows.

Qualifications

  • Proficiency in Python for data engineering (including PySpark and libraries like pandas/Polars) and in SQL for data querying and transformation.
  • Solid understanding of data warehousing concepts and dimensional data modelling (e.g. star schema, Kimball methodology).
  • Hands-on experience with relational database systems and SQL (e.g. SQL Server, PostgreSQL) and familiarity with NoSQL databases (e.g. MongoDB, Cassandra) for varied data storage needs.
  • Strong experience designing and implementing ETL/ELT processes and integrating data from multiple sources.
  • Proven experience working with multiple cloud data platforms such as AWS / Azure / GCP.
  • Expertise in Databricks and the Spark ecosystem for large-scale data processing is required.
  • Familiarity with data pipeline orchestration and automation tools and with CI/CD pipelines for deploying data workflows.
  • Experience monitoring data pipeline performance and using observability tools to ensure data reliability is a plus.
  • Experience with designing and building event-driven architectures and streaming data tools (such as Apache Kafka or Spark Streaming) is beneficial for handling real-time data flows.
  • Experience working in Agile teams with iterative development, and a collaborative approach to problem-solving.
  • Holding a current Databricks certification (e.g. Databricks Certified Data Engineer) is a strong advantage.
  • Background in or understanding of data from the energy, resources, or mining industry is a plus, as it will help in delivering business-focused insights in these sectors.

Additional Information

Discover some of the global benefits that empower our people to become the best version of themselves:

  • Finance: Competitive salary package, share plan, company performance bonuses, value-based recognition awards, referral bonus;
  • Career Development: Career coaching, global career opportunities, non-linear career paths, internal development programmes for management and technical leadership;
  • Learning Opportunities: Complex projects, rotations, internal tech communities, training, certifications, coaching, online learning platforms subscriptions, pass-it-on sessions, workshops, conferences;
  • Work-Life Balance: Hybrid work and flexible working hours, employee assistance programme;
  • Health: Global internal wellbeing programme, access to wellbeing apps;
  • Community: Global internal tech communities, hobby clubs and interest groups, inclusion and diversity programmes, events and celebrations.

Additional Local Benefits

  • Monthly Lifestyle Allowance: Contribution towards health and wellbeing activities like gym memberships.
  • Novated Leasing: Pre-tax car leasing benefit for new and used cars.
  • Loyalty Leave: Receive an additional day of leave on your 3rd, 4th, and 5th work anniversaries, accumulating up to a maximum of 3 extra days of leave per year.
  • Inclusive Parental Leave Policy: 12 weeks of primary carer’s leave and 4 weeks of secondary carer’s leave.
  • Work From Anywhere: In addition to our hybrid working policy, we also offer 20 days of working from anywhere per year. Ideal for an extended trip away from the city or to visit loved ones.


About The Company

Endava, with hybrid and on-site roles across Sydney, Perth, and Brisbane (Australia), Skopje (North Macedonia), Cluj-Napoca (Romania), and Gdańsk (Poland).
