As a Lead Data Engineer specialising in Databricks, you will design, build, and optimise data pipeline solutions on Databricks and related cloud platforms. Working closely with data scientists, analysts, and engineers, you will ensure our data infrastructure supports advanced analytics and business insights across industries (including energy, resources, and mining). You will join a collaborative, agile team where continuous improvement, innovation, and knowledge sharing are part of the culture.
- Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver effective pipeline solutions.
- Contribute to Data Architecture and Solution Design, helping to build Proof of Concepts.
- Design, develop, and maintain robust ETL/ELT pipelines using Databricks along with AWS / Azure / GCP tools and services to ingest, process, and transform large datasets.
- Implement data validation, cleansing, and governance procedures to guarantee data quality, integrity, and security. This includes enforcing data standards and addressing data quality issues proactively.
- Continuously identify and act on opportunities to improve the scalability, performance, reliability, and cost-efficiency of our data pipelines and wider data systems.
- Monitor data pipeline performance and promptly troubleshoot any issues or failures to ensure high data availability and consistency. Leverage observability tools and best practices to maintain reliable pipelines.
- Develop streaming or event-driven data processes as needed for real-time analytics, leveraging frameworks like Apache Kafka and Spark Structured Streaming.
- Maintain clear documentation of data pipelines, data models, and processes for transparency and team knowledge sharing. Follow best practices in coding, testing, and version control to ensure maintainable and auditable workflows.
Qualifications
- Proficiency in Python for data engineering (including PySpark and libraries like pandas/Polars) and in SQL for data querying and transformation.
- Solid understanding of data warehousing concepts and dimensional data modelling (e.g. star schema, Kimball methodology).
- Hands-on experience with relational database systems and SQL (e.g. SQL Server, PostgreSQL) and familiarity with NoSQL databases (e.g. MongoDB, Cassandra) for varied data storage needs.
- Strong experience designing and implementing ETL/ELT processes and integrating data from multiple sources.
- Proven experience working with multiple cloud data platforms such as AWS / Azure / GCP.
- Expertise in Databricks and the Spark ecosystem for large-scale data processing is required.
- Familiarity with data pipeline orchestration and automation tools and with CI/CD pipelines for deploying data workflows.
- Experience monitoring data pipeline performance and using observability tools to ensure data reliability is a plus.
- Experience designing and building event-driven architectures and working with streaming data tools (such as Apache Kafka or Spark Structured Streaming) is beneficial for handling real-time data flows.
- Experience working in Agile teams with iterative development, and a collaborative approach to problem-solving.
- Holding a current Databricks certification (e.g. Databricks Certified Data Engineer) is a strong advantage.
- Background in or understanding of data from the energy, resources, or mining industry is a plus, as it will help in delivering business-focused insights in these sectors.
Additional Information
Discover some of the global benefits that empower our people to become the best version of themselves:
- Finance: Competitive salary package, share plan, company performance bonuses, value-based recognition awards, referral bonus;
- Career Development: Career coaching, global career opportunities, non-linear career paths, internal development programmes for management and technical leadership;
- Learning Opportunities: Complex projects, rotations, internal tech communities, training, certifications, coaching, online learning platform subscriptions, pass-it-on sessions, workshops, conferences;
- Work-Life Balance: Hybrid work and flexible working hours, employee assistance programme;
- Health: Global internal wellbeing programme, access to wellbeing apps;
- Community: Global internal tech communities, hobby clubs and interest groups, inclusion and diversity programmes, events and celebrations.
Additional Local Benefits
- Monthly Lifestyle Allowance: Contribution towards health and wellbeing activities like gym memberships.
- Novated Leasing: Pre-tax car leasing benefit for new and used cars.
- Loyalty Leave: Receive an additional day of leave on your 3rd, 4th, and 5th work anniversaries, accumulating up to a maximum of 3 extra days of leave per year.
- Inclusive Parental Leave Policy: 12 weeks of primary carer’s leave and 4 weeks of secondary carer’s leave.
- Work From Anywhere: In addition to our hybrid working policy, we also offer 20 days of working from anywhere per year, ideal for an extended trip away from the city or for visiting loved ones.