Senior Data Engineer
Cubic Corporation
Job Description
Job Summary
The Senior Data Engineer will play a key role in designing, building, and maintaining scalable data pipelines, integrations, and analytical platforms that support CTS’s global analytics and reporting. This role is responsible for developing high-quality ETL/ELT processes, managing cloud data environments, ensuring data quality, and enabling advanced analytics and reporting across major programs.
You will work extensively with Azure Data Factory, Azure Databricks, Python, SQL (Oracle/SQL Server), data modelling, data lakes, and enterprise data warehouse structures to support KPI engines, device analytics, predictive modelling, and operational performance insights.
Key Responsibilities
The following are the key responsibilities of the position; the candidate is expected to meet most, if not all, of them:
- Design, build, and optimize data pipelines using Azure Data Factory and Databricks (PySpark/Spark SQL); an illustrative PySpark sketch follows this list.
- Develop robust ETL/ELT processes to ingest, transform, and validate large volumes of operational, telemetry, incident, and transactional data.
- Implement scalable workflows leveraging Azure services such as Data Lake Storage, SQL Databases, Key Vault, Logic Apps, and Functions.
- Develop clean, maintainable, and well-documented Python code for data processing, automation, and model-serving pipelines.
- Build efficient SQL queries and stored procedures across Oracle and SQL Server to support the ODS, EDW, and analytics layer.
- Collaborate with data analysts, engineers, performance assurance, and operations teams to enable reliable, accurate datasets for reporting, KPIs, and predictive analytics.
- Design and maintain data models, schemas, tables, and metadata following COE architecture patterns.
- Implement strong data quality, validation, and monitoring frameworks to ensure accuracy and reliability across global programs.
- Support integration of ServiceNow, device telemetry feeds, GTFS, and other operational data sources into cloud pipelines.
- Optimize pipeline performance, troubleshoot failures, and ensure high availability and security compliance.
- Contribute to data engineering standards, best practices, reusable templates, and version control via Git.
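To make the pipeline responsibilities above concrete, here is a minimal sketch of the kind of PySpark ETL step this role might own: ingest raw telemetry, apply basic validation, and write to a Delta table. The paths, column names, and table name are hypothetical placeholders, not actual CTS or Cubic objects.

```python
# Minimal sketch of a Databricks-style PySpark ETL step: ingest raw telemetry,
# validate a few fields, and write the clean rows to a Delta table.
# All paths, columns, and table names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("telemetry-etl-sketch").getOrCreate()

# Ingest: raw JSON telemetry landed by an upstream ADF copy activity (assumed path)
raw = spark.read.json("abfss://raw@datalake.dfs.core.windows.net/telemetry/")

# Transform: normalize timestamps and keep only the fields needed downstream
clean = (
    raw
    .withColumn("event_ts", F.to_timestamp("event_time"))
    .select("device_id", "event_ts", "status_code", "reading_value")
)

# Validate: drop rows missing a device id or timestamp; report what was rejected
valid = clean.filter(F.col("device_id").isNotNull() & F.col("event_ts").isNotNull())
rejected_count = clean.count() - valid.count()
print(f"Rejected {rejected_count} invalid rows")

# Load: append to a Delta table partitioned by event date
(
    valid
    .withColumn("event_date", F.to_date("event_ts"))
    .write.format("delta")
    .mode("append")
    .partitionBy("event_date")
    .saveAsTable("analytics.device_telemetry_clean")
)
```

In a production setting this step would typically run as a Databricks notebook or job orchestrated by an ADF pipeline, with rejected-row counts feeding a data quality monitoring framework rather than a simple print.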
Required Skills and Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Analytics, Engineering, Mathematics, or a related field.
- 5+ years of experience in data engineering roles.
- Strong expertise with Azure Data Factory (ADF) pipelines, triggers, mapping data flows, and orchestrations.
- Advanced experience with Azure Databricks (PySpark, Spark SQL, Delta Lake, notebooks, clusters).
- High proficiency in Python for ETL, automation, and data transformation.
- Strong SQL skills across Oracle and SQL Server, including query tuning and complex transformations.
- Solid understanding of data modelling, warehousing, star/snowflake schemas, and distributed data processing (see the schema query sketch after this list).
- Experience handling large-scale datasets and complex data domains.
- Familiarity with CI/CD practices, Git branching, and DevOps pipelines.
- Experience with data governance, data cataloging, and managing structured/unstructured data in cloud environments.
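As a hedged illustration of the star/snowflake schema skills listed above, the sketch below runs a simple KPI aggregation through Spark SQL, joining a hypothetical fact table to two dimension tables. All table and column names are illustrative assumptions, not real warehouse objects.

```python
# Minimal sketch of a star-schema KPI query run through Spark SQL.
# Table and column names (fact_trips, dim_device, dim_date) are hypothetical
# placeholders, not actual CTS warehouse objects.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kpi-sketch").getOrCreate()

kpi_df = spark.sql("""
    SELECT d.region,
           c.calendar_month,
           COUNT(*)                                                  AS trip_count,
           AVG(f.fare_amount)                                        AS avg_fare,
           SUM(CASE WHEN f.failed = 1 THEN 1 ELSE 0 END) / COUNT(*)  AS failure_rate
    FROM fact_trips f
    JOIN dim_device d ON f.device_key = d.device_key  -- dimension join (star schema)
    JOIN dim_date   c ON f.date_key   = c.date_key
    GROUP BY d.region, c.calendar_month
""")

kpi_df.show(truncate=False)
```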
Preferred Skills
- Experience with Azure Databricks, Databricks Workflows, or Apache Airflow.
- Exposure to real-time/streaming data technologies (Event Hub, Kafka, Spark Streaming); a brief streaming sketch follows this list.
- Experience integrating APIs, flat files, and operational systems such as ServiceNow.
- Experience with machine learning pipelines, feature stores, or model operationalization.
- Familiarity with Power BI dataset design and optimization.
- Understanding of multi-region cloud environments and enterprise architecture patterns.
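For the streaming exposure noted above, the following minimal sketch reads events through Spark Structured Streaming's Kafka source (Azure Event Hubs exposes a Kafka-compatible endpoint) and lands them in a Delta table. Broker, topic, checkpoint, and table names are placeholders, and authentication options are omitted for brevity.

```python
# Minimal sketch of a Spark Structured Streaming read via the Kafka source.
# Broker, topic, checkpoint, and table names are hypothetical placeholders;
# SASL/auth options required for a real Event Hubs connection are omitted.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "my-eventhubs-namespace.servicebus.windows.net:9093")
    .option("subscribe", "device-telemetry")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes; cast to string before parsing downstream
decoded = events.select(
    F.col("key").cast("string").alias("device_id"),
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp").alias("event_ts"),
)

# Write the decoded stream to a Delta table; a real job would add schema parsing,
# watermarking, and aggregation before this step.
query = (
    decoded.writeStream
    .format("delta")
    .option("checkpointLocation", "abfss://checkpoints@datalake.dfs.core.windows.net/telemetry/")
    .toTable("analytics.device_telemetry_stream")
)
query.awaitTermination()
```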
Soft Skills
- Strong analytical mindset with exceptional problem-solving abilities.
- Ability to work independently and take ownership of deliverables.
- Excellent communication skills, with the ability to simplify complex concepts for non-technical stakeholders.
- Comfortable working in a fast-paced, global, multi-time zone team.
- High attention to detail, data quality, and accuracy.
- Strong sense of accountability, adaptability, and continuous improvement.
- Ability to engage effectively with engineering, operations, and business stakeholders.
- Proactive, resourceful, and committed to delivering high-quality outcomes.
Role Impact & Success Measures
The Senior Data Engineer will directly support CTS’s global analytics capability by delivering high-quality insights, strengthening KPI/SLA measurement, and enabling data-driven decision-making across major transit programs. Success in this role is defined by strong engineering delivery, reliable pipelines and data models, and meaningful contributions to operational and predictive initiatives.
Success in the First 3–6 Months
- Build a strong understanding of CTS datasets, KPI frameworks, pipelines, Azure architecture, and key operational systems.
- Deliver reliable ADF and Databricks pipelines for ingestion and transformation.
- Improve data quality and performance of existing workflows.
- Support analysts and operations teams with accurate, well-structured datasets.
Success in 6–12 Months
- Take ownership of end-to-end pipeline development and analytical workstreams with minimal supervision.
- Implement scalable, reusable engineering patterns and metadata-driven frameworks.
- Improve efficiency through automation, orchestration, and optimization.
- Integrate new data sources to support KPIs, device analytics, and predictive modelling.
Long-Term Success (12+ Months)
- Contribute to CTS analytics strategy, global COE standards for data engineering and cloud architecture, and operating model improvements.
- Become a subject matter expert (SME) for key pipelines, data domains, or Azure components.
- Lead continuous improvement initiatives around quality, governance, and automation.
- Support junior engineers and uplift engineering capability within the COE.