As the Data Engineer Tech Lead, you will lead a team of data engineers and oversee the architecture, development, and optimization of data platforms built with Python, Databricks, and cloud-based technologies. You will collaborate with cross-functional teams to ensure data accuracy, scalability, and performance while developing innovative solutions for data processing, analytics, and reporting.
Team Leadership: Lead and mentor a team of data engineers, providing guidance on best practices in data engineering, code reviews, and design patterns.
Data Pipeline Development: Design, develop, and maintain scalable and efficient data pipelines using Python and Databricks on cloud platforms like AWS, Azure, or GCP.
ETL Processes: Architect and build robust ETL (Extract, Transform, Load) processes to gather, clean, and process large datasets from various data sources.
Data Platform Management: Oversee the management of the data platform, ensuring data integrity, performance optimization, and scalability.
Collaboration: Work closely with data scientists, analysts, and business teams to gather data requirements and translate them into efficient data solutions.
Performance Optimization: Optimize data workflows and Databricks clusters for performance, ensuring minimal latency and maximum efficiency in data processing.
Cloud Integration: Manage cloud-based data infrastructure, implementing best practices for security, scaling, and cost management in cloud environments.
Data Quality & Governance: Ensure data accuracy, consistency, and quality across all pipelines by implementing data validation checks and governance policies.
Automation & CI/CD: Automate data workflows, integrate CI/CD pipelines, and ensure reliable data processing through scheduling, monitoring, and alerting mechanisms.
Documentation: Create and maintain comprehensive documentation of data workflows, pipelines, architecture, and best practices.
Must have
10+ years of experience in data engineering, including at least 2 years in a lead role.
Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
Strong expertise in Python and experience with Databricks or a similar big data platform; experience with Azure Data Factory is mandatory.
Solid experience with cloud platforms such as AWS, Azure, or Google Cloud, especially managed data services like Azure Data Lake, Amazon S3, and Databricks.
Strong understanding of data modeling principles, including data warehousing and relational databases.
Proficiency in building ETL pipelines for batch and real-time data processing.
Hands-on experience with big data technologies (Spark, Hadoop, Kafka, etc.).
Knowledge of working with distributed systems and processing large datasets efficiently.
Familiarity with SQL and NoSQL databases (e.g., PostgreSQL, Cassandra, MongoDB).
Experience with CI/CD pipelines and automation tools for data engineering.
Strong understanding of DevOps and DataOps principles.
Excellent communication, leadership, and problem-solving skills.
Nice to have
Experience with Delta Lake, Lakehouse architecture, or similar data architectures.
Experience with machine learning platforms and integrating data pipelines with ML workflows.
Knowledge of Terraform, Kubernetes, or other infrastructure-as-code tools for cloud infrastructure automation.
Experience in implementing data governance frameworks and compliance with GDPR or CCPA.
Familiarity with Agile methodologies and project management tools such as Jira.
Languages
English: B2 Upper Intermediate
Seniority
Senior