Technical Lead Data Engineer - Python

Job Description

As the Data Engineer Tech Lead, you will lead a team, overseeing data platform architecture, development, and optimization using Python, Databricks, and cloud technologies (AWS, Azure, or GCP). Responsibilities include designing and building ETL processes, managing data platforms, collaborating with cross-functional teams, optimizing performance, ensuring data quality and governance, automating workflows, and creating documentation. You will mentor engineers, implement CI/CD, and work with big data technologies like Spark and Hadoop. The role requires strong Python, Databricks, and cloud expertise.
Good To Have:
  • Delta Lake, Lakehouse architecture
  • ML platform integration
  • Terraform, Kubernetes
  • Data governance frameworks (GDPR, CCPA)
  • Agile methodologies
Must Have:
  • 10+ years data engineering experience (2+ years lead role)
  • Python, Databricks expertise
  • Cloud platform experience (AWS, Azure, GCP)
  • ETL pipeline development
  • Big data technologies (Spark, Hadoop)
  • SQL & NoSQL database knowledge
  • CI/CD and automation
  • Excellent communication & leadership

Project description

As the Data Engineer Tech Lead, you will be responsible for leading a team of data engineers and overseeing the architecture, development, and optimization of data platforms using Python, Databricks, and cloud-based technologies. You will collaborate with cross-functional teams to ensure data accuracy, scalability, and performance while developing innovative solutions for data processing, analytics, and reporting.

Responsibilities

Team Leadership: Lead and mentor a team of data engineers, providing guidance on best practices in data engineering, code reviews, and design patterns.

Data Pipeline Development: Design, develop, and maintain scalable and efficient data pipelines using Python and Databricks on cloud platforms like AWS, Azure, or GCP.

ETL Processes: Architect and build robust ETL (Extract, Transform, Load) processes to gather, clean, and process large datasets from various data sources.
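
By way of illustration, a minimal PySpark batch ETL sketch of the kind of pipeline described above; the storage paths, column names, and schema are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Extract: read raw CSV files landed in cloud storage (path is hypothetical)
    raw = spark.read.option("header", True).csv("s3://raw-bucket/orders/")

    # Transform: drop malformed rows, normalize types, derive a partition column
    clean = (
        raw.dropna(subset=["order_id", "amount"])
           .withColumn("amount", F.col("amount").cast("double"))
           .withColumn("order_date", F.to_date("created_at"))
    )

    # Load: write partitioned Parquet for downstream analytics
    clean.write.mode("overwrite").partitionBy("order_date").parquet("s3://curated-bucket/orders/")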

Data Platform Management: Oversee the management of the data platform, ensuring data integrity, performance optimization, and scalability.

Collaboration: Work closely with data scientists, analysts, and business teams to gather data requirements and translate them into efficient data solutions.

Performance Optimization: Optimize data workflows and Databricks clusters for performance, ensuring minimal latency and maximum efficiency in data processing.
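
A sketch of two common Spark tuning moves this responsibility implies, reusing the curated tables from the ETL sketch above; the partition count and table names are illustrative, not recommendations.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("tuning_example").getOrCreate()

    # Size shuffle parallelism to the cluster rather than the default (value illustrative)
    spark.conf.set("spark.sql.shuffle.partitions", "200")

    facts = spark.read.parquet("s3://curated-bucket/orders/")
    dims = spark.read.parquet("s3://curated-bucket/customers/")

    # Broadcast the small dimension table to turn a shuffle join into a map-side join
    joined = facts.join(broadcast(dims), "customer_id")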

Cloud Integration: Manage cloud-based data infrastructure, implementing best practices for security, scaling, and cost management in cloud environments.

Data Quality & Governance: Ensure data accuracy, consistency, and quality across all pipelines by implementing data validation checks and governance policies.
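
Continuing the ETL sketch above, a minimal example of the kind of validation check this implies; the rules and thresholds are illustrative.

    from pyspark.sql import functions as F

    # Reject the batch before loading if basic invariants are violated
    null_ids = clean.filter(F.col("order_id").isNull()).count()
    bad_amounts = clean.filter(F.col("amount") < 0).count()

    if null_ids > 0 or bad_amounts > 0:
        raise ValueError(
            f"Data quality check failed: {null_ids} null order_ids, "
            f"{bad_amounts} negative amounts"
        )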

Automation & CI/CD: Automate data workflows, integrate CI/CD pipelines, and ensure reliable data processing through scheduling, monitoring, and alerting mechanisms.

Documentation: Create and maintain comprehensive documentation of data workflows, pipelines, architecture, and best practices.

Skills

Must have

10+ years of experience in data engineering, with at least 2 years in a lead role.

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).

Strong expertise in Python and hands-on experience with Databricks (or a similar big data platform) are required; Azure Data Factory experience is mandatory.

Solid experience in cloud-based platforms such as AWS, Azure, or Google Cloud, especially with managed data services like Azure Data Lake, AWS S3, Databricks, etc.

Strong understanding of data modeling principles, including data warehousing and relational databases.

Proficiency in building ETL pipelines for batch and real-time data processing.

Hands-on experience with big data technologies (Spark, Hadoop, Kafka, etc.).
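
By way of illustration of the real-time side of this requirement, a minimal Spark Structured Streaming job reading from Kafka; the broker address and topic are hypothetical, and the cluster would need the Spark-Kafka connector package available.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("events_stream").getOrCreate()

    # Subscribe to a Kafka topic as an unbounded stream (broker/topic hypothetical)
    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("subscribe", "events")
             .load()
    )

    # Kafka delivers key/value as binary; decode the value for downstream parsing
    decoded = events.select(F.col("value").cast("string").alias("payload"))

    # Console sink for illustration; a production job would target Delta or Parquet
    query = decoded.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()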

Knowledge of working with distributed systems and processing large datasets efficiently.

Familiarity with SQL and NoSQL databases (e.g., PostgreSQL, Cassandra, MongoDB).

Experience with CI/CD pipelines and automation tools for data engineering.

Strong understanding of DevOps and DataOps principles.

Excellent communication, leadership, and problem-solving skills.

Nice to have

Experience with Delta Lake, Lakehouse architecture, or similar data architectures.

Experience with machine learning platforms and integrating data pipelines with ML workflows.

Knowledge of Terraform, Kubernetes, or other infrastructure-as-code tools for cloud infrastructure automation.

Experience in implementing data governance frameworks and compliance with GDPR or CCPA.

Familiarity with Agile methodologies and project management tools such as Jira.

Other

Languages

English: B2 Upper Intermediate

Seniority

Senior
