Python Spark Developer
Synechron
Job Summary
Synechron is seeking a skilled Python Spark Developer to design and optimize large-scale data pipelines and processing systems. The successful candidate will leverage expertise in Python and Apache Spark to build scalable, high-performance data workflows supporting enterprise analytics, fraud detection, and real-time data applications. This role is instrumental in advancing data architecture, driving operational excellence, and delivering solutions aligned with business and technical standards.
Must Have
- 3+ years of professional experience in Python development with a focus on data engineering and Big Data processing
- Hands-on expertise with Apache Spark (preferably Spark 2.x or 3.x) in batch and streaming environments
- Strong SQL skills with experience working with relational and distributed data systems (e.g., Hive, Snowflake, NoSQL databases)
- Experience with data pipeline orchestration and management tools (e.g., Airflow, Jenkins, Git)
- Solid understanding of software engineering principles, clean code practices, and design patterns
- Familiarity with system design for scalable, data-intensive applications
- Proven experience designing scalable, reliable ETL/ELT workflows in enterprise environments
- Demonstrated ability to optimize Spark jobs for performance in batch and streaming scenarios
- Experience working in distributed system architectures with a focus on data security and compliance
Good to Have
- Exposure to cloud data platforms such as Snowflake, Databricks, AWS Glue, or GCP Dataproc
- Experience working with Kafka, Redis, or similar messaging systems
- Knowledge of observability tools like OpenTelemetry, Grafana, Loki, Tempo
- Understanding of containerization using Docker, orchestration with Kubernetes, and GitOps workflows
- Exposure to Scala or Java for integration purposes
- Experience with Flink or other streaming frameworks
- Knowledge of NoSQL databases (MongoDB, Cassandra) and Data Lake architectures
- Experience with infrastructure automation (Terraform, CloudFormation)
- Background in financial services, fraud detection, or other data-intensive environments
Job Description
Software Requirements
Required Skills:
- 3+ years of professional experience in Python development with a focus on data engineering and Big Data processing
- Hands-on expertise with Apache Spark (preferably Spark 2.x or 3.x) in batch and streaming environments
- Strong SQL skills with experience working with relational and distributed data systems (e.g., Hive, Snowflake, NoSQL databases)
- Experience with data pipeline orchestration and management tools (e.g., Airflow, Jenkins, Git)
- Solid understanding of software engineering principles, clean code practices, and design patterns
- Familiarity with system design for scalable, data-intensive applications
Preferred Skills:
- Exposure to cloud data platforms such as Snowflake, Databricks, AWS Glue, or GCP Dataproc
- Experience working with Kafka, Redis, or similar messaging systems
- Knowledge of observability tools like OpenTelemetry, Grafana, Loki, Tempo
- Understanding of containerization using Docker, orchestration with Kubernetes, and GitOps workflows
Overall Responsibilities
- Design, develop, and optimize scalable data pipelines and workflows utilizing Python and Apache Spark
- Build high-performance data processing applications emphasizing pushdown optimization, partitioning, clustering, and streaming (see the PySpark sketch after this list)
- Integrate modern data platforms and tools into existing enterprise architectures for improved data accessibility and security
- Engineer feature pipelines to support real-time fraud detection and other critical analytics systems
- Define data models and processing strategies aligned with distributed architecture principles to ensure scalability and consistency
- Develop production-ready, maintainable solutions with built-in observability and operational monitoring capabilities
- Adhere to clean code standards, SOLID principles, and architecture best practices to enable extensibility and robustness
- Participate in code reviews, testing, deployment, and performance tuning activities
- Contribute to architectural governance, innovation initiatives, and continuous improvement efforts
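To make the pushdown and partitioning responsibilities above concrete, here is a minimal, hypothetical PySpark sketch. The storage paths, column names, and transactions dataset are placeholders rather than details of any actual Synechron pipeline; the point is only to show column pruning, partition-aware filtering, and partitioned output.

```python
# Hypothetical PySpark sketch: pushdown-friendly, partition-aware batch processing.
# All paths, columns, and the dataset itself are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partitioned-batch-sketch").getOrCreate()

# Reading Parquet lets Spark push column and predicate filters down to the scan;
# filtering on the partition column (event_date) also prunes entire partitions.
txns = (
    spark.read.parquet("s3://example-bucket/transactions/")   # placeholder path
         .select("account_id", "amount", "event_date")        # column pruning
         .filter(F.col("event_date") == "2024-01-15")         # partition pruning / predicate pushdown
)

# Aggregate per account and day, then write the result partitioned by date so
# downstream readers stay selective.
daily_totals = (
    txns.groupBy("event_date", "account_id")
        .agg(F.sum("amount").alias("daily_total"))
)

(
    daily_totals
    .repartition("event_date")              # align in-memory partitions with the write layout
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/daily_totals/")              # placeholder path
)
```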
Technical Skills (By Category)
Programming Languages:
- Essential: Python (version 3.7+)
- Preferred: Scala or Java for integration purposes
Frameworks & Libraries:
- Essential: Apache Spark, Spark Streaming, Spark SQL, PySpark
- Preferred: Kafka clients, Flink, or other streaming frameworks
Data & Databases:
- Essential: SQL (PostgreSQL, MySQL), Spark DataFrames, Hive, or similar distributed storage
- Preferred: NoSQL databases (MongoDB, Cassandra), Data Lake architectures
Cloud & Infrastructure:
- Preferred: Cloud platforms such as Snowflake, Databricks, AWS, or GCP
- Preferred: Containerization with Docker, Kubernetes, and Helm
- Preferred: Infrastructure automation with Terraform or CloudFormation
DevOps & Monitoring:
- Essential: CI/CD (Jenkins, GitHub Actions), observability tools (OpenTelemetry, Prometheus, Grafana)
- Preferred: Log aggregation and tracing tools such as Loki and Tempo; metrics collection
Experience Requirements
- 3+ years of hands-on experience developing data pipelines in Python with Apache Spark
- Proven experience designing scalable, reliable ETL/ELT workflows in enterprise environments
- Demonstrated ability to optimize Spark jobs for performance in batch and streaming scenarios (a brief tuning sketch follows this list)
- Experience working in distributed system architectures with a focus on data security and compliance
- Background in financial services, fraud detection, or other data-intensive environments is preferred
- Proven ability to collaborate across cross-functional teams and influence technical decision-making
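As a sketch of the kind of tuning this experience implies, the snippet below sets a few Spark options commonly adjusted when optimizing batch and streaming jobs. The specific values are illustrative assumptions, not recommendations for any particular workload.

```python
# Hypothetical tuning sketch: Spark settings often revisited during performance work.
# Values are placeholders and should be sized to the actual data volume and cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("tuning-sketch")
    .config("spark.sql.adaptive.enabled", "true")              # adaptive query execution
    .config("spark.sql.shuffle.partitions", "400")             # shuffle parallelism
    .config("spark.sql.files.maxPartitionBytes", "134217728")  # ~128 MB input splits
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)
```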
Day-to-Day Activities
- Develop and maintain large-scale data pipelines supporting enterprise analytics and real-time applications (a streaming sketch follows this list)
- Optimize Spark jobs and workflows for throughput, latency, and resource utilization
- Implement pushdown optimizations, partitioning strategies, and clustering techniques to improve data processing efficiency
- Collaborate with data architects, platform teams, and stakeholders to evaluate new tools and platforms for data solutions
- Troubleshoot technical issues, resolve data pipeline failures, and improve system observability
- Conduct code reviews and participate in agile planning, deployment, and operational activities
- Document architecture, processes, and best practices to facilitate knowledge sharing and operational excellence
- Stay current with industry trends, emerging tools, and best practices in big data engineering
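For the real-time side of these activities, a minimal Structured Streaming sketch follows. The Kafka broker, topic, event schema, and checkpoint path are assumed placeholders, and the console sink stands in for whatever feature store or scoring service a real fraud-detection pipeline would target; running it also requires the spark-sql-kafka connector on the classpath.

```python
# Hypothetical PySpark Structured Streaming sketch of a real-time feature pipeline.
# Broker, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("streaming-feature-sketch").getOrCreate()

event_schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read raw events from Kafka and parse the JSON payload into typed columns.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")    # placeholder broker
         .option("subscribe", "transactions")                 # placeholder topic
         .load()
         .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
         .select("e.*")
)

# Sliding-window features: per-account spend and transaction count over 10 minutes,
# the kind of signal a downstream fraud-scoring service might consume.
features = (
    events
    .withWatermark("event_time", "15 minutes")
    .groupBy(F.window("event_time", "10 minutes", "1 minute"), "account_id")
    .agg(F.sum("amount").alias("spend_10m"), F.count("*").alias("txn_count_10m"))
)

query = (
    features.writeStream
            .outputMode("update")
            .format("console")                                # placeholder sink
            .option("checkpointLocation", "/tmp/checkpoints/features")
            .start()
)
query.awaitTermination()
```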
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, Data Science, or a related field
- Additional certifications in Big Data, Spark, or cloud data services are a plus
- Extensive hands-on experience developing large-scale data pipelines and processing solutions with Python and Apache Spark
Professional Competencies
- Strong analytical and problem-solving skills for complex data workflows
- Excellent collaboration and communication skills with technical and non-technical stakeholders
- Ability to lead initiatives, influence best practices, and mentor junior engineers
- Adaptability to evolving technologies and organizational needs
- Focus on operational excellence, observability, and sustained performance
- Commitment to continuous learning and process improvement