Python Spark Developer
Synechron
Job Summary
Synechron is seeking a skilled Python Spark Developer to design and optimize large-scale data pipelines and processing systems. The successful candidate will leverage expertise in Python and Apache Spark to build scalable, high-performance data workflows supporting enterprise analytics, fraud detection, and real-time data applications. This role is instrumental in advancing data architecture, driving operational excellence, and delivering solutions aligned with business and technical standards.
Must Have
- 3+ years of professional experience in Python development with a focus on data engineering and Big Data processing
- Hands-on expertise with Apache Spark (preferably Spark 2.x or 3.x) in batch and streaming environments
- Strong SQL skills with experience working with relational and distributed data systems (e.g., Hive, Snowflake, NoSQL databases)
- Experience with data pipeline orchestration and management tools (e.g., Airflow, Jenkins, Git)
- Solid understanding of software engineering principles, clean code practices, and design patterns
- Familiarity with system design for scalable, data-intensive applications
- Proven experience designing scalable, reliable ETL/ELT workflows in enterprise environments
- Demonstrated ability to optimize Spark jobs for performance in batch and streaming scenarios
- Experience working in distributed system architectures with a focus on data security and compliance
Good to Have
- Exposure to cloud data platforms such as Snowflake, Databricks, AWS Glue, or GCP Dataproc
- Experience working with Kafka, Redis, or similar messaging systems
- Knowledge of observability tools like OpenTelemetry, Grafana, Loki, Tempo
- Understanding of containerization using Docker, orchestration with Kubernetes, and GitOps workflows
- Exposure to Scala or Java for integration purposes
- Experience with Flink or other streaming frameworks
- Knowledge of NoSQL databases (MongoDB, Cassandra) and Data Lake architectures
- Experience with infrastructure automation (Terraform, CloudFormation)
- Background in financial services, fraud detection, or other data-intensive environments
Job Description
Software Requirements
Required Skills:
- 3+ years of professional experience in Python development with a focus on data engineering and Big Data processing
- Hands-on expertise with Apache Spark (preferably Spark 2.x or 3.x) in batch and streaming environments
- Strong SQL skills with experience working with relational and distributed data systems (e.g., Hive, Snowflake, NoSQL databases)
- Experience with data pipeline orchestration and management tools (e.g., Airflow, Jenkins, Git)
- Solid understanding of software engineering principles, clean code practices, and design patterns
- Familiarity with system design for scalable, data-intensive applications
Preferred Skills:
- Exposure to cloud data platforms such as Snowflake, Databricks, AWS Glue, or GCP Dataproc
- Experience working with Kafka, Redis, or similar messaging systems
- Knowledge of observability tools like OpenTelemetry, Grafana, Loki, Tempo
- Understanding of containerization using Docker, orchestration with Kubernetes, and GitOps workflows
Overall Responsibilities
- Design, develop, and optimize scalable data pipelines and workflows utilizing Python and Apache Spark
- Build high-performance data processing applications emphasizing pushdown optimization, partitioning, clustering, and streaming (see the PySpark sketch after this list)
- Integrate modern data platforms and tools into existing enterprise architectures for improved data accessibility and security
- Engineer feature pipelines to support real-time fraud detection and other critical analytics systems
- Define data models and processing strategies aligned with distributed architecture principles to ensure scalability and consistency
- Develop production-ready, maintainable solutions with built-in observability and operational monitoring capabilities
- Adhere to clean code standards, SOLID principles, and architecture best practices to enable extensibility and robustness
- Participate in code reviews, testing, deployment, and performance tuning activities
- Contribute to architectural governance, innovation initiatives, and continuous improvement efforts
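To make the pushdown and partitioning responsibilities above concrete, here is a minimal, hypothetical PySpark sketch. The storage paths, column names, and transactions dataset are placeholders rather than details of any actual Synechron pipeline; the point is only to show column pruning, partition-aware filtering, and partitioned output.

```python
# Hypothetical PySpark sketch: pushdown-friendly, partition-aware batch processing.
# All paths, columns, and the dataset itself are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partitioned-batch-sketch").getOrCreate()

# Reading Parquet lets Spark push column and predicate filters down to the scan;
# filtering on the partition column (event_date) also prunes entire partitions.
txns = (
    spark.read.parquet("s3://example-bucket/transactions/")   # placeholder path
         .select("account_id", "amount", "event_date")        # column pruning
         .filter(F.col("event_date") == "2024-01-15")         # partition pruning / predicate pushdown
)

# Aggregate per account and day, then write the result partitioned by date so
# downstream readers stay selective.
daily_totals = (
    txns.groupBy("event_date", "account_id")
        .agg(F.sum("amount").alias("daily_total"))
)

(
    daily_totals
    .repartition("event_date")              # align in-memory partitions with the write layout
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/daily_totals/")              # placeholder path
)
```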
Technical Skills (By Category)
Programming Languages:
- Essential: Python (version 3.7+)
- Preferred: Scala or Java for integration purposes
Frameworks & Libraries:
- Essential: Apache Spark, Spark Streaming, Spark SQL, PySpark
- Preferred: Kafka clients, Flink, or other streaming frameworks
Data & Databases:
- Essential: SQL (PostgreSQL, MySQL), Spark DataFrames, Hive, or similar distributed storage
- Preferred: NoSQL databases (MongoDB, Cassandra), Data Lake architectures
Cloud & Infrastructure:
- Preferred: Cloud platforms such as Snowflake, Databricks, AWS, or GCP
- Preferred: Containerization with Docker, Kubernetes, and Helm
- Preferred: Infrastructure automation with Terraform or CloudFormation
DevOps & Monitoring:
- Essential: CI/CD (Jenkins, GitHub Actions), observability tools (OpenTelemetry, Prometheus, Grafana)
- Preferred: Log aggregation and tracing tools such as Loki and Tempo; metrics collection
Experience Requirements
- 3+ years of hands-on experience developing data pipelines in Python with Apache Spark
- Proven experience designing scalable, reliable ETL/ELT workflows in enterprise environments
- Demonstrated ability to optimize Spark jobs for performance in batch and streaming scenarios (a brief tuning sketch follows this list)
- Experience working in distributed system architectures with a focus on data security and compliance
- Background in financial services, fraud detection, or other data-intensive environments is preferred
- Proven ability to collaborate across cross-functional teams and influence technical decision-making
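As a sketch of the kind of tuning this experience implies, the snippet below sets a few Spark options commonly adjusted when optimizing batch and streaming jobs. The specific values are illustrative assumptions, not recommendations for any particular workload.

```python
# Hypothetical tuning sketch: Spark settings often revisited during performance work.
# Values are placeholders and should be sized to the actual data volume and cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("tuning-sketch")
    .config("spark.sql.adaptive.enabled", "true")              # adaptive query execution
    .config("spark.sql.shuffle.partitions", "400")             # shuffle parallelism
    .config("spark.sql.files.maxPartitionBytes", "134217728")  # ~128 MB input splits
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)
```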
Day-to-Day Activities
- Develop and maintain large-scale data pipelines supporting enterprise analytics and real-time applications (a streaming sketch follows this list)
- Optimize Spark jobs and workflows for throughput, latency, and resource utilization
- Implement pushdown optimizations, partitioning strategies, and clustering techniques to improve data processing efficiency
- Collaborate with data architects, platform teams, and stakeholders to evaluate new tools and platforms for data solutions
- Troubleshoot technical issues, resolve data pipeline failures, and improve system observability
- Conduct code reviews and participate in agile planning, deployment, and operational activities
- Document architecture, processes, and best practices to facilitate knowledge sharing and operational excellence
- Stay current with industry trends, emerging tools, and best practices in big data engineering
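For the real-time side of these activities, a minimal Structured Streaming sketch follows. The Kafka broker, topic, event schema, and checkpoint path are assumed placeholders, and the console sink stands in for whatever feature store or scoring service a real fraud-detection pipeline would target; running it also requires the spark-sql-kafka connector on the classpath.

```python
# Hypothetical PySpark Structured Streaming sketch of a real-time feature pipeline.
# Broker, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("streaming-feature-sketch").getOrCreate()

event_schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read raw events from Kafka and parse the JSON payload into typed columns.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")    # placeholder broker
         .option("subscribe", "transactions")                 # placeholder topic
         .load()
         .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
         .select("e.*")
)

# Sliding-window features: per-account spend and transaction count over 10 minutes,
# the kind of signal a downstream fraud-scoring service might consume.
features = (
    events
    .withWatermark("event_time", "15 minutes")
    .groupBy(F.window("event_time", "10 minutes", "1 minute"), "account_id")
    .agg(F.sum("amount").alias("spend_10m"), F.count("*").alias("txn_count_10m"))
)

query = (
    features.writeStream
            .outputMode("update")
            .format("console")                                # placeholder sink
            .option("checkpointLocation", "/tmp/checkpoints/features")
            .start()
)
query.awaitTermination()
```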
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, Data Science, or a related field
- Additional certifications in Big Data, Spark, or cloud data services are a plus
- Extensive hands-on experience developing large-scale data pipelines and processing solutions with Python and Apache Spark
Professional Competencies
- Strong analytical and problem-solving skills for complex data workflows
- Excellent collaboration and communication skills with technical and non-technical stakeholders
- Ability to lead initiatives, influence best practices, and mentor junior engineers
- Adaptability to evolving technologies and organizational needs
- Focus on operational excellence, observability, and sustained performance
- Commitment to continuous learning and process improvement