Senior Data Engineer
Blis
Job Summary
Blis is seeking an experienced Senior Data Engineer to build secure, automated, and scalable data pipelines on GCP. The role involves working with high-scale systems, processing petabytes of data, and tackling challenges in data science disciplines like classification and optimization. Responsibilities include designing, building, monitoring, and supporting large-scale data processing pipelines, mentoring team members, and exploring new data streams for commercial and technical growth. The ideal candidate will be proficient in Python, cloud engineering, and data processing, with a focus on efficient, production-level solutions.
Job Description
Come work on fantastically high-scale systems with us! Blis is an award-winning, global leader and technology innovator in big data analytics and advertising. We help brands such as McDonald's, Samsung, and Mercedes-Benz to understand and effectively reach their best audiences.
We are looking for solid, experienced Data Engineers to build secure, automated, scalable pipelines on GCP. We receive over 350GB of data an hour and respond to 400,000 decision requests each second, with petabytes of analytical data to work with.
We tackle challenges across almost every major discipline of data science, including classification, clustering, optimisation, and data mining. You will be responsible for building stable, production-level pipelines that maximise the efficiency of cloud compute, ensuring that data is properly enabled for operational and scientific use.
This is a growing team with big responsibilities and exciting challenges ahead of it, as we look to reach the next 10x level of scale and intelligence.
At Blis, Data Engineers are a combination of software engineers, cloud engineers, and data processing engineers. They actively design and build production pipeline code, typically in Python, whilst having practical experience in ensuring, policing, and measuring good data governance, quality, and efficient consumption. To run an efficient landscape, we are ideally looking for candidates who are comfortable with event-driven automation across all aspects of our operational pipelines.
As a Blis data engineer, you will seek to understand the data and the problem definition, and to find efficient solutions; critical thinking is a key component of efficient pipelines and effective reuse. This must include designing pipelines for the correct controls and recovery points, not only for function and scale. The team are almost always adherents of Lean Development and work well in environments with significant freedom and ambitious goals.
Shift: 12 pm - 8 pm (IST)
Location: Mumbai (Hybrid - 3 days onsite)
Key responsibilities
- Design, build, monitor, and support large scale data processing pipelines.
- Support, mentor, and pair with other members of the team to advance our team’s capabilities and capacity.
- Help Blis explore and exploit new data streams to innovate and support commercial and technical growth.
- Work closely with Product and be comfortable with taking, making, and delivering against fast-paced decisions to delight our customers.
The ideal candidate will be comfortable with fast feature delivery, followed by robust engineering.
Skills and requirements
- 5+ years of direct experience delivering robust, performant data pipelines within the constraints of SLAs and commercial budgets.
- Proven experience in architecting, developing, and maintaining Apache Druid and Imply platforms, with a focus on DevOps practices and large-scale system re-architecture.
- Mastery of building pipelines in GCP, maximising the use of native and supporting technologies, e.g. Apache Airflow (a minimal sketch of such a pipeline follows this list).
- Mastery of Python for data and computational tasks, with fluency in data cleansing, validation, and composition techniques.
- Hands-on implementation and architectural familiarity with all forms of data sourcing, i.e. streaming data, relational and non-relational databases, and distributed processing technologies (e.g. Spark).
- Fluency with the Python libraries typical of data science, e.g. pandas, scikit-learn, SciPy, NumPy, MLlib, and/or other machine learning and statistical libraries.
- Advanced knowledge of cloud-based services, specifically GCP.
- Excellent working understanding of server-side Linux.
- Professional in managing and reporting on tasks, ensuring appropriate levels of documentation, testing, and assurance around solutions.
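For illustration only, here is a minimal sketch of the kind of Airflow pipeline this role involves. It assumes Airflow 2.4+ (e.g. on Cloud Composer); the DAG ID, schedule, and task logic are all hypothetical placeholders, not Blis's actual pipeline.

```python
# Hypothetical sketch only: a two-step hourly ingest DAG with retries and an
# explicit recovery point (load runs only after validation succeeds).
# Assumes Airflow 2.4+; all names below are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def validate_batch():
    # Placeholder: real logic would check schema, row counts, and freshness.
    print("validating incoming batch ...")


def load_batch():
    # Placeholder: real logic would load the validated batch into BigQuery/GCS.
    print("loading validated batch ...")


with DAG(
    dag_id="example_hourly_ingest",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    validate = PythonOperator(task_id="validate_batch", python_callable=validate_batch)
    load = PythonOperator(task_id="load_batch", python_callable=load_batch)

    validate >> load  # load only runs once validation has succeeded
```

Separating validation from loading gives the pipeline the kind of controls and recovery points described above: a failed batch retries at the validation step without partially loading data downstream.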
Desired
- Experience optimizing both code and config in Spark, Hive, or similar tools.
- Practical experience working with relational databases, including advanced operations such as partitioning and indexing.
- Knowledge and experience with tools like AWS Athena or Google BigQuery to solve data-centric problems (a minimal BigQuery sketch follows this list).
- Understanding and ability to innovate, apply, and optimize complex algorithms and statistical techniques to large data structures.
- Experience with Python notebooks, such as Jupyter, Zeppelin, or Google Datalab, to analyze, prototype, and visualize data and algorithmic output.
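As a purely illustrative sketch of solving a data-centric problem with the google-cloud-bigquery client: the project, dataset, table, and column names below are hypothetical, and credentials are assumed to come from the environment.

```python
# Hypothetical sketch only: count events per hour in a BigQuery table.
# The table path and event_time column are placeholders; credentials come
# from the environment (e.g. GOOGLE_APPLICATION_CREDENTIALS).
from google.cloud import bigquery


def hourly_event_counts(table: str):
    client = bigquery.Client()
    query = f"""
        SELECT TIMESTAMP_TRUNC(event_time, HOUR) AS hour, COUNT(*) AS events
        FROM `{table}`
        GROUP BY hour
        ORDER BY hour
    """
    # Run the query and materialise the results as (hour, count) pairs.
    return [(row.hour, row.events) for row in client.query(query).result()]


if __name__ == "__main__":
    for hour, events in hourly_event_counts("my-project.analytics.events"):
        print(hour, events)
```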