This Data Engineer role focuses on designing, developing, deploying, and maintaining data pipelines for a data lake. Responsibilities include ETL processes, data quality monitoring, metadata management, and optimizing data lake performance. The role involves building a scalable data lake ecosystem, establishing a data catalog, implementing data lineage tracking, and collaborating with data scientists, business analysts, and data stewards. The ideal candidate will have strong experience with big data technologies (Hadoop, Spark), cloud platforms (Azure, AWS), SQL/NoSQL databases, and data governance practices.
Company Description
Bosch Global Software Technologies Private Limited is a 100% owned subsidiary of Robert Bosch GmbH, one of the world's leading global suppliers of technology and services, offering end-to-end Engineering, IT and Business Solutions. With over 22,700 associates, it is the largest software development center of Bosch outside Germany, making it the Technology Powerhouse of Bosch in India, with a global footprint and presence in the US, Europe and the Asia Pacific region.
Job Description
Requirements:
Master’s or bachelor’s degree in Computer Science, Engineering, or a related field. Proven experience as a Data Engineer, Big Data Engineer, or in a related role with a focus on data lake development and management.
Hands-on experience with big data technologies such as Hadoop, Spark or equivalent.
Proficiency in programming languages like Python, Golang or Java for ETL and data processing.
Familiarity with cloud-based data lake solutions (Azure Data Lake Storage, AWS S3) and cloud computing platforms (Databricks).
Strong understanding of data modeling, ontologies, data warehousing, and data integration concepts.
Experience with data governance, data security, and data privacy practices. Excellent analytical and problem-solving skills.
Strong communication and collaboration abilities to work effectively in a team-oriented environment.
Deep experience with SQL and NoSQL databases. Certification in relevant big data technologies and cloud platforms.
Responsibilities:
As a Data Engineer, you will design, develop, deploy, and operate data pipelines that bring data from multiple sources into a data lake using extract, transform, and load (ETL) tools.
You are responsible for data quality monitoring and for managing metadata to support business objectives.
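The pipeline work described above might look something like this minimal, illustrative Python sketch of an extract-transform-load step with a simple data-quality check. The source data, column names, and quality rule are hypothetical, and an in-memory dict stands in for the data lake; a real pipeline would typically use Spark against Azure Data Lake Storage or AWS S3.

```python
import csv
import io

# Hypothetical raw extract from a source system (normally read from a
# database or object store rather than an inline string).
RAW_CSV = """order_id,amount,currency
1001,250.00,EUR
1002,,EUR
1003,99.50,USD
"""

def extract(raw: str) -> list[dict]:
    """Extract: parse the raw CSV export into records."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Transform: cast types and quarantine rows that fail a quality rule."""
    clean, rejected = [], []
    for row in records:
        if not row["amount"]:  # simple data-quality check: amount must be present
            rejected.append(row)
            continue
        clean.append({"order_id": int(row["order_id"]),
                      "amount": float(row["amount"]),
                      "currency": row["currency"]})
    return clean, rejected

def load(records: list[dict], lake: dict) -> None:
    """Load: append records to a zone of the (stand-in) data lake."""
    lake.setdefault("orders", []).extend(records)

lake: dict = {}
clean, rejected = transform(extract(RAW_CSV))
load(clean, lake)
print(len(lake["orders"]), len(rejected))  # 2 clean rows, 1 quarantined
```

Quarantining bad rows instead of silently dropping them is what makes the data-quality monitoring mentioned above possible: the rejected set can be counted, alerted on, and reprocessed.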
Scalable Data Lake:
You will play a crucial role in building a scalable and efficient data lake ecosystem that enables the project to store, manage, and analyze vast volumes of structured and unstructured data.
As part of data lake optimization, you will be responsible for monitoring the performance of the data lake, identifying and resolving bottlenecks, and optimizing data storage and processing.
Data Catalog:
Establishing and maintaining the data catalog and best practices for data management to facilitate data discovery and understanding.
Implement data lineage tracking to ensure data traceability and reliability.
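To illustrate the lineage-tracking idea in the bullet above, here is a minimal sketch in Python. The dataset and job names are hypothetical, and real deployments would typically rely on a catalog or lineage tool rather than hand-rolled code; the point is only that recording which sources each dataset derives from makes its full upstream ancestry traceable.

```python
from dataclasses import dataclass, field

@dataclass
class LineageGraph:
    """Minimal lineage record: which upstream datasets each dataset derives from."""
    edges: dict = field(default_factory=dict)

    def record(self, target: str, sources: list[str], job: str) -> None:
        """Register that `job` produced `target` from `sources`."""
        self.edges[target] = {"sources": sources, "job": job}

    def upstream(self, dataset: str) -> set[str]:
        """Recursively resolve every upstream source of a dataset."""
        result: set[str] = set()
        for src in self.edges.get(dataset, {}).get("sources", []):
            result.add(src)
            result |= self.upstream(src)
        return result

graph = LineageGraph()
graph.record("raw.orders", [], job="ingest_orders")
graph.record("curated.orders", ["raw.orders"], job="clean_orders")
graph.record("mart.daily_sales", ["curated.orders"], job="aggregate_sales")
print(sorted(graph.upstream("mart.daily_sales")))  # ['curated.orders', 'raw.orders']
```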
Collaboration:
You will work closely with other data engineers, data scientists, and business analysts to ensure the efficient functioning of the data lake.
Collaborate with data stewards to enforce data quality standards and ensure data consistency across the data lake. Participate in data governance initiatives and support data compliance efforts.
Communicate effectively with stakeholders to gather requirements and provide updates on data lake implementation.
Document data lake processes, configurations, and best practices for future reference and knowledge sharing.
Cloud management:
Ability to provision and manage cloud resources, with strong DevOps experience using Terraform in Microsoft Azure.