Job Summary
As a Data Engineer, you will play a crucial role in the development, deployment, and maintenance of our data infrastructure. You will be responsible for designing, constructing, and maintaining scalable data pipelines and architectures to support our data-driven initiatives. Your work will enable efficient data extraction, transformation, and loading (ETL) processes, ensuring the availability, integrity, and usability of our data assets.
Job Responsibilities
Data Pipeline Development: Design, build, and maintain robust, scalable data pipelines to collect, process, and store large volumes of structured and unstructured data from various sources.
Data Modeling and Architecture: Design data models and architecture to optimize data storage, retrieval, and analysis. Ensure data integrity, consistency, and reliability across different systems.
Data Integration: Implement ETL processes to integrate data from diverse sources into centralized data repositories or data warehouses. Develop efficient data transformation workflows to cleanse, enrich, and aggregate data.
Data Infrastructure Management: Manage and optimize data infrastructure components such as databases, data lakes, and distributed computing frameworks. Monitor system performance, troubleshoot issues, and ensure high availability and scalability of data platforms.
Data Quality Assurance: Develop and implement data quality standards, validation rules, and monitoring processes to ensure the accuracy, completeness, and consistency of data. Perform data profiling and analysis to identify data quality issues and recommend improvements.
Collaboration and Communication: Collaborate with cross-functional teams including data scientists, analysts, and software engineers to understand data requirements, provide technical expertise, and support data-driven decision-making. Communicate effectively with stakeholders to convey technical concepts and project updates.
Continuous Improvement: Stay updated with emerging technologies, tools, and best practices in data engineering. Continuously evaluate and enhance data processes, infrastructure, and architecture to improve efficiency, performance, and scalability.
Other Knowledge, Skills, and Abilities
Strong analytical and problem-solving skills with attention to detail.
Excellent communication and collaboration skills to work effectively with cross-functional teams.
Ability to prioritize tasks, manage workload efficiently, and meet deadlines in a fast-paced environment.
Self-motivated with a passion for learning and staying updated with industry trends and technologies.
Advanced English (required)
Years of Experience
At least 2-3 years of proven experience in data engineering roles, with a focus on designing and implementing data pipelines and ETL processes.
Education/Certifications
Bachelor’s degree in Computer Science, Engineering, or a related field; Master's degree preferred.
Relevant certifications such as AWS Certified Big Data - Specialty, Google Professional Data Engineer, or similar are a plus.
Technical Skills
Proficiency in programming languages such as Python, Java, or Scala, and experience with data processing frameworks like Apache Spark, Apache Flink, or similar.
Hands-on experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and their data services (e.g., S3, Redshift, BigQuery).
Experience with relational and NoSQL databases, data modeling, and SQL query optimization.
Familiarity with DevOps practices and tools for continuous integration and deployment (CI/CD).