Job Description
Overview: We are looking for an experienced Data Engineer specializing in Generative AI to join our inaugural AI pod, whose mission is to establish robust AI infrastructure. This role is pivotal in standing up additional AI pods and promoting an AI-driven culture across the organization. The ideal candidate will work in close collaboration with Data Scientists and Applied Scientists and will own the integration of AI models into our business processes, particularly in the healthcare sector.
Responsibilities:
- Data Pipeline Development: Design, develop, and maintain scalable data pipelines and ETL processes. Lead data curation, data readiness, and MLOps efforts in service of a comprehensive data strategy and its execution.
- Data Infrastructure Management: Build and manage data infrastructure to support AI and machine learning initiatives.
- Collaboration: Work closely with Data Scientists and Applied Scientists to understand data requirements and ensure data availability and quality.
- Integration Ownership: Own the integration of AI models and solutions into existing business applications.
- Data Governance: Implement data governance and security best practices to ensure data integrity and compliance.
- Performance Optimization: Optimize data storage and retrieval mechanisms to enhance performance and efficiency.
- Documentation: Develop and maintain documentation related to data architecture, processes, and workflows.
- Monitoring and Troubleshooting: Monitor and troubleshoot data-related issues, ensuring minimal disruption to AI operations.
- Support Deployment: Support the deployment and refinement of large language models such as OpenAI's GPT, Anthropic's Claude, and Google's Gemini, as well as specialized AI models.
- Feedback Integration: Collect and analyze feedback from AI solutions to continuously improve system performance.
- Cloud Deployment: Deploy AI solutions to the AWS cloud platform and integrate them with existing applications.
- Healthcare Innovation: Innovate to streamline and simplify clinical workflows and improve provider satisfaction.
- Emerging Technologies: Stay abreast of emerging technologies and trends in data engineering and AI.
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 8 to 10 years of experience in data engineering, including building and managing data pipelines.
- Experience in supporting scalable AI solutions.
- Experience with vector stores and embedding models and their use in the context of generative AI.
- Proficiency in programming languages such as Python and SQL.
- Experience with AWS big data technologies (e.g., EMR, Redshift, Kinesis).
- Familiarity with data warehousing solutions (e.g., Redshift, BigQuery, Snowflake).
- Strong knowledge of AWS database management systems (e.g., RDS, DynamoDB, Aurora).
- Experience in generating sample data using statistical modeling techniques.
- Ability to collect feedback from AI solutions and iteratively improve systems.
- Experience deploying AI solutions to the AWS cloud platform and integrating them with applications.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Experience with CI/CD and DevOps practices is a plus.
- Experience in the healthcare industry is a plus, particularly in simplifying clinical workflows and improving provider satisfaction.