About the job
Summary by Outscal
We are seeking a Machine Learning Engineer/SRE to manage Azure infrastructure, monitor model performance, and respond to incidents related to model operations. Candidates should be proficient with CI/CD pipelines, containerization, and machine learning models, and should have strong programming skills and data management experience.
Description
- Manage Azure Infrastructure: Configure, maintain, and optimize Azure infrastructure for AI model development and deployment, ensuring scalability and performance.
- Model Performance Monitoring: Implement and maintain monitoring systems to track model performance, detecting issues early and resolving them before they affect users.
- Incident Response: Collaborate with the SRE team to respond promptly to outages and incidents related to model operations, ensuring minimal downtime and rapid issue resolution.
Requirements
- Azure Infrastructure Experience: Proficiency in managing Azure infrastructure components, including virtual machines, storage, and networking, to support AI model development and deployment.
- CI/CD Pipeline Experience: Experience with Continuous Integration/Continuous Deployment (CI/CD) pipelines, including the automation of model deployment processes.
- Containerization in the Cloud: Strong knowledge of containerization technologies in the cloud, such as Docker and Kubernetes, for efficient deployment and scaling of machine learning models.
- Machine Learning Expertise: Proficiency in building and optimizing machine learning models, with a deep understanding of various ML algorithms and frameworks.
- Programming Skills: Proficiency in programming languages commonly used in machine learning, such as Python, and in libraries such as TensorFlow and PyTorch.
- Data Management: Experience in data preprocessing, feature engineering, and data pipeline development for machine learning.
- Collaborative Team Player: Excellent communication skills and the ability to work collaboratively with cross-functional teams, including AI engineers and SREs.
- Documentation: Effective documentation skills to maintain clear and organized records of models, infrastructure configurations, and incident responses.