This role combines machine learning expertise with SRE responsibilities. You will manage Azure infrastructure for AI model development and deployment, ensure model performance, and respond to incidents related to model operations. This position requires strong knowledge of Azure infrastructure, CI/CD, containerization, and machine learning.
Must have:
Azure Infrastructure Experience
CI/CD Pipeline Experience
Containerization in the Cloud
Machine Learning Expertise
Programming Skills
Data Management
Collaborative Team Player
Documentation
Description
Manage Azure Infrastructure: Configure, maintain, and optimize Azure infrastructure for AI model development and deployment, ensuring scalability and performance.
Model Performance Monitoring: Implement and maintain monitoring systems to track model performance, proactively identifying and addressing issues as they arise.
Incident Response: Collaborate with the SRE team to respond promptly to outages and incidents related to model operations, ensuring minimal downtime and rapid issue resolution.
Requirements
Azure Infrastructure Experience: Proficiency in managing Azure infrastructure components, including virtual machines, storage, and networking, to support AI model development and deployment.
CI/CD Pipeline Experience: Experience with Continuous Integration/Continuous Deployment (CI/CD) pipelines, including the automation of model deployment processes.
Containerization in the Cloud: Strong knowledge of containerization technologies in the cloud, such as Docker and Kubernetes, for efficient deployment and scaling of machine learning models.
Machine Learning Expertise: Proficiency in building and optimizing machine learning models, with a deep understanding of various ML algorithms and frameworks.
Programming Skills: Proficiency in programming languages commonly used in machine learning, such as Python, and in libraries such as TensorFlow and PyTorch.
Data Management: Experience in data preprocessing, feature engineering, and data pipeline development for machine learning.
Collaborative Team Player: Excellent communication skills and the ability to work collaboratively with cross-functional teams, including AI engineers and SREs.
Documentation: Effective documentation skills to maintain clear and organized records of models, infrastructure configurations, and incident responses.