MLOps Engineer – Vertex AI Specialist
LTI Mindtree
Job Summary
The MLOps Engineer – Vertex AI Specialist will build and maintain end-to-end MLOps pipelines using Vertex AI, Kubeflow, and related GCP services. The role involves optimizing training and inference, setting up CI/CD with Cloud Build, and monitoring deployed models. Candidates need 5–10 years of experience in software/data engineering, with 3+ years in MLOps on GCP, and strong skills in the ML lifecycle, DevOps, and cloud-native technologies.
Must Have
- Build and maintain end-to-end MLOps pipelines using Vertex AI Pipelines, Vertex AI Experiments, and Vertex AI Model Monitoring
- Implement Kubeflow Pipelines for scalable and reproducible ML workflows
- Utilize Managed Notebooks and Vertex AI Datasets for data preprocessing and model development
- Optimize training and inference using GPU accelerators and CUDA
- Set up Cloud Build for automated CI/CD of ML models and manage artifacts via Artifact Registry
- Monitor deployed models in real time using Vertex AI Model Monitoring and implement drift detection strategies
- Collaborate with data scientists and DevOps teams to streamline deployment and lifecycle management
- Ensure operational excellence through data versioning, model version control, and infrastructure automation
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field
- 5–10 years of experience in software/data engineering
- 3+ years in MLOps or ML engineering on GCP using Vertex AI
- Hands-on experience deploying ML models in production environments
- Strong understanding of ML lifecycle, DevOps, and cloud-native technologies
- Excellent scripting and automation skills
- Proficiency in MLOps Tools: MLflow, DVC, Kubeflow, TFX
- Experience with CI/CD & DevOps: Jenkins, GitHub Actions, Docker, Kubernetes, Azure DevOps, Cloud Build
- Programming skills: Python, ML pipeline automation, Bash, YAML
- Cloud Platforms: Azure ML, AWS SageMaker, GCP Vertex AI
- Monitoring & Logging: Prometheus, Grafana, ELK/EFK stack
- Model Deployment: FastAPI, Flask, REST APIs, containerized delivery
- Data Engineering: Airflow, Spark, Delta Lake, feature stores
- Security & Compliance: IAM, Key Vault, encryption, GDPR/SOC2
Good to Have
- AutoML Tools: Azure AutoML, H2O.ai, Google AutoML
- Edge AI: Deployment on IoT/edge devices
- Visualization: Power BI, Streamlit, Dash
- Domain Exposure: Manufacturing, BFSI, Healthcare, Retail
- Responsible AI: Model explainability, fairness, bias detection