MLOps Engineer – Vertex AI Specialist

LTIMindtree

Job Summary

The MLOps Engineer – Vertex AI Specialist will build and maintain end-to-end MLOps pipelines using Vertex AI, Kubeflow, and related GCP services. The role involves optimizing training and inference, setting up CI/CD with Cloud Build, and monitoring deployed models in production, including drift detection. Candidates need 5–10 years of experience in software or data engineering, including 3+ years of MLOps on GCP, along with strong skills across the ML lifecycle, DevOps, and cloud-native technologies.
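
As a minimal, generic illustration of the drift-detection duty mentioned above (plain Python, not the Vertex AI Model Monitoring API), a Population Stability Index check could look like the sketch below; the bin count and the 0.2 alert threshold are illustrative assumptions.

    import numpy as np

    def population_stability_index(expected: np.ndarray,
                                   actual: np.ndarray,
                                   bins: int = 10) -> float:
        """Compare a serving feature distribution against its training baseline."""
        # Bin edges come from the training (expected) distribution.
        edges = np.histogram_bin_edges(expected, bins=bins)
        expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
        actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
        # Clip empty bins to avoid division by zero and log(0).
        expected_pct = np.clip(expected_pct, 1e-6, None)
        actual_pct = np.clip(actual_pct, 1e-6, None)
        return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

    # Synthetic example: a shifted serving distribution trips the (assumed) 0.2 threshold.
    rng = np.random.default_rng(0)
    baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)
    serving = rng.normal(loc=0.4, scale=1.0, size=10_000)
    psi = population_stability_index(baseline, serving)
    print(f"PSI = {psi:.3f} -> drift detected: {psi > 0.2}")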

Job Description

Roles & Responsibilities

  • Build and maintain end-to-end MLOps pipelines using Vertex AI Pipelines, Vertex AI Experiments, and Vertex AI Model Monitoring (see the pipeline sketch after this list).
  • Implement Kubeflow Pipelines for scalable and reproducible ML workflows.
  • Utilize Managed Notebooks and Vertex AI Datasets for data preprocessing and model development.
  • Optimize training and inference using GPU accelerators and CUDA.
  • Set up Cloud Build for automated CI/CD of ML models and manage artifacts via Artifact Registry.
  • Monitor deployed models in real time using Vertex AI Monitoring and implement drift detection strategies.
  • Collaborate with data scientists and DevOps teams to streamline deployment and lifecycle management.
  • Ensure operational excellence through data versioning, model version control, and infrastructure automation.
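
As a sketch of the first two responsibilities, the snippet below defines a two-step Kubeflow (KFP v2) pipeline and submits it to Vertex AI Pipelines via the google-cloud-aiplatform SDK. The project ID, region, bucket paths, and component bodies are placeholder assumptions, not a prescribed implementation.

    from kfp import dsl, compiler
    from google.cloud import aiplatform

    @dsl.component(base_image="python:3.10")
    def preprocess(raw_path: str, out_data: dsl.Output[dsl.Dataset]):
        # Placeholder step: a real component would clean/transform the raw data.
        with open(out_data.path, "w") as f:
            f.write(f"preprocessed from {raw_path}\n")

    @dsl.component(base_image="python:3.10")
    def train(data: dsl.Input[dsl.Dataset], model: dsl.Output[dsl.Model]):
        # Placeholder step: a real component would fit and serialize a model.
        with open(model.path, "w") as f:
            f.write("trained-model-bytes\n")

    @dsl.pipeline(name="example-training-pipeline")
    def training_pipeline(raw_path: str = "gs://example-bucket/raw.csv"):
        prep = preprocess(raw_path=raw_path)
        train(data=prep.outputs["out_data"])

    if __name__ == "__main__":
        compiler.Compiler().compile(training_pipeline, "training_pipeline.json")
        aiplatform.init(project="example-project", location="us-central1")  # assumed project/region
        job = aiplatform.PipelineJob(
            display_name="example-training-pipeline",
            template_path="training_pipeline.json",
            pipeline_root="gs://example-bucket/pipeline-root",  # assumed staging bucket
        )
        job.run()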

Experience & Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
  • 5–10 years of experience in software/data engineering, with 3+ years in MLOps or ML engineering on GCP using Vertex AI.
  • Hands-on experience deploying ML models in production environments.
  • Strong understanding of ML lifecycle, DevOps, and cloud-native technologies.
  • Excellent scripting and automation skills.

Primary Skills (Mandatory)

  • MLOps Tools: MLflow, DVC, Kubeflow, TFX
  • CI/CD & DevOps: Jenkins, GitHub Actions, Docker, Kubernetes, Azure DevOps, Cloud Build
  • Programming: Python, ML pipeline automation, Bash, YAML
  • Cloud Platforms: Azure ML, AWS SageMaker, GCP Vertex AI
  • Monitoring & Logging: Prometheus, Grafana, ELK/EFK stack
  • Model Deployment: FastAPI, Flask, REST APIs, containerized delivery (see the serving sketch after this list)
  • Data Engineering: Airflow, Spark, Delta Lake, feature stores
  • Security & Compliance: IAM, Key Vault, encryption, GDPR/SOC2

Secondary Skills (Good to Have)

  • AutoML Tools: Azure AutoML, H2O.ai, Google AutoML
  • Edge AI: Deployment on IoT/edge devices
  • Visualization: Power BI, Streamlit, Dash
  • Domain Exposure: Manufacturing, BFSI, Healthcare, Retail
  • Responsible AI: Model explainability, fairness, bias detection (see the bias-check sketch after this list)
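
For the Responsible AI line, one small bias-detection illustration is a demographic parity difference computed directly with NumPy; the synthetic data, binary group encoding, and 0.1 review threshold are illustrative assumptions.

    import numpy as np

    def demographic_parity_difference(y_pred: np.ndarray, group: np.ndarray) -> float:
        """Absolute gap in positive-prediction rates between two groups (encoded 0 and 1)."""
        rate_a = y_pred[group == 0].mean()
        rate_b = y_pred[group == 1].mean()
        return float(abs(rate_a - rate_b))

    # Synthetic predictions with a built-in rate gap; 0.1 is an assumed review threshold.
    rng = np.random.default_rng(42)
    group = rng.integers(0, 2, size=1_000)
    y_pred = (rng.random(1_000) < np.where(group == 0, 0.55, 0.40)).astype(int)
    gap = demographic_parity_difference(y_pred, group)
    print(f"demographic parity difference = {gap:.3f} -> flag for review: {gap > 0.1}")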
