Machine Learning Operations Engineer

Morningstar

Job Summary

As a Machine Learning Operations Engineer, you will develop and maintain cutting-edge systems for AI products, focusing on production-grade ML infrastructure like inference endpoints, orchestration, data pipelines, and scalable APIs. You will apply a software development mindset to MLOps, ensuring testing, monitoring, documentation, and reliability, while understanding machine learning principles and LLM production trade-offs. Responsibilities include building and scaling inference endpoints, developing CI/CD pipelines on AWS, and integrating various data technologies.

Must Have

  • Develop and maintain cutting-edge systems for AI products
  • Design, deploy, and scale production-grade ML infrastructure
  • Build and scale inference endpoints and APIs for ML models and LLMs
  • Develop CI/CD pipelines and automate deployment on AWS
  • Design and maintain data pipelines, queues, and event-driven workflows
  • Integrate vector databases, MCP servers, and retrieval pipelines
  • Contribute to Python microservices and support the orchestrator layer
  • Ensure monitoring, observability, and cost-aware operation of ML services
  • Strong programming skills in Python
  • 3+ years of experience in MLOps, backend, or data engineering
  • Good knowledge of ML principles
  • Solid knowledge of AWS services
  • Experience with CI/CD pipelines, Docker/Kubernetes
  • Understanding of microservices architectures, queues/events, scalability
  • Experience with SQL databases (PostgreSQL)
  • Good communication skills and a product-first mindset

Good to Have

  • Hands-on experience deploying and operating LLMs in production
  • Experience with JavaScript/TypeScript
  • Experience with Harness
  • Familiarity with retrieval-augmented generation (RAG) and vector DBs
  • Monitoring/observability tools (CloudWatch, Prometheus, Grafana)
  • Infrastructure-as-code (Terraform, CloudFormation)
  • Experience with web crawlers or large-scale data ingestion

Perks & Benefits

  • Hybrid work environment (four days in-office each week in most locations)
  • Tools and resources for global collaboration
  • Range of other benefits to enhance flexibility

Job Description

About the Role

As a Machine Learning Operations Engineer, you will be responsible for developing and maintaining the cutting-edge systems that bring our AI products to life.

You will design, deploy, and scale the systems that power our AI products, enabling investors worldwide to assess the Environmental, Social, and Governance (ESG) performance of companies. Your focus will be on production-grade ML infrastructure: inference endpoints, orchestration, data pipelines, and scalable APIs.

We are looking for engineers who bring a software development mindset to MLOps (testing, monitoring, documentation, and reliability) while also understanding machine learning principles and the trade-offs of running LLMs in production.

Responsibilities

  • Build and scale inference endpoints and APIs for both classic ML models and LLMs.
  • Develop CI/CD pipelines and automate deployment on AWS (Bedrock, Lambda, EKS, S3, etc.).
  • Design and maintain data pipelines, queues, and event-driven workflows.
  • Integrate vector databases, MCP servers, and retrieval pipelines into production systems.
  • Contribute to microservices in Python and support our orchestrator layer.
  • Ensure monitoring, observability, and cost-aware operation of deployed ML services.
  • Collaborate with AI researchers and software engineers to productize prototypes.

Qualifications

  • Strong programming skills in Python (APIs, pipelines, services).
  • 3+ years of experience in MLOps, backend engineering, data engineering, or related roles.
  • Good knowledge of ML principles (e.g. precision, recall, inference time, latency/throughput trade-offs).
  • Solid knowledge of AWS services (Bedrock, Lambda, EKS, S3, etc.).
  • Experience with CI/CD pipelines and containerization (Docker/Kubernetes).
  • Understanding of microservices architectures, queues/events, and scalability.
  • Experience with SQL databases (PostgreSQL).
  • Good communication skills and a product-first mindset.

Nice to Have

  • Hands-on experience deploying and operating LLMs in production, with awareness of limitations, evaluation, and cost implications.
  • Experience with JavaScript/TypeScript.
  • Experience with Harness.
  • Familiarity with retrieval-augmented generation (RAG) and vector DBs.
  • Monitoring/observability tools (CloudWatch, Prometheus, Grafana).
  • Infrastructure-as-code (Terraform, CloudFormation).
  • Experience with web crawlers or large-scale data ingestion.

Morningstar is an equal opportunity employer.

Morningstar's hybrid work environment gives you the opportunity to collaborate in-person each week as we've found that we're at our best when we're purposely together on a regular basis. In most of our locations, our hybrid work model is four days in-office each week. A range of other benefits are also available to enhance flexibility as needs change. No matter where you are, you'll have tools and resources to engage meaningfully with your global colleagues.

Legal entity: Morningstar India Private Ltd. (Delhi)
