Lead Software Engineer: ML Ops & System Engineering

5+ Years • Research Development

Job Description

AppZen is seeking a Lead Software Engineer for ML Ops & System Engineering to design, build, and scale high-performance infrastructure. This role involves leading initiatives across software engineering, system reliability, and machine learning operations. The engineer will develop scalable microservices using Golang and Python, manage containerized environments with Docker and Kubernetes, implement CI/CD pipelines with Jenkins, and oversee ML workflows using MLflow. The position requires leveraging Temporal for complex workflow orchestration and working with AWS cloud services to deploy and manage infrastructure, ensuring robust, production-ready solutions.

Job Details

AppZen is the leader in autonomous spend-to-pay software. Its patented artificial intelligence accurately and efficiently processes information from thousands of data sources so that organizations can better understand enterprise spend at scale and make smarter business decisions. It integrates seamlessly with existing accounts payable, expense, and card workflows to read, understand, and make real-time decisions based on each organization's unique spend profile, leading to faster processing times and fewer instances of fraud or wasteful spend. Global enterprises, including one-third of the Fortune 500, use AppZen’s invoice, expense, and card transaction solutions to replace manual finance processes and accelerate the speed and agility of their businesses. To learn more, visit www.appzen.com.

About the Role:

We are looking for a Lead Software Engineer with strong expertise in ML Ops, distributed systems, and platform engineering to design, build, and scale high-performance infrastructure. You will lead initiatives across software engineering, system reliability, and machine learning operations to deliver robust, production-ready solutions.

Key Responsibilities:

  • Design and develop scalable, secure, and reliable microservices using Golang and Python.
  • Build and maintain containerized environments using Docker and orchestrate them with Kubernetes.
  • Implement CI/CD pipelines with Jenkins for automated testing, deployment, and monitoring.
  • Manage ML workflows with MLflow, ensuring reproducibility, versioning, and deployment of machine learning models.
  • Leverage Temporal for orchestrating complex workflows and ensuring fault-tolerant execution of distributed systems.
  • Work with AWS cloud services (EC2, S3, IAM, and networking basics) to deploy and manage scalable infrastructure.
  • Collaborate with data science and software teams to bridge the gap between ML research and production systems.
  • Ensure system reliability and observability through monitoring, logging, and performance optimization.
  • Mentor junior engineers and drive best practices for ML Ops, DevOps, and system design.

Required Skills & Experience:

  • Minimum of 5 years of experience.
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • Strong programming skills in Golang and Python.
  • Hands-on experience with Kubernetes and Docker in production environments.
  • Proven experience in microservices architecture and distributed systems design.
  • Good understanding of AWS fundamentals (EC2, S3, IAM, networking basics).
  • Experience with MLflow for ML model tracking, management, and deployment.
  • Proficiency in CI/CD tools (preferably Jenkins).
  • Knowledge of Temporal or similar workflow orchestration tools.
  • Strong problem-solving and debugging skills in distributed systems.
  • Excellent communication and leadership skills with experience mentoring engineers.
