MLOps Engineer - ML Platform

4+ Years • Research Development • $134,800 - $202,200 per year

Job Summary

Job Description

We are seeking a highly skilled and experienced MLOps Engineer to join our team and contribute to the development and maintenance of our ML platform, both on premises and on AWS Cloud. You will be responsible for architecting, deploying, and optimizing the ML & Data platform that supports training of machine learning models using NVIDIA DGX clusters and the Kubernetes platform. Your expertise in AWS services will be crucial in ensuring the smooth operation and scalability of our ML infrastructure. You will work closely with cross-functional teams to ensure efficient training and deployment of ML models.
Responsibilities:
  • Architect, develop, and maintain the ML platform to support training and inference of ML models.
  • Design and implement scalable and reliable infrastructure solutions for NVIDIA clusters, both on premises and on AWS Cloud.
  • Collaborate with data scientists, software engineers, and infrastructure specialists to define requirements and ensure seamless integration of ML and Data workflows into the platform.
  • Optimize the platform’s performance and scalability, considering factors such as GPU resource utilization, data ingestion, model training, and deployment.
  • Monitor and troubleshoot system performance, identifying and resolving issues to ensure the availability and reliability of the ML platform.
  • Implement and maintain CI/CD pipelines for automated model training, evaluation, and deployment using technologies like ArgoCD and Argo Workflows.
  • Implement and maintain a monitoring stack using Prometheus and Grafana to ensure the health and performance of the platform.
  • Manage AWS services including EKS, EC2, VPC, IAM, S3, and EFS to support the platform.
  • Implement logging and monitoring solutions using AWS CloudWatch and other relevant tools.
  • Stay updated with the latest advancements in MLOps, distributed computing, and GPU acceleration technologies, and proactively propose improvements to enhance the ML platform.
Must have:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Proven experience as an MLOps Engineer or similar role, with a focus on large-scale ML and/or Data infrastructure and GPU clusters.
  • Strong expertise in configuring and optimizing NVIDIA DGX clusters for deep learning workloads.
  • Proficiency with the Kubernetes platform, including technologies like Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana.
  • Solid programming skills in languages like Python and Go, and experience with relevant ML frameworks (e.g., TensorFlow, PyTorch).
  • In-depth understanding of distributed computing, parallel computing, and GPU acceleration techniques.
  • Familiarity with containerization technologies such as Docker and orchestration tools.
  • Experience with CI/CD pipelines and automation tools for ML workflows (e.g., Jenkins, GitHub, ArgoCD).
  • Experience with AWS services such as EKS, EC2, VPC, IAM, S3, and EFS.
  • Experience with AWS logging and monitoring tools.
  • 4+ years of Software Engineering or related work experience (with a Bachelor's degree).
  • 2+ years of work experience with a programming language such as C, C++, Java, or Python.
Good to have:
  • Experience with training and deploying models.
  • Knowledge of ML model optimization techniques and memory management on GPUs.
  • Familiarity with ML-specific data storage and retrieval systems.
  • Understanding of security and compliance requirements in ML infrastructure.
Perks:
  • World-class health benefit option providing coverage to employees and their eligible dependents.
  • Programs designed to help employees build and prepare for a financially secure future.
  • Self and family resources to build emotional/mental strength and resilience, as well as define your purpose.
  • Wellbeing programs and resources to help employees Live+Well and Work+Well.
  • Competitive annual discretionary bonus program.
  • Opportunity for annual RSU grants.
  • Continuous learning and development programs.
  • Tuition reimbursement.
  • Mentorships.

Job Details

Job Posting Date

2025-09-12

---

Company:

Qualcomm Technologies, Inc.

Job Area:

Engineering Group > Software Engineering

General Summary:

We are seeking a highly skilled and experienced MLOps Engineer to join our team and contribute to the development and maintenance of our ML platform, both on premises and on AWS Cloud. As an MLOps Engineer, you will be responsible for architecting, deploying, and optimizing the ML & Data platform that supports training of machine learning models using NVIDIA DGX clusters and the Kubernetes platform, including technologies like Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana. Your expertise in AWS services such as EKS, EC2, VPC, IAM, S3, and EFS will be crucial in ensuring the smooth operation and scalability of our ML infrastructure.

You will work closely with cross-functional teams, including data scientists, software engineers, and infrastructure specialists, to ensure the smooth operation and scalability of our ML infrastructure. Your expertise in MLOps, DevOps, and knowledge of GPU clusters will be vital in enabling efficient training and deployment of ML models.

Responsibilities will include:

  • Architect, develop, and maintain the ML platform to support training and inference of ML models.
  • Design and implement scalable and reliable infrastructure solutions for NVIDIA clusters, both on premises and on AWS Cloud.
  • Collaborate with data scientists and software engineers to define requirements and ensure seamless integration of ML and Data workflows into the platform.
  • Optimize the platform’s performance and scalability, considering factors such as GPU resource utilization, data ingestion, model training, and deployment.
  • Monitor and troubleshoot system performance, identifying and resolving issues to ensure the availability and reliability of the ML platform.
  • Implement and maintain CI/CD pipelines for automated model training, evaluation, and deployment using technologies like ArgoCD and Argo Workflows.
  • Implement and maintain a monitoring stack using Prometheus and Grafana to ensure the health and performance of the platform.
  • Manage AWS services including EKS, EC2, VPC, IAM, S3, and EFS to support the platform.
  • Implement logging and monitoring solutions using AWS CloudWatch and other relevant tools.
  • Stay updated with the latest advancements in MLOps, distributed computing, and GPU acceleration technologies, and proactively propose improvements to enhance the ML platform.

What we are looking for:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Proven experience as an MLOps Engineer or similar role, with a focus on large-scale ML and/or Data infrastructure and GPU clusters.
  • Strong expertise in configuring and optimizing NVIDIA DGX clusters for deep learning workloads.
  • Proficiency with the Kubernetes platform, including technologies like Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana.
  • Solid programming skills in languages like Python and Go, and experience with relevant ML frameworks (e.g., TensorFlow, PyTorch).
  • In-depth understanding of distributed computing, parallel computing, and GPU acceleration techniques.
  • Familiarity with containerization technologies such as Docker and orchestration tools.
  • Experience with CI/CD pipelines and automation tools for ML workflows (e.g., Jenkins, GitHub, ArgoCD).
  • Experience with AWS services such as EKS, EC2, VPC, IAM, S3, and EFS.
  • Experience with AWS logging and monitoring tools.
  • Strong problem-solving skills and the ability to troubleshoot complex technical issues.
  • Excellent communication and collaboration skills to work effectively within a cross-functional team.

We would love to see:

  • Experience with training and deploying models.
  • Knowledge of ML model optimization techniques and memory management on GPUs.
  • Familiarity with ML-specific data storage and retrieval systems.
  • Understanding of security and compliance requirements in ML infrastructure.

Minimum Qualifications:

  • Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 4+ years of Software Engineering or related work experience.
  • OR Master's degree in Engineering, Information Systems, Computer Science, or related field and 3+ years of Software Engineering or related work experience.
  • OR PhD in Engineering, Information Systems, Computer Science, or related field and 2+ years of Software Engineering or related work experience.
  • 2+ years of work experience with a programming language such as C, C++, Java, Python, etc.

Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail disability-accomodations@qualcomm.com or call Qualcomm's toll-free number found here. Upon request, Qualcomm will provide reasonable accommodations to support individuals with disabilities to be able to participate in the hiring process. Qualcomm is also committed to making our workplace accessible for individuals with disabilities. (Keep in mind that this email address is used to provide reasonable accommodations for individuals with disabilities. We will not respond here to requests for updates on applications or resume inquiries.)

To all Staffing and Recruiting Agencies: Our Careers Site is only for individuals seeking a job at Qualcomm. Staffing and recruiting agencies and individuals being represented by an agency are not authorized to use this site or to submit profiles, applications or resumes, and any such submissions will be considered unsolicited. Qualcomm does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to our jobs alias, Qualcomm employees or any other company location. Qualcomm is not responsible for any fees related to unsolicited resumes/applications.

EEO Employer: Qualcomm is an equal opportunity employer; all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or any other protected classification.

Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law.

Pay range and Other Compensation & Benefits:

$134,800.00 - $202,200.00

The above pay scale reflects the broad, minimum to maximum, pay scale for this job code for the location for which it has been posted. Even more importantly, please note that salary is only one component of total compensation at Qualcomm. We also offer a competitive annual discretionary bonus program and opportunity for annual RSU grants (employees on sales-incentive plans are not eligible for our annual bonus). In addition, our highly competitive benefits package is designed to support your success at work, at home, and at play. Your recruiter will be happy to discuss all that Qualcomm has to offer – and you can review more details about our US benefits at this link.

If you would like more information about this role, please contact Qualcomm Careers.

Perks and Benefits

Health

Qualcomm offers a world-class health benefit option providing coverage to employees and their eligible dependents.

Wealth

Our programs are designed to help employees build and prepare for a financially secure future.

Self

Our self and family resources help you build emotional/mental strength and resilience, as well as define your purpose — in life and at work.

Wellbeing

Qualcomm’s wellbeing programs and resources offer support to help employees Live+Well and Work+Well, so they can unlock their full potential at home, at work, and everywhere between.

Unlock Your Limitless Potential with Qualcomm

Whether you’re launching a new career or ready to explore what’s next in the evolution of your talent and expertise, you’re about to embark on a career growth journey like no other.

Bring out your best, with the best

Our employees make Qualcomm’s success possible. We hire the brightest minds and foster a supportive, inclusive culture where your ideas have the power to contribute to world-changing innovations and breakthrough technologies. To make that possible, we leverage the breadth and depth of our diverse expertise from around the world to answer the unasked, conquer the complex, and solve some of the biggest challenges only we can – together.

Innovate with technology experts

At Qualcomm, we are passionate about the limitless potential of your career. Only here can you work alongside some of the most respected, leading engineering and technology experts in the industry – helping you learn and grow professionally in ways you haven’t yet imagined.

Live well, work well

You’ll have access to continuous learning and development programs, tuition reimbursement, and mentorships to tap into your limitless potential, plus opportunities to enhance your quality of life through our comprehensive, best-in-class benefits offerings.

The work we do at Qualcomm impacts lives around the globe – and you can be part of it. Apply today and unlock your full potential.


