AI Cloud Engineer

4 Hours ago • All levels
Research Development

Job Description

This role involves designing and building Kubernetes-native AI cloud systems for deploying and managing large-scale AI services. Key responsibilities include implementing core cloud features like dynamic workload scheduling, logging, monitoring, authentication, high availability, and resolving performance bottlenecks. The engineer will also collaborate with internal and external teams to deliver and maintain cloud-native solutions and contribute to future product enhancements.

Responsibilities and Opportunities

  • Design and build a Kubernetes-native AI cloud system, specifically tailored for deploying and managing massive-scale, high-performance AI services
  • Implement core cloud features – dynamic workload scheduling, logging/monitoring/metering, authentication/authorization, high availability, QoS, and failover – for our internal platform or customer-facing solutions
  • Identify and resolve performance bottlenecks and operational issues that affect cluster stability and availability
  • Work closely with customers and internal teams to deliver and maintain cloud-native systems and help shape future product enhancements and capabilities

Key Qualifications

  • Bachelor’s or higher degree in Computer Science, Electrical Engineering, or a related technical field
  • Proven, hands-on experience designing and operating large-scale Kubernetes clusters in a production environment
  • Strong proficiency in production-quality systems code using Python, Go, or C++
  • Experience in full-stack development

Ideal Qualifications

  • Exceptional problem-solving skills, with a proactive and analytical approach to technical challenges
  • Certified Kubernetes Administrator (CKA) certification
  • 3-5 years of direct experience building commercial services and infrastructure, including creating Kubernetes operators or custom controllers
  • Familiarity with AI/ML-specific orchestration tools built atop Kubernetes (e.g., Kubeflow, Ray, Argo)

Company: Rebellions
Location: South Korea