Senior Kubernetes Platform Engineer

Posted 1 month ago • 5+ years of experience
DevOps

Job Description

TensorWave is seeking a Senior Kubernetes Platform Engineer to design and deploy scalable bare-metal Kubernetes clusters for its AI compute cloud platform. The role involves leading the design of ingress and egress traffic solutions, contributing to multi-tenant environment designs, and driving observability improvements. Candidates need expertise in Linux, networking, Kubernetes internals, and infrastructure-as-code.

At TensorWave, we're leading the charge in AI compute, building a versatile cloud platform that's driving the next generation of AI innovation. We're focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what's possible in the AI landscape.

Responsibilities:

  • Design and deploy bare-metal Kubernetes clusters at scale using RKE2
  • Collaborate with senior engineers on architectural improvements, infrastructure planning, and automation
  • Lead the design and implementation of Ingress and Egress traffic solutions, leveraging HAProxy, Cilium, and other components
  • Contribute to multi-tenant environment designs including VPC-level isolation, network policy enforcement, and secure shared services
  • Drive continuous improvement around observability using Prometheus and related tooling
  • Serve as a subject matter expert in core Linux, networking, and Kubernetes internals
  • Collaborate cross-functionally with AI platform teams and internal/external customers

Required Skills & Experience:

  • 5+ years of experience in infrastructure or DevOps engineering roles
  • 3+ years hands-on experience managing Kubernetes in bare-metal environments
  • Proven expertise in designing multi-tenant Kubernetes clusters with strong network isolation
  • Deep understanding of Linux systems internals, networking (iptables, CNI plugins, BGP), and DNS
  • Experience with ingress controllers, load balancing, and service mesh (e.g., HAProxy, Cilium, Envoy)
  • Strong infrastructure-as-code mindset using tools like Helm, Terraform, or Ansible
  • Experience monitoring Kubernetes workloads with Prometheus and related observability tools

Nice to Have:

  • Familiarity with RKE2, Rancher, or other downstream Kubernetes distributions
  • Exposure to AI/ML infrastructure workloads or GPU resource scheduling
  • Experience in infrastructure compliance or secure multi-tenancy (e.g., PCI, SOC 2)

What We Bring:

In addition to a competitive salary, we offer a variety of benefits to support your needs, including:

  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance
  • Life and Voluntary Supplemental Insurance
  • Short-Term Disability Insurance
  • Flexible Spending Account
  • 401(k)
  • Flexible PTO
  • Paid Holidays
  • Parental Leave
  • Mental Health Benefits through Spring Health
