Senior Kubernetes Platform Engineer

Job Summary

Job Description

TensorWave is seeking a Senior Kubernetes Platform Engineer to design and deploy scalable bare-metal Kubernetes clusters for its AI compute cloud platform. The role involves leading ingress/egress solutions, contributing to multi-tenant environment designs, and driving observability improvements. Candidates need expertise in Linux, networking, Kubernetes internals, and infrastructure-as-code.
Must have:
  • Design and deploy bare-metal Kubernetes clusters at scale using RKE2
  • Collaborate on architectural improvements, infrastructure planning, and automation
  • Lead the design and implementation of Ingress and Egress traffic solutions
  • Contribute to multi-tenant environment designs including VPC-level isolation
  • Drive continuous improvement around observability using Prometheus
  • Serve as a subject matter expert in core Linux, networking, and Kubernetes internals
  • Collaborate cross-functionally with AI platform teams and customers
  • 5+ years of experience in infrastructure or DevOps engineering roles
  • 3+ years hands-on experience managing Kubernetes in bare-metal environments
  • Proven expertise in designing multi-tenant Kubernetes clusters with strong network isolation
  • Deep understanding of Linux systems internals, networking (iptables, CNI plugins, BGP), and DNS
  • Experience with ingress controllers, load balancing, and service mesh
  • Strong infrastructure-as-code mindset using tools like Helm, Terraform, or Ansible
  • Experience monitoring Kubernetes workloads with Prometheus and related observability tools
Good to have:
  • Familiarity with RKE2, Rancher, or other downstream Kubernetes distributions
  • Exposure to AI/ML infrastructure workloads or GPU resource scheduling
  • Experience in infrastructure compliance or secure multi-tenancy (e.g., PCI, SOC 2)
Perks:
  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance
  • Life and Voluntary Supplemental Insurance
  • Short Term Disability Insurance
  • Flexible Spending Account
  • 401(k)
  • Flexible PTO
  • Paid Holidays
  • Parental Leave
  • Mental Health Benefits through Spring Health

Job Details

At TensorWave, we're leading the charge in AI compute, building a versatile cloud platform that's driving the next generation of AI innovation. We're focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what's possible in the AI landscape.

Responsibilities:

  • Design and deploy bare-metal Kubernetes clusters at scale using RKE2
  • Collaborate with senior engineers on architectural improvements, infrastructure planning, and automation
  • Lead the design and implementation of Ingress and Egress traffic solutions, leveraging HAProxy, Cilium, and other components
  • Contribute to multi-tenant environment designs including VPC-level isolation, network policy enforcement, and secure shared services
  • Drive continuous improvement around observability using Prometheus and related tooling
  • Serve as a subject matter expert in core Linux, networking, and Kubernetes internals
  • Collaborate cross-functionally with AI platform teams and internal/external customers

Required Skills & Experience:

  • 5+ years of experience in infrastructure or DevOps engineering roles
  • 3+ years hands-on experience managing Kubernetes in bare-metal environments
  • Proven expertise in designing multi-tenant Kubernetes clusters with strong network isolation
  • Deep understanding of Linux systems internals, networking (iptables, CNI plugins, BGP), and DNS
  • Experience with ingress controllers, load balancing, and service mesh (e.g., HAProxy, Cilium, Envoy)
  • Strong infrastructure-as-code mindset using tools like Helm, Terraform, or Ansible
  • Experience monitoring Kubernetes workloads with Prometheus and related observability tools

Nice to Have:

  • Familiarity with RKE2, Rancher, or other downstream Kubernetes distributions
  • Exposure to AI/ML infrastructure workloads or GPU resource scheduling
  • Experience in infrastructure compliance or secure multi-tenancy (e.g., PCI, SOC 2)

What We Bring:

In addition to a competitive salary, we offer a variety of benefits to support your needs, including:

  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance
  • Life and Voluntary Supplemental Insurance
  • Short Term Disability Insurance
  • Flexible Spending Account
  • 401(k)
  • Flexible PTO
  • Paid Holidays
  • Parental Leave
  • Mental Health Benefits through Spring Health

About The Company

Las Vegas, Nevada, United States (On-Site)
