Senior Software Engineer - Managed Kubernetes

2 Months ago • 6 Years + • Devops • $255,000 PA - $405,000 PA

Job Description

Lambda is seeking a Senior Software Engineer to join their Managed Kubernetes (Mk8s) team. The role involves designing, building, and maintaining scalable control plane services, operators, and custom Kubernetes controllers. Responsibilities include developing automation in Python/Go for cluster lifecycle management, identifying gaps and creating internal tools and APIs for customers, writing resilient systems for distributed environments, developing automated tests, and supporting production issues through on-call rotation. The role requires a focus on shaping the architecture, reliability, and automation of Kubernetes-based infrastructure.
Must have:
  • 6+ years software engineering experience
  • 3+ years leading complex projects
  • 2+ years in orchestration/deployment systems
  • Kubernetes and third-party operator experience
  • Strong Go and Python skills
  • Experience with infrastructure-as-code
  • Solid Linux, networking, containers knowledge
Good to have:
  • Deep Kubernetes and Linux expertise
  • Experience operating Kubernetes control plane
  • Experience with user-level restrictions
  • Experience with HPC clusters
  • Experience with ML/AI frameworks
  • Expertise in hybrid/multi-cloud Kubernetes
  • Familiarity with GPUs on Kubernetes
  • Contributions to CNCF projects
Perks:
  • Generous cash & equity compensation
  • Health, dental, and vision coverage
  • Wellness and Commuter stipends
  • 401k Plan with 2% company match
  • Flexible Paid Time Off Plan

Job Details

Lambda is the #1 GPU Cloud for ML/AI teams training, fine-tuning and inferencing AI models, where engineers can easily, securely and affordably build, test and deploy AI products at scale. Lambda’s product portfolio includes on-prem GPU systems, hosted GPUs across public & private clouds and managed inference services, serving government, researchers, startups and enterprises worldwide.


If you'd like to build the world's best deep learning cloud, join us. 

*Note: This position requires presence in our San Francisco office 4 days per week; Lambda’s designated work-from-home day is currently Tuesday.

Engineering at Lambda is responsible for building and scaling our cloud offering. Our scope includes the Lambda website, cloud APIs and systems as well as internal tooling for system deployment, management and maintenance.

About the Role

We are seeking a Senior Software Engineer to join our Managed Kubernetes (Mk8s) team. You will play a crucial role in shaping the architecture, reliability, and automation of our Kubernetes-based infrastructure, which powers mission-critical workloads across our global platform.


What You’ll Do

  • Design, build, and maintain scalable control plane services, operators, and custom Kubernetes controllers, while developing automation in Python/Go for end-to-end cluster lifecycle management — including provisioning, upgrades, patching, and deletion.

  • Identify gaps and develop internal tools, APIs, and command-line interfaces (CLIs) that enable customers and ML/AI teams to deploy and effectively monitor inference services.

  • Write resilient systems that gracefully handle failure across large-scale distributed environments.

  • Develop automated tests to ensure quality and stability, and validate the clusters to identify and address hardware issues before delivery.

  • Support and debug production issues through on-call rotation.

You

  • Have 6+ years of experience in software engineering and 3+ years leading large-scale, complex projects or serving as a tech lead.

  • Have at least two years of experience working on orchestration and deployment systems.

  • Have experience using Kubernetes and third-party operators (CRDs, CSI, CNI, etc.).

  • Have strong programming skills in Go and Python, and can collaborate effectively on shared codebases.

  • Take pride in owning and delivering core components of products and platforms.

  • Have experience with infrastructure-as-code tools (e.g. Terraform, Pulumi).

  • Have solid knowledge of Linux systems, networking, containers, and cloud infrastructure.

Nice to Have

  • Deep Kubernetes and Linux expertise

  • Experience operating the control plane and low-level pieces of large-scale Kubernetes clusters

  • Experience with user-level restrictions and hardening (e.g. AppArmor)

  • Experience with HPC clusters, environments & tooling

  • Experience with machine learning/AI frameworks

  • Expertise with hybrid or multi-cloud Kubernetes environments.

  • Familiarity with GPU, InfiniBand, or high-performance computing on K8s.

  • Past contributions to CNCF projects or Kubernetes SIGs.

If you don’t meet all of these requirements but believe you may be a good fit, please still apply and provide a cover letter that helps us understand your experience and readiness for this role.

Salary Range Information

Based on market data and other factors, the annual salary range for this position is $255,000 - $405,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • Founded in 2012, ~350 employees (2024) and growing fast

  • We offer generous cash & equity compensation

  • Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.

  • We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability

  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG

  • Health, dental, and vision coverage for you and your dependents

  • Wellness and Commuter stipends for select roles

  • 401k Plan with 2% company match (USA employees)

  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
