Tech Lead/Manager, K8s

1 Year ago • 10 Years +

Job Summary

Job Description

As a Tech Lead of the SRE team overseeing Kubernetes, you will be challenged and expected to grow your technical knowledge while leading a team of SREs responsible for Kubernetes infrastructure. You will ensure maximum availability, reliability, and scalability of our multi-datacenter hybrid Linux environments and carry out performance and resilience testing. You will also advance the technology stack with innovative ideas, participate in capacity management, and create strategies for long-term fixes to critical production incidents. Maintaining documentation, building tooling, and creating alerts to identify and address infrastructure reliability issues are also key responsibilities. The role involves proactively identifying system anomalies and conducting post-mortems to communicate impact and remediation strategies.
Must have:
  • 10 years of relevant experience
  • 3 years of management experience
  • Thorough understanding of Linux (CentOS/Rocky)
  • Advanced knowledge of K8s and its ecosystem
  • Experience deploying/managing K8s on bare metal
  • Experience with K8s operators and GitOps tools
Good to have:
  • Familiarity with Cassandra at scale
  • Understanding of Kafka architecture
  • Experience in AdTech or High-Frequency Trading
  • Experience with security best practices

Job Details

Description

Tech Lead/Manager, K8s @ PulsePoint  
About PulsePoint:  
PulsePoint is a fast-growing healthcare technology company (with adtech roots) using real-time data to transform healthcare. We help brands and agencies interpret the hard-to-read signals across the health journey and unify these digital determinants of health with real-world data to produce the most dimensional view of the customer. Our award-winning advertising platforms use machine learning and programmatic automation to seamlessly activate this data, making marketing, predictive analytics, and decision support easy and instantaneous.  
Tech Lead/Manager, K8s:  
As a Tech Lead of the SRE team overseeing Kubernetes, you will be challenged and expected to grow your technical knowledge. You will challenge your fellow team members, and they will challenge you back. Our team is not competitive, but we are goal-oriented and driven to succeed.  
What you'll be doing:  
  • Lead a team of SREs responsible for Kubernetes infrastructure, helping them achieve their personal and shared goals and thrive in their roles, while spearheading engineering efforts by your own example.  
  • Ensure maximum availability, reliability, and scalability of our multi-datacenter hybrid Linux environments (10 clusters, 1,000+ nodes, 20k+ pods, 1M+ QPS).  
  • Conduct performance and resilience testing. This may include reviewing configuration, software choices/versions, hardware specs, etc.  
  • Advance our technology stack with innovative ideas and new creative solutions.  
  • Participate in capacity management of core systems and services, application analysis, and performance and security tuning.  
  • Create strategies for long-term fixes to critical production incidents.  
  • Maintain documentation, build tooling, and create alerts to both identify and address infrastructure reliability issues.  
  • Proactively identify system anomalies.  
  • Conduct post-mortems and communicate impact and remediation strategies to service owners and C-level staff.  
What you’ll need:  
  • East Coast U.S. hours (9am-6pm EST) preferred, but we can be flexible as long as you can work until 12pm/1pm EST. You can work fully remotely.  
  • Location: Remote in U.S. (preferred) or anywhere in the world 
  • Minimum 10 years of relevant experience  
  • Minimum 3 years of management experience  
  • Thorough understanding of Linux (we use CentOS/Rocky).  
  • Advanced knowledge of K8s and its ecosystem:  
    • Extensive hands-on experience deploying and managing Kubernetes clusters on bare metal servers in production environments.  
    • Expert knowledge of best practices for architecting cross-datacenter Kubernetes clusters running on-premises with automated management using kubeadm.  
    • Know-how in optimizing bare-metal server configuration for Kubernetes workloads, including networking, storage, and security considerations.  
    • Kubelet and CRI tuning in accordance with best practices, including but not limited to NUMA and GPU optimization.  
    • Deep knowledge of Kubernetes internals, including etcd, container network interfaces and container runtimes.   
    • Thorough understanding of PKI certificates for all components (ability to manually troubleshoot and solve client, server, and control-plane certificate issues within Kubernetes with zero downtime).  
    • Vast experience developing custom Kubernetes operators and autoscalers, as well as tailored ingress/egress controllers and custom resource definitions (CRDs).  
    • Fluency in GitOps automation tools (Flux v1/v2), comprehensive knowledge of Helm and Kustomize controllers.  
    • Ability to manage BGP configuration; mastery of kube-router, GoBGP, and MetalLB.  
    • Understanding of the most intricate details of the Rook/Ceph implementation for Kubernetes.  
  • Deep understanding of Puppet configuration management toolset (experience with Chef, CFEngine or Salt also works).  
  • Experience administering NoSQL databases (Redis, ES).  
  • Experience with scalable infrastructure monitoring solutions such as Icinga, Prometheus, ELK.  
  • Strong scripting and automation skills using languages like Python, Ruby, Java, or Go.  
  • Advanced understanding of networking concepts (TCP/IP stack, BGP, DNS, CDN, load balancing).  
Bonus, but not required:  
  • Close familiarity with Cassandra at scale.  
  • Understanding of Kafka architecture.  
  • Experience in AdTech or High-Frequency Trading.  
  • Experience with security-related best practices.  
WebMD and its affiliates are Equal Opportunity/Affirmative Action employers and do not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.   

Similar Jobs

Autodesk - Senior Principal Software Engineer, AEC Data

Autodesk

Boston, Massachusetts, United States (Remote)
1 Year ago
Springer Group - Lead UX Researcher (Mat Cover)

Springer Group

Berlin, Berlin, Germany (Hybrid)
2 Months ago
welevel  - Senior Technical Artist (Character)

welevel

Munich, Bavaria, Germany (On-Site)
5 Months ago
plarium - 2D AI Artist

plarium

Lviv, Lviv Oblast, Ukraine (Hybrid)
2 Weeks ago
Amanotes - Creative Specialist

Amanotes

Ho Chi Minh City, Ho Chi Minh City, Vietnam (On-Site)
5 Months ago

Similar Skill Jobs

Morning Star - Senior Regional Counsel, Regulatory Strategy and Advisory

Morning Star

Sydney, New South Wales, Australia (Hybrid)
3 Weeks ago
Syniverse - Lead Data Engineer

Syniverse

Bengaluru, Karnataka, India (Hybrid)
3 Weeks ago
FICO - Scores Product Management - Director

FICO

United States (Remote)
2 Months ago
Blizzard Entertainment - Sydney

Blizzard Entertainment

Sydney, New South Wales, Australia (On-Site)
1 Month ago
Trellix - Senior Staff Security Researcher

Trellix

Bengaluru, Karnataka, India (On-Site)
2 Months ago
Amanotes - Senior Game Artist (Game Magic Tiles 3 - Hybrid Casual)

Amanotes

Ho Chi Minh City, Ho Chi Minh City, Vietnam (On-Site)
6 Months ago
vertigoo games - Game Developer

vertigoo games

Istanbul, İstanbul, Türkiye (On-Site)
3 Months ago
JDA - Staff Software Engineer - Generative AI

JDA

Dallas, Texas, United States (Hybrid)
1 Year ago
Sabre India - Senior Solutions Architect – Hospitality

Sabre India

United States (Remote)
1 Month ago
bytedance - Software Engineer, SRE - Platform Services

bytedance

Seattle, Washington, United States (On-Site)
4 Months ago

Jobs in United Kingdom

Cognite - Senior Account Executive Oil & Gas

Cognite

London, England, United Kingdom (Hybrid)
1 Year ago
HP - Software Systems Engineer for Sure Click

HP

Cambridge, England, United Kingdom (On-Site)
2 Weeks ago
Thales - Tools Support Engineer

Thales

Templecombe, England, United Kingdom (On-Site)
2 Months ago
WebFX - Copywriter (Digital Marketing & B2B)

WebFX

United Kingdom (Remote)
4 Months ago
Marsh McLennan - Senior Professional Indemnity / Financial Lines Underwriter

Marsh McLennan

Bristol, England, United Kingdom (Hybrid)
2 Months ago
smarsh - Enterprise Customer Success Manager

smarsh

London, England, United Kingdom (Hybrid)
3 Months ago
Red Rover Interactive - Senior Server programmer

Red Rover Interactive

Newcastle Upon Tyne, England, United Kingdom (Hybrid)
1 Year ago
PlayStation Global - Staff Linux Network Software Engineer

PlayStation Global

London, England, United Kingdom (On-Site)
4 Months ago
Ion - Product Manager - XTP Analytics/ Clarus Charm

Ion

London, England, United Kingdom (On-Site)
2 Months ago
Dexerto - Entertainment Editor

Dexerto

United Kingdom (Remote)
1 Month ago