Site Reliability Engineer

N-iX

4+ Years | Remote | Full Time | 1 day ago

Job Summary

N-iX is seeking an experienced Site Reliability Engineer to join a project entering a pivotal phase with a major go-live planned for mid-February, targeting 75,000 users. The role involves ensuring the stability, scalability, and operational excellence of a Kubernetes-based platform in a hybrid environment. Key responsibilities include performance optimization, scaling strategies, observability, and reliability engineering, addressing anticipated challenges with increased user activity.

Must Have

4+ years of experience as SRE / DevOps Engineer
Strong hands-on experience with Kubernetes in production
Experience working with hybrid infrastructure (on-prem + cloud)
Solid knowledge of PostgreSQL performance tuning and scaling
Experience with Qdrant or other vector databases
Experience with Helm, Kubernetes autoscaling, and resource optimization
Familiarity with observability stacks (Prometheus, Grafana, ELK/Loki)
Understanding of performance engineering and load testing
Experience with Linux systems and networking
Strong troubleshooting and incident-management skills

Good to Have

Experience with STACKIT or other sovereign clouds
Experience with PgBouncer
Knowledge of SRE practices (SLO/SLI)
Experience in regulated or public-sector environments
German language skills

Perks & Benefits

Flexible working format - remote, office-based or flexible
A competitive salary and good compensation package
Personalized career growth
Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
Active tech communities with regular knowledge sharing
Education reimbursement
Memorable anniversary presents
Corporate events and team buildings
Other location-specific benefits

Job Description

Project:

N-iX is a global software solutions and engineering services company.

We are looking for an experienced Site Reliability Engineer to ensure the stability, scalability, and operational excellence of a Kubernetes-based platform running in a hybrid environment.

The project is entering a pivotal phase, with a major go-live planned for mid-February and a target audience of 75,000 users. User onboarding is already underway, with over 5,000 users connected and 15,000–20,000 expected to be active by year-end. The system is currently stable, but we anticipate increased activity and new challenges in January, February, and after the go-live, so this is an opportunity to make a real impact. The role focuses on performance optimization, scaling strategies, observability, and reliability engineering.

Responsibilities:

Operate and optimize hybrid infrastructure (on-prem & STACKIT)
Manage and scale Kubernetes clusters
Optimize Helm charts, resource usage, and autoscaling
Conduct performance, load, and stress testing
Ensure reliability, availability, and monitoring of production systems
Tune and operate PostgreSQL
Operate and optimize vector databases (e.g. Qdrant)
Implement monitoring, logging, and alerting
Support incident response and capacity planning
