Senior Site Reliability Engineer

1 Month ago • 5 Years + • DevOps

Job Summary

Job Description

Nexthink is the leader in digital employee experience management software, providing IT leaders with unprecedented insight to diagnose and fix issues impacting employees before they notice. As the first solution to allow IT to progress from reactive problem solving to proactive optimization, Nexthink enables its 1,200+ customers to provide better digital experiences to over 15 million employees. At Nexthink, we empower customers with industry-leading solutions for continuous improvement of employee experience, delivering unmatched visibility across all environments. As a SaaS provider, our commitment is to deliver a seamless, resilient, and scalable platform around the clock. The mission of Nexthink's SRE team is to strengthen infrastructure and enhance deployment, monitoring, and system scaling effectively and reliably. They collaborate with Product Engineering, Technical Platform Engineering, Security, and Architecture teams to understand reliability requirements, design and implement solutions, and promote adoption.

Job Details

Company Description

Nexthink is the leader in digital employee experience management software. The company provides IT leaders with unprecedented insight allowing them to see, diagnose and fix issues at scale impacting employees anywhere, with any application or network, before employees notice the issue. As the first solution to allow IT to progress from reactive problem solving to proactive optimization, Nexthink enables its more than 1,200 customers to provide better digital experiences to more than 15 million employees. Dual headquartered in Lausanne, Switzerland and Boston, Massachusetts, Nexthink has 9 offices worldwide.

Job Description

At Nexthink, we empower our customers with industry-leading solutions to enable continuous improvement of employee experience. We deliver unmatched visibility across all environments, so IT teams can consistently see, diagnose, and fix digital workplace issues. As a SaaS provider, our commitment is to deliver a seamless, resilient, and scalable platform around the clock.

We are looking for an experienced, proactive, and innovative professional who is keen to join us as a Senior Site Reliability Engineer! The mission of Nexthink's SRE team is to strengthen our infrastructure and enhance our ability to deploy, monitor, and scale systems effectively and reliably. The team works closely with over 50 Product Engineering teams that develop our products and services, as well as with the Technical Platform Engineering, Security, and Architecture teams, to understand reliability requirements, design and implement solutions, and promote their adoption.

Join our vibrant team of diverse and experienced engineers where cutting-edge technology meets innovation. Be a part of Nexthink's Digital Employee Experience technological revolution, ensuring our global customers enjoy a seamless user experience. Apply now and become a key player in our dynamic SRE organisation.

As a Senior Site Reliability Engineer, you will:

  • Implement and manage cloud-native systems (AWS) using best-in-class tools and automation.
  • Operate and enhance Kubernetes clusters, deployment pipelines, and service meshes to support rapid delivery cycles.
  • Design, build, and maintain the infrastructure powering our multi-tenant SaaS platform with reliability, security, and scalability in mind.
  • Define and maintain SLOs, SLAs, and error budgets, and proactively address availability and performance issues.
  • Develop infrastructure-as-code (Terraform or similar) for repeatable and auditable provisioning.
  • Build internal platform tools and automation to support provisioning, monitoring, and operational efficiency.
  • Monitor infrastructure and applications to ensure high-quality user experiences.
  • Participate in a shared on-call rotation, responding to incidents, troubleshooting outages, and driving timely resolution and communication.
  • Act as Incident Commander while on call, coordinating cross-team responses effectively to meet SLAs.
  • Drive and refine incident response processes, reducing Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR).
  • Diagnose and resolve complex issues independently, minimizing the need for external escalation.
  • Work closely with software engineers to embed observability, fault tolerance, and reliability principles into service design.
  • Automate runbooks, health checks, and alerting to support reliable operations with minimal manual intervention.
  • Support automated testing, canary deployments, and rollback strategies to ensure safe, fast, and reliable releases.
  • Contribute to security best practices, compliance automation, and cost optimization.

Qualifications

  • Minimum Bachelor’s degree in Computer Science or equivalent practical experience.
  • 5+ years of experience as a Site Reliability Engineer or Platform Engineer with strong knowledge of software development best practices.
  • Strong hands-on experience with public cloud services (AWS, GCP, Azure) and with supporting SaaS products.
  • Strong programming or scripting skills (e.g., Python, Go, Bash) and experience with infrastructure-as-code (e.g., Terraform).
  • Proficiency with Kubernetes, container-based deployment (e.g., Docker) and related ecosystems (e.g., Helm).
  • Experience supporting multi-tenant microservices architectures.
  • Experience with CI/CD pipelines & tools (e.g., Jenkins, GitHub Actions, GitLab CI, FluxCD, Crossplane).
  • Experience managing monitoring solutions (e.g., Datadog).
  • Comfortable participating in a rotating on-call schedule, managing critical incidents, and leading post-incident reviews.
  • At ease with operating and managing production systems, striking the right balance between urgency and methodology.
  • Strong system-level troubleshooting skills and a proactive mindset toward incident prevention.
  • Deep understanding of Linux systems, networking, and common troubleshooting practices.
  • Solid understanding of the network stack (e.g., TCP/IP, VPN), cloud architectures (VPC, subnets, firewalls, load balancers), service meshes (e.g., Istio), and storage (e.g., S3, EBS).
  • Knowledge of zero-downtime deployment strategies, such as blue/green and canary releases.
  • Exposure to compliance standards such as SOC 2, ISO 27001, or HIPAA. FedRAMP experience is a big plus.
  • Experience with chaos engineering or resilience testing practices.
  • Excellent problem-solving skills, collaborative mindset, and a strong grasp of agile, iterative development.
  • Self-driven, highly organised, and capable of independently managing priorities.
  • Curiosity to learn new things and discover new technologies.
  • Strong communication, presentation, and team collaboration skills.
  • Excellent written and verbal skills in English.

Prior experience with any of the above-mentioned tools is a bonus, but not a must! We encourage you to apply even if you do not meet every single requirement. We welcome candidates with different levels of background and experience. If you are excited about this role, please apply and our recruiters will assess your application.

Additional Information

We are the pioneers and trailblazers of a global IT Market Category (DEX) that is shaping the future of how the world works, giving our customers’ IT teams total digital visibility across their enterprise. Our innovative solutions integrate real-time analytics, automation, and employee feedback across all endpoints. This enables those IT teams to solve complex technical challenges, create ever more productive workplaces, and deliver happy, satisfied employees in the digital workplace.

With over 1000 employees across 5 continents, Nexthink operates as One Team, connecting, collaborating and innovating to continuously grow. We call our employees ‘Nexthinkers’ and our commitment to diversity, inclusion, and equity is second to none. We currently have over 75 nationalities working with us, from all cultures and backgrounds, speaking many different languages.

If you are looking for a change and like a nice atmosphere, lots of challenges, and having fun while working, this is a great opportunity for you! Check what we offer:

  • 💼 Permanent Contract and a competitive compensation package (Stock Options also included).
  • 📍 Amazing centrally located offices near the Bernabeu Stadium.
  • 🩺 Private Health Insurance (Sanitas) and daily meal vouchers of 11 EUR will be entirely covered by us.
  • 🏡 Hybrid work model balancing office and remote work, with a structured approach for new hires to foster connections and onboarding.
  • 🏖️ Flexible Hours and unlimited vacation (employees have unlimited paid time off on top of the 23 days of holidays we offer) plus 3 company-paid volunteer days.
  • 🤸 Up to 25 EUR per month for a gym subscription.
  • 🛴 Flexible compensation plan for kindergarten & transport tickets.
  • 🧑‍🏫 Reimbursement of up to 50% of the cost of English & Spanish classes.
  • 🍉 Fresh fruit, cookies, and occasionally some soft drinks as well.
  • 🍕 Regular company and team events like Pizza talks, Team Building activities, Christmas parties, hosting Meetups at the office and more!
  • 📣 Bonuses for referring successful hires after three months of continuous employment.
  • 🚚 We offer a relocation package to people who are coming from another country.

Please note that not all the benefits listed above are available for temporary, contract, and internship roles. To ensure you have the most up-to-date information, we recommend checking with your Recruitment Partner.
