DevOps Engineer, HPC Services

1 Month ago • 7+ Years • DevOps

Job Summary

Job Description

Mistral AI is building one of Europe's largest AI infrastructure offerings, providing customers with a private and integrated stack from bare-metal servers to managed PaaS. As a DevOps Engineer in the HPC services team, you will be responsible for building, scaling, and automating the computing management stack. This includes engineering fault-tolerant and reliable infrastructure to support both internal processes and the customer platform. You will design, build, and maintain scalable, highly available, and fault-tolerant infrastructures, automate the lifecycle of compute nodes, and develop new workflows and tooling to improve system reliability, availability, and performance. Key activities include CI/CD, containerization, orchestration, monitoring, logging, alerting, troubleshooting production issues, and collaborating with R&D and security teams.
Must have:
  • 7+ years of experience in DevOps/SRE
  • Proficiency in scripting languages (Python, Go, Bash)
  • Experience with CI/CD, Docker, Kubernetes
  • Experience troubleshooting K8s cluster issues and performing system upgrades
  • Familiarity with infrastructure-as-code (Terraform/CloudFormation)
  • Knowledge of monitoring, logging, alerting tools
  • Experience working against reliability KPIs
  • Strong understanding of networking, security, sysadmin
  • Excellent problem-solving and communication skills
Good to have:
  • Experience with HPC workload managers (Slurm)
  • Experience with distributed storage systems (Lustre, Ceph)
  • Exposure to highly available distributed systems and SRE issues
Perks:
  • Competitive salary and equity
  • Health insurance
  • Transportation allowance
  • Sport allowance
  • Meal vouchers
  • Private pension plan
  • Generous parental leave policy
  • Visa sponsorship

Job Details

About Mistral 

At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.

We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments. Our offerings include le Chat, the AI assistant for life and work.

We are a dynamic, collaborative team passionate about AI and its potential to transform society.
Our diverse workforce thrives in competitive environments and is committed to driving innovation. Our teams are distributed across France, the USA, the UK, Germany and Singapore. We are creative, low-ego and team-spirited.

Join us to be part of a pioneering company shaping the future of AI. Together, we can make a meaningful impact. See more about our culture on https://mistral.ai/careers.

Role Summary 

We are building one of Europe’s largest AI infrastructure offerings, which will provide our customers with a private and integrated stack in every form factor they may need, from bare-metal servers to fully managed PaaS. As a DevOps Engineer, you will join a fast-growing team to help build, scale and automate our computing management stack. You will be responsible for building fault-tolerant and reliable infrastructure to support both our internal processes and our customer platform.

Location: France 🇫🇷 and UK 🇬🇧 as primary location, or remote under conditions (see below)
Reporting line: Software Architect, HPC

What you will do

As a DevOps Engineer in the HPC services team, your primary responsibility will be to engineer robust and dependable infrastructure that supports both our internal operations and customer-facing platforms.

Key Responsibilities:

• Design, build and maintain scalable, highly available and fault-tolerant infrastructures
• Build, scale and automate the full lifecycle of compute nodes, from bootstrapping to decommissioning
• Design and develop new workflows and tooling to improve the reliability, availability and performance of our systems (automation scripts, API-based features, web apps, dashboards, etc.)
• Drive continuous improvement in infrastructure automation, deployment, and orchestration (CI/CD, containerization, orchestration, monitoring, logging and alerting systems...)
• Operate systems and troubleshoot issues in production environments (interrupts, on-call responses, user administration, data extraction, infrastructure scaling, etc.)
• Participate occasionally in on-call rotations to respond to incidents and perform root cause analysis to prevent future occurrences
• Collaborate closely with R&D to streamline build systems, scale testing workflows and make sure our inference and model training environments are always highly available and seamlessly replicable across several HPC clusters
• Collaborate with the security team to ensure infrastructure adheres to best security practices and compliance requirements

About you

• 7+ years of experience in a DevOps/SRE role
• Exposure to highly available distributed systems and site reliability issues in critical environments (issue root cause analysis, in-production troubleshooting, on-call rotations...)
• Proficiency in scripting languages (Python, Go, Bash...) and knowledge of software development best practices
• Hands-on experience with CI/CD, containerization and orchestration tools (Docker, Kubernetes...)
• Proven experience troubleshooting complex K8s cluster issues and performing system upgrades
• Familiarity with infrastructure-as-code tools like Terraform or CloudFormation
• Knowledge of monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK Stack, Datadog...)
• Experience working against reliability KPIs (observability, alerting, SLAs)
• Strong understanding of networking, security, and system administration concepts
• Excellent problem-solving and communication skills
• Self-motivated and able to work well in a fast-paced startup environment

Now, it would be ideal if you also had experience with:
• HPC workload managers (Slurm)
• Distributed storage systems (Lustre, Ceph)

Location & Remote

This role is primarily based at our HQ in Paris, France. We will prioritize candidates who either reside in Paris or are open to relocating. We strongly believe in the value of in-person collaboration to foster strong relationships and seamless communication within our team. Our remote work policy is designed to offer flexibility, enhance work-life balance, and boost productivity. The number of remote workdays is determined by each manager, taking into account individual autonomy and specific circumstances—such as increased flexibility during the summer months. Regardless of the arrangement, we expect all employees to maintain open lines of communication with their teams and be available during core working hours.

In certain specific situations, we will also consider remote candidates based in one of the countries listed in this job posting (currently France, UK, Germany, Netherlands, Spain and Italy). In that case, we ask all new hires to visit our Paris office:
•  for the first month of their onboarding (accommodation and travelling covered)
•  then at least 3 days per month

What we offer

💰 Competitive salary and equity
🧑‍⚕️ Health insurance
🚴 Transportation allowance
🥎 Sport allowance
🥕 Meal vouchers
💰 Private pension plan
🍼 Generous parental leave policy
🌎 Visa sponsorship
