Site Reliability Engineer (S3NS, an alliance between Thales and Google Cloud)
Thales
Job Summary
S3NS, a partnership between Thales and Google Cloud, is seeking a Site Reliability Engineer to manage and operate its GCP universe. The role involves 24/7 production incident management, developing a deep understanding of GCP services, and participating in an intensive training program on Google's technical stack. The SRE will ensure the reliability of sovereign GCP services, addressing complex challenges of a cloud platform at scale, and leveraging expertise in incident resolution and system design.
Must Have
- Operate and manage a GCP universe.
- Manage 24/7 production incidents.
- Monitor GCP service availability, scalability, latency, and efficiency.
- Fix problems and ensure system reliability; perform on-call duties.
- Collaborate with GCP experts to mitigate and resolve incidents.
- Document knowledge, standardize resolution flows, and improve operational playbooks.
- Conduct post-incident reviews for continuous improvement.
- Minimum 3 years of experience in SRE and operations automation.
- Significant experience in highly regulated markets.
- Excellent level of English.
Good to Have
- Previous experience with GCP.
- Passion for technological innovation, Cloud, and "as code" operations.
- Experience operating critical large-scale systems with high availability.
- Curiosity to explore Google's technical DNA and master its technologies.
- Interest in joining specialized teams (Compute, Storage, Data, Observability/Tooling).
Perks & Benefits
- 24 holiday days a year
- Benefit Online
- Flexible working hours
- Comprehensive compensation and benefit package including medical coverage & life insurance
- Hybrid Workplace
- GoFluent & Udemy Subscription
- Engineering, Technology & Management Academies
Job Description
Thales is a global technology leader trusted by governments, institutions, and enterprises to tackle their most demanding challenges. From quantum applications and artificial intelligence to cybersecurity and 6G innovation, our solutions empower critical decisions rooted in human intelligence. Operating at the forefront of defence and security, aerospace and space, cybersecurity and digital identity, we’re driven by a mission to build a future we can all trust.
In Romania, we are advancing innovation through software engineering, research and development, delivering solutions in key markets in which Thales Group operates. Our engineers design, develop and integrate solutions that impact global industries – from fully operational systems and subsystems for naval warfare and maritime security operations, to air traffic management systems, satellite-based solutions, tactical indoor simulations, identity and biometric technologies and more.
About the Role:
S3NS was born from an industrial partnership between Thales, a global leader in cybersecurity, and Google Cloud, a global leader in cloud solutions. Our ambition is to offer the best of both worlds to all organizations concerned with protecting their sensitive data: public institutions, operators of vital importance (OIVs), operators of essential services (OSEs), etc.
Your Day-to-Day:
As an SRE, your mission will be at the heart of operating our GCP universe:
- You will be responsible for running and managing an entire GCP universe.
- You will manage 24/7 production incidents and develop a deep understanding of the technical services that form the backbone of the GCP services used by millions of customers.
- You will benefit from an accelerated, intensive training program on GCP technologies that covers the Google technical stack (e.g., Borg, Colossus, Spanner, Andromeda). You will also be able to train with experts from Google.
- You will be part of an SRE team responsible for operating sovereign GCP services, some of which are directly customer-facing and others of which form technical infrastructure with availability requirements of 99.99% or more.
- Within this SRE organization, you will have the opportunity to take on the complex challenges associated with the unique scale of a cloud platform, while leveraging your expertise in incident resolution, complexity analysis, and understanding large-scale system design.
Key Responsibilities:
- SLI/SLO Monitoring: Monitor the availability, scalability, latency, and efficiency of sovereign GCP services, and handle production incidents.
- Incident Resolution: Fix problems and ensure system reliability; perform periodic on-call duties according to a follow-the-sun model.
- Team Collaboration: Collaborate with GCP service experts worldwide to help mitigate and resolve incidents.
- Automation & Knowledge: Document knowledge to ensure all S3NS SREs work with the same information, standardize resolution flows, and improve operational playbooks.
- Post-Incident Reviews: After an incident, gather teams to perform a post-mortem, understand the causes, draw lessons, and encourage continuous improvement.
In accordance with the requirements of the SecNumCloud qualification issued by ANSSI for our services, this position is subject to reinforced security requirements. The successful candidate will undergo a security investigation conducted by our services, in accordance with our personnel security policy.
Your Profile
What motivates you:
- Technological innovation, the Cloud, and the operation of services and infrastructure in "as code" mode.
- The operation and management of critical large-scale systems with high availability.
- The curiosity to explore the technical DNA of Google: discovering how a cloud works at a hyperscaler and mastering the technologies developed over more than 20 years.
- The opportunity to join specialized teams in key areas (Compute, Storage, Data, Observability/Tooling).
Your profile and experience:
- Education: Graduate of an engineering school or holder of a Master's degree.
- SRE & Automation Experience: Minimum three (3) years of proven experience in Site Reliability Engineering and operations automation (Security, Compliance, Problem Resolution).
- Regulated Context: Significant experience in highly regulated markets (Banking, Insurance, Medical, etc.).
- International Exposure: Exposure to an international environment with an excellent level of English required.
- Bonus: Previous experience with GCP is an asset, but your thirst to learn our own Cloud is paramount!
A word from the team
"The S3NS SRE team is the engine of our trusted cloud's reliability. If you are passionate about running robust and high-performing systems, come contribute to our mission: your expertise will make the difference."
Your career at Thales
Future opportunities will allow you to discover other domains or sites. You will be able to grow and develop your skills in different areas:
- Room and attention to personal development.
- Build your talents in another domain of Thales Group, discovering new products, new customers, or a new country, or move to a more complex solution.
- Choose between a technical expertise or a leadership path.
- Build an international career within a leading Engineering Group.
- Work for different Thales domains & entities.
Your immediate benefits
- 24 holiday days a year
- Benefit Online
- A good work-life balance which includes flexible working hours
- A comprehensive compensation and benefit package including medical coverage & life insurance
- Hybrid Workplace
- GoFluent & Udemy Subscription
- Engineering, Technology & Management Academies
At Thales, we’re committed to fostering a workplace where respect, trust, collaboration, and passion drive everything we do. Here, you’ll feel empowered to bring your best self, thrive in a supportive culture, and love the work you do. Join us, and be part of a team reimagining technology to create solutions that truly make a difference – for a safer, greener, and more inclusive world.