Site Reliability Engineer

1 Month ago • 1+ Years • DevOps

Job Summary

Job Description

HappyRobot is seeking a Site Reliability Engineer to enhance operational resilience and stability for its AI worker platform, which automates communication in the logistics industry. The role involves owning stability, observability, and debugging workflows to ensure smooth system operation. The engineer will be responsible for resolving complex failures in real time, developing tools to improve clarity, and shifting operations from reactive to proactive. This is a high-impact position focused on reducing incident load, building internal tooling, and improving system uptime and developer focus. The ideal candidate thrives on solving difficult problems and strengthening systems and teams.
Must have:
  • 1+ years debugging production systems (logs, traces, incidents)
  • Strong problem-solving skills
  • Comfort with Python and Go for code reading and utility writing
  • Familiarity with observability and monitoring tools
  • Clear, calm communication under pressure
Good to have:
  • Experience with distributed systems at scale
  • Built or maintained internal tooling for reliability
  • Familiarity with CI/CD or infra-as-code
  • Experience improving system observability
Perks:
  • Opportunity to work at a high-growth AI startup
  • Backed by top investors (a16z and YC)
  • Fast Growth - on track for double-digit ARR
  • Top-Tier Compensation - competitive salary + equity
  • Ownership & Autonomy
  • Work With the Best - world-class team of engineers

Job Details

About HappyRobot

HappyRobot is a platform to build and deploy AI workers that automate communication.

Our AI workers connect to any system or data source to handle phone calls, email, messages…

We target the logistics industry, which relies heavily on communication to book, check on, and pay for freight. We work primarily with freight brokers, 3PLs, freight forwarders, shippers, warehouses, and other supply chain enterprises and tech startups.

We raised a Series A round from a16z and YC and we’re growing very fast.

We're looking for rockstars with a relentless drive, unstoppable energy, and a true passion for building something great—ready to embrace the challenge, push limits, and thrive in a fast-paced, high-intensity environment.

About the Role

We're looking for a Site Reliability Engineer to take the lead on scaling our operational resilience as we grow. You’ll own the stability, observability, and debugging workflows that keep our systems running smoothly. You'll be the go-to person for untangling complex failures in real time, designing tools that turn chaos into clarity, and helping us shift from reactive to proactive operations.

This is a high-impact, high-trust role where you’ll shape how reliability is done: reducing incident load, building internal tooling, and directly improving developer focus and system uptime. If you love getting to the root of hard problems and making systems (and teams) stronger, this is your moment.

Must-Have

  • 1+ years of hands-on experience debugging production systems (logs, traces, incidents, etc.)

  • Strong problem-solving skills and ability to dive into unfamiliar backend codebases

  • Comfort with Python and Go for reading code and writing small tools/utilities

  • Familiarity with observability and monitoring tools (e.g., Datadog, Prometheus, Sentry)

  • Clear, calm communication under pressure — especially during live incidents

Nice-to-Have

  • Experience working with distributed systems or services at scale

  • Built or maintained internal tooling for on-call teams or reliability workflows

  • Familiarity with deployment pipelines, CI/CD, or infra-as-code

  • Experience improving system observability (e.g., custom metrics, traces, log pipelines)

Why join us?

  • Opportunity to work at a high-growth AI startup, backed by top investors.

  • Fast Growth - Backed by a16z and YC, on track for double-digit ARR.

  • Top-Tier Compensation - Competitive salary + equity in a high-growth startup.

  • Ownership & Autonomy - Take full ownership of projects and ship fast.

  • Work With the Best - Join a world-class team of engineers and builders.

Our Operating Principles


Extreme Ownership

We take full responsibility for our work, outcomes, and team success. No excuses, no blame-shifting — if something needs fixing, we own it and make it better. This means stepping up, even when it’s not “your job.” If a ball is dropped, we pick it up. If a customer is unhappy, we fix it. If a process is broken, we redesign it. We don’t wait for someone else to solve it — we lead with accountability and expect the same from those around us.

Craftsmanship

Putting care and intention into every task, striving for excellence, and taking deep ownership of the quality and outcome of your work. Craftsmanship means never settling for “just fine.” We sweat the details because details compound. Whether it’s a product feature, an internal doc, or a sales call — we treat it as a reflection of our standards. We aim to deliver jaw-dropping customer experiences by being curious, meticulous, and proud of what we build — even when nobody’s watching.

We are “majos”
Be friendly & have fun with your coworkers. Always be genuine & honest, but kind. “Majo” is our way of saying: be a good human. Be approachable, helpful, and warm. We’re building something ambitious, and it’s easier (and more fun) when we enjoy the ride together. We give feedback with kindness, challenge each other with respect, and celebrate wins together without ego.

Urgency with Focus
Create the highest impact in the shortest amount of time. Move fast, but in the right direction. We operate with speed because time is our most limited resource. But speed without focus is chaos. We prioritize ruthlessly, act decisively, and stay aligned. We aim for high leverage: the biggest results from the simplest, smartest actions. We’re running a high-speed marathon — not a sprint with no strategy.

Talent Density and Meritocracy
Hire only people who can raise the average; ‘exceptional performance is the passing grade.’ Ability trumps seniority. We believe the best teams are built on talent density — every hire should raise the bar. We reward contribution, not titles or tenure. We give ownership to those who earn it, and we all hold each other to a high standard. A-players want to work with other A-players — that’s how we win.

First-Principles Thinking
Strip a problem to physics-level facts, ignore industry dogma, rebuild the solution from scratch. We don’t copy-paste solutions. We go back to basics, ask why things are the way they are, and rebuild from the ground up if needed. This mindset pushes us to innovate, challenge stale assumptions, and move faster than incumbents. It’s how we build what others think is impossible.

The personal data provided in your application and during the selection process will be processed by Happyrobot, Inc., acting as Data Controller.

By sending us your CV, you consent to the processing of your personal data for the purpose of evaluating and selecting you as a candidate for the position. Your personal data will be treated confidentially and will only be used for the recruitment process of the selected job offer.

Your personal data will be retained only for as long as needed and will be deleted after three months of inactivity, in compliance with the GDPR and applicable data protection legislation.

If you wish to exercise your rights of access, rectification, deletion, portability, or objection in relation to your personal data, you can do so by contacting security@happyrobot.ai, in accordance with the GDPR.

For more information, visit https://www.happyrobot.ai/privacy-policy

By submitting your request, you confirm that you have read and understood this clause and that you agree to the processing of your personal data as described.
