Site Reliability Engineer (SRE)

3+ Years • DevOps

Job Summary

Job Description

Drivemode is seeking an experienced Site Reliability Engineer (SRE) to manage the reliability, performance, and daily operations of their Kotlin/Swift mobile applications and Kotlin backend services on AWS. You will collaborate with product and platform engineers to establish SLIs/SLOs, automate operations, lead incident response, and promote a "code-driven reliability" culture. This role involves a production-support model where SREs and feature teams share Level 2/3 support, with SREs providing the necessary tools, coaching, and leadership for developers to excel in on-call responsibilities. You will have green-field influence in defining SRE culture, tooling, and error-budget policies, with a clear career path to Staff SRE or Reliability Lead as the company scales.
Must have:
  • 3+ years in SRE, DevOps, or backend engineering
  • Proficient in at least one of Kotlin/Java, Rust, Go, or Python
  • Linux & networking fundamentals
  • Hands-on AWS experience
  • Production experience with Datadog or similar
  • Incident response expertise
  • Knowledge of relational database and Redis operations
  • Excellent communication skills
Good to have:
  • AWS or CKA certifications
  • Experience with feature-flag systems
  • Experience with chaos-engineering tools
  • Automotive or fintech industry experience
Perks:
  • Competitive salary
  • Flexible remote policy
  • Allowance for certifications, conferences, and home lab gear

Job Details

Our Mission:
Driving technology always feels outdated, and not by a little. We believe vehicles can be a thousand times smarter, safer, and more connected to the world around us, and our mission is to make that happen. In 2019, we joined forces with Honda as its first startup acquisition, and we are now expanding our vision to building the future of battery electric vehicles (BEVs) for millions of people around the world.

Why Drivemode: 
Join Drivemode for an exciting startup environment and a vibrant culture that combines impactful work, competitive compensation, and excellent benefits. By becoming a part of our team, you'll contribute to a crucial mission that revolutionizes the way people engage with vehicles, addressing both business needs and the world's environmental challenges. This presents an exceptional opportunity to be at the forefront of innovation and drive Honda's success in the EV market.

About the Role:
We’re seeking an experienced Site Reliability Engineer to own the reliability, performance, and day-to-day operations of our Kotlin/Swift mobile applications and Kotlin backend services on AWS. You will partner with product engineers and platform engineers to design SLIs/SLOs, automate operations, lead incident response, and drive a “code-driven reliability” culture across time zones.
You will be part of a production-support model in which Level 2 and Level 3 support are shared by SREs and feature teams, with SREs providing the tooling, coaching, and leadership that make developers excellent on call.

Why Join?
Green-field influence: define SRE culture, tooling, and error-budget policy from day one.
Career trajectory: opportunity to grow into Staff SRE / Reliability Lead as we scale to multiple regions and product lines.
Impact at scale: your work will span multiple regions and product lines worldwide.
Engineering-driven org: close collaboration with product, platform, and security teams who value operational excellence.
Competitive salary, flexible remote policy, and an allowance for certifications, conferences, and home lab gear.

What You Will Do:
  • Service Reliability: Define and track SLIs/SLOs & error budgets for backend APIs and mobile release health. Hold teams accountable to reliability goals.
  • Incident Management: Lead on-call rotations, coordinate incident response, run post-mortems, and eliminate root causes.
  • Observability & Tooling: Own Datadog dashboards, log pipelines, crash analytics (Firebase / Sentry), and feature-flag metrics (LaunchDarkly / ConfigCat).
  • Automation & Elimination of Toil: Write tools and self-healing runbooks in Kotlin, Rust, Go, or Python for rollbacks, DB failovers, chaos tests, and config drift detection.
  • Capacity & Performance: Forecast load, run stress and load tests, tune JVM & GraalVM settings for Kotlin services, and advise on RDS & Redis scaling.
  • Disaster Recovery & Chaos Engineering: Design BCP/DR playbooks; run game days to validate recovery objectives.
  • Cost & FinOps: Instrument cost metrics and collaborate with Finance to keep AWS spend within agreed “cost budgets.”
  • Security & Compliance Support: Monitor GuardDuty / CSPM alerts and be prepared to participate in security incident response.
  • Developer Partnership: Pair with mobile & backend engineers on instrumentation, release gates, and staged roll-outs; mentor teams in SLO thinking via brown-bag sessions.

What We Are Looking For:
  • 3+ years in SRE, DevOps, or backend engineering for high-traffic services
  • Proficient in at least one of Kotlin / Java, Rust, Go, or Python
  • Deep Linux & networking fundamentals and hands-on AWS (ECS, ALB/NLB, RDS, S3, IAM, CloudWatch)
  • Production experience with Datadog (or Prometheus / OpenTelemetry) for metrics, traces, and logs
  • Incident response expertise: runbooks, RCA, post-mortems, and blameless culture
  • Practical knowledge of relational DB (PostgreSQL/RDS) and Redis operations
  • Familiarity with Kubernetes (EKS) concepts, Helm/OPA, container networking, and rolling releases
  • Excellent communication skills; able to coach developers and influence process improvements.

Nice to have:
  • AWS or CKA certifications
  • Experience with feature-flag systems, chaos-engineering tools
  • Prior work in regulated or enterprise-integrated environments (e.g., automotive, fintech)

EEOC Statement: Drivemode is proud of its diverse team, with employees from 5 continents and 20 countries as of today. Diversity in our workplace has played an important part in our success. We recognize each employee's unique background, knowledge, experiences, ideas, and viewpoints, all of which are critical to developing a product with the greatest impact on drivers all over the world. Drivemode provides equal opportunities to all employees and applicants for employment without regard to race, religion, color, age, gender, national origin, sexual orientation, gender identity, disability, or any other characteristics that make you unique.
