Staff Software Engineer

5 Months ago • 7-10 Years • DevOps

Job Summary

Job Description

Staff Software Engineer at Toast, Bengaluru, India. 7+ years of experience in building and running production systems. Expertise in Java, Python, cloud platforms (AWS, GCP, Azure), microservices, observability, and incident management. Experience with distributed tracing, log aggregation, and performance testing.
Must have:
  • Java & Python
  • Cloud Platforms
  • Microservices
  • Observability
Good to have:
  • Database Tech
  • Android/IoT
  • Terraform
  • Containers
Perks:
  • Inclusive Culture
  • Growth Opportunities

Job Details

About the job

Now, more than ever, the Toast team is committed to our customers. We’re taking steps to help restaurants navigate these unprecedented times with technology, resources, and community. Our focus is on building the restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love. And because our technology is purpose-built for restaurants, by restaurant people, restaurants can trust that we’ll deliver on their needs for today while investing in experiences that will power their restaurant of the future.

Are you bready* for a change?

At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. Our decisions are based on instrumentation and continuous observability, as well as on predictions and capacity planning.

About this roll* (Responsibilities)

  • Define, implement and evolve a world-class observability technology stack that allows rapid detection of issues in our system and enables root cause analysis (25%)
    • Provide scalable metrics and dashboarding solutions for R&D
    • Provide distributed tracing capabilities to visualize and track issues across our complex system
    • Provide log aggregation and insights for R&D using best-in-class technology
    • Provide a global view of the true customer experience using Real-User Monitoring and external cloud-based solutions
  • Act as a champion for reliability and work with partner teams in different lines of business to influence product roadmaps to improve resiliency and reliability of all services. Champion our uptime targets and enable other teams to improve the way we measure the reliability of the system (25%)
  • Provide technical leadership in production triage, incident resolution, and retrospective/root cause analysis to maintain the reliability and uptime of our platform (20%)
    • Leverage a strong understanding of Cloud Architecture
    • Apply knowledge of Java and the JVM (Java Virtual Machine) to triage and understand issues within services
    • Implement strategies to increase system reliability and performance through on-call rotation and process optimization
    • Lead incident post-mortem/retrospectives to surface reliability improvements and drive to completion
  • Mentor and coach peers and reliability champions on SRE best practices. Contribute to running an SRE Guild (15%)
  • Design, build and drive adoption of a platform that enables service resilience testing/chaos engineering to validate that Toast’s architecture is resilient to failure. Build and own a performance testing framework/environment to enable our R&D teams to understand the constraints of their services and improve performance (15%)

Do you have the right ingredients*? (Requirements)

  • Extensive and broad industry experience, with at least 7 years in building and running production systems and participating in incident calls
  • Comfortable reading, writing, and debugging object-oriented languages such as Java and Python
  • Well-versed in software architecture, with a deep understanding of cloud and microservices
  • Demonstrated experience working with at least one major cloud platform (AWS, GCP, or Azure)
  • Exposure to complex, mission-critical, and large-scale distributed systems
  • Ability to set an example for the team through positive and inclusive leadership and discussion of work
  • General knowledge of most of the technical areas below, with deep knowledge in two:
    • Observability platforms (Datadog, New Relic, Splunk, AppDynamics, etc.): APM, RUM, synthetic monitoring
    • Prometheus, Thanos, and Grafana: service catalog metrics and recording rules for alerts
    • Log shipping pipelines and incident debugging visualizations
    • Block and object storage configuration and debugging
    • Database experience (DynamoDB, Aurora, RDS)
    • Advanced Terraform syntax and GitLab CI/CD configuration, pipelines, jobs
    • Containers: cluster provisioning and new services

Bonus ingredients*

  • Deep experience with Database technologies
  • Experience working with Android and/or IoT devices
  • Bread puns encouraged but not required

We are Toasters

Diversity, Equity, and Inclusion is Baked into our Recipe for Success.

At Toast, our employees are our secret ingredient. When they are powered to succeed, Toast succeeds.

The restaurant industry is one of the most diverse industries. We embrace and are excited by this diversity, believing that only through authenticity, inclusivity, high standards of respect and trust, and leading with humility will we be able to achieve our goals.

Baking inclusive principles into our company and diversity into our design provides equitable opportunities for all and enhances our ability to be first in class in all aspects of our industry.

Bready* to make a change? Apply today!

Toast is committed to creating an accessible and inclusive hiring process. As part of this commitment, we strive to provide reasonable accommodations for persons with disabilities to enable them to access the hiring process. If you need an accommodation to access the job application or interview process, please contact candidateaccommodations@toasttab.com.
