Senior Site Reliability Engineer

1 Month ago • 5+ Years • DevOps

Job Summary

Job Description

Join Lytx's dynamic team as a Senior Site Reliability Engineer, focusing on designing and supporting cutting-edge IoT infrastructure as the company transitions to the cloud. This role involves 'Operations as Code', 'Infrastructure as Code', and automation. The SRE team ensures the availability, reliability, observability, and resilience of Lytx's services both on-premises and in the cloud. The ideal candidate will excel at crafting transformative solutions and designing robust cloud infrastructure with a focus on continuous improvement.
Must have:
  • 5+ years SRE experience in AWS
  • 5+ years experience with observability tools (Prometheus, Grafana)
  • Advanced programming skills in Python, Groovy, Bash
  • Strong understanding of SQL and NoSQL databases
  • 3+ years experience with infrastructure deployment pipelines (Git, Terraform)
  • Expertise in AWS production environments (VPC, EKS, IAM, EC2)
  • Hands-on Linux systems experience
  • Extensive Kubernetes experience
  • Experience managing 24/7 on-call rotations and runbooks
  • Ability to thrive under pressure

Job Details

Why Lytx:
Join our dynamic and passionate team of driven, low-ego engineers who are at the forefront of
designing and supporting cutting-edge IoT infrastructure. As we rapidly grow and transition to
the cloud, we're diving into the exciting realms of "Operations as Code," "Infrastructure as
Code," and innovative infrastructure automation.
Our Site Reliability Engineering (SRE) team is pivotal in ensuring the availability, reliability,
observability, and resilience of Lytx's services, both on-premises and in the cloud. We're not just
keeping the lights on—we're engineering the future of our business's continuity.
If you're energized by crafting transformative solutions and excel at designing robust, detailed
cloud infrastructure with a focus on continuous improvement, this could be the perfect role for
you!
Responsibilities:
  • System Design and Architecture: Design, implement, and maintain scalable and reliable systems, ensuring they can handle both current and future demands.
  • Incident Management: Lead incident response efforts, diagnose root causes, and implement long-term solutions to prevent recurrence. Ensure effective communication during outages.
  • Monitoring and Observability: Develop and maintain comprehensive monitoring and alerting systems to proactively identify and address issues before they impact users.
  • Automation and Efficiency: Automate repetitive tasks and processes to improve operational efficiency and reduce manual intervention.
  • Performance Tuning: Continuously optimize system performance, including fine-tuning applications, databases, and infrastructure to meet service level objectives (SLOs).
  • Capacity Planning: Forecast future system requirements based on growth trends and current usage, and plan capacity upgrades to ensure system reliability.
  • Collaboration and Mentoring: Work closely with development teams to integrate reliability into the software development lifecycle. Mentor junior SREs and share best practices.
  • Documentation and Knowledge Sharing: Create and maintain detailed documentation on system design, incident response procedures, and operational practices to ensure knowledge is preserved and accessible.
Requirements:
  • 5+ years of experience as an SRE within AWS environments at medium to large-scale organizations.
  • 5+ years of hands-on experience implementing and managing observability tools, such as Prometheus, New Relic, Grafana, or similar.
  • Advanced programming skills in Python, Groovy, and Bash.
  • Strong understanding of database technologies, including both SQL and NoSQL systems.
  • 3+ years of experience developing and managing infrastructure deployment pipelines using Git, Terraform, Helm, Jenkins/Jenkins X/ArgoCD, or similar tools.
  • Proven expertise in designing, evaluating, and supporting production environments in AWS, including VPCs, EKS, IAM, AMI, EC2, CloudWatch, CloudTrail, Control Tower, GuardDuty, MSK, S3, Glacier, Gateways, Direct Connect, Route 53, RDS, ALBs, Autoscaling, and more.
  • Hands-on experience with Linux systems and protocols and technologies such as HTTP, REST, TCP/IP, SSL, DNS, SMTP, SSH, NTP, Load Balancing, SQL/NoSQL, Message Brokers, Nginx, Vault, etc.
  • Extensive experience with Kubernetes and various container and cloud-native technologies.
  • Significant experience in managing 24/7 on-call rotations, creating runbooks, establishing support procedures, and proactively monitoring systems across multiple geographic locations.
  • Ability to thrive under pressure and excel in a technically challenging environment.

Innovation Lives Here


You go all in no matter what you do, and so do we. At Lytx, we’re powered by cutting-edge technology and Happy People. You want your work to make a positive impact in the world, and that’s what we do. Join our diverse team of hungry, humble and capable people united to make a difference.

Together, we help save lives on our roadways.

Find out how good it feels to be a part of an inclusive, collaborative team. We’re committed to delivering an environment where everyone feels valued, included and supported to do their best work and share their voices.

Lytx, Inc. is proud to be an equal opportunity/affirmative action employer and maintains a drug-free workplace. We’re committed to attracting, retaining and maximizing the performance of a diverse and inclusive workforce. EOE/M/F/Disabled/Vet.


About The Company

Lytx is the global leader in fleet management technologies. Our solutions harness the power of video to empower drivers and fleets to be safer and more efficient, productive, and profitable so they can thrive in today’s competitive environment. Through the Lytx platform, direct and reseller clients access our customizable services and programs spanning driver safety, risk detection, fleet tracking, compliance, preventative maintenance, and fuel management. Using the world’s largest driving database of its kind, along with proprietary machine vision and artificial intelligence technology, we help protect and connect thousands of fleets and 1.6 million drivers in more than 60 countries worldwide. Lytx is privately held and headquartered in San Diego, California. For more information, visit us at Lytx.com.


The Surfsight™ solution is Lytx's indirect market offering, available in North America and internationally. Strategic partners and resellers can use Surfsight's open API platform to easily add video to their telematics stack or utilize our stand-alone Surfsight Cloud dashboard to allow fleet managers to track vehicles, view risky and distracted driving events, retrieve videos from the field, and view and analyze data. The innovative technology in the Surfsight dash cam, powered by Lytx, uses robust machine vision and artificial intelligence to proactively detect and mitigate risk. It provides detailed analytics and real-time visibility into overall fleet performance, giving companies valuable data to help increase safety and savings through better fleet management. The solution offers an accessible entry point into video telematics without compromising on features, functionality, and configuration options. For more information visit https://www.lytx.com/en-us/surfsight.


