Senior Site Reliability Engineer
Motorola Solutions
Job Summary
As a Senior Site Reliability Engineer at Motorola Solutions, you will join the Emergency Call Management team, dedicated to improving reliability across public safety products. This role involves architecture and implementation of monitoring and observability, reinforcing HA and reliability strategies, triaging customer incidents, maintaining SLOs, facilitating Chaos Engineering, and developing SRE culture. You will also provide on-call support and assist customer support teams with communication. Motorola Solutions is a global community focused on critical communications, video security, and command center technologies to keep people safer everywhere.
Must Have
- Architecture and implementation of Monitoring/Observability objectives
- Creation and reinforcement of HA and reliability strategy
- Triage of customer-reported incidents and problems
- Maintenance and reporting of SLOs and error budget
- Facilitation of Chaos Engineering activities
- Developing SRE culture and sharing best practices
- On-call support (Incident Command)
- Assist customer support teams in creating customer-facing communication documents
- Facilitation of Failure Mode and Effects Analysis
- 4+ years of professional software development experience
- Experience with Incident Response in Enterprise Environments
- Experience developing cloud-based applications (Azure/AWS)
- Experience developing REST-based APIs
- Implementing microservice principles and architectures
- Experience with modern DevOps tooling (including CI/CD pipelines)
Good to Have
- Familiarity with the concepts involved in designing a high-availability architecture
- Familiarity with automated testing
- Creativity and persistence when solving complex problems
- Enthusiasm for learning key technologies, architectures, processes, and best practices
Job Description
Company Overview
At Motorola Solutions, we believe that everything starts with our people. We’re a global close-knit community, united by the relentless pursuit to help keep people safer everywhere. Our critical communications, video security and command center technologies support public safety agencies and enterprises alike, enabling the coordination that’s critical for safer communities, safer schools, safer hospitals and safer businesses. Connect with a career that matters, and help us build a safer future.
Department Overview
The Emergency Call Management organization consists of the Emergency Call Handling and Emergency Call Routing teams. Production systems running both teams' products are public cloud-based solutions that require 99.999% or greater availability.
The Emergency Call Handling team is responsible for SaaS solutions that give telecommunicators and supervisors the intelligence, flexibility, and mobility to help save lives. Trusted by thousands and answering over 65% of all 9-1-1 calls in the US, our call handling software offers public safety answering points (PSAPs) proven technology that increases productivity and continually strengthens how our customers coordinate response and exchange life-saving information.
The Emergency Call Routing team is responsible for SaaS solutions that provide geospatial and traditional call-routing capabilities to communities, regions, and states. These systems are built for extremely high availability and route any caller dialing 9-1-1 (or 1-1-2, etc.) to the appropriate PSAP as quickly as possible. This position supports the entire Emergency Call Management organization.
Job Description
As a software engineer on the Emergency Call Management site reliability engineering (ECM-SRE) team, you will work alongside talented software engineers who partner directly with product and engineering teams to continually improve reliability across our suite of public safety products. Your responsibilities will include:
- Architecture and implementation of Monitoring/Observability objectives. This includes maintenance of alert response playbooks.
- Creation and reinforcement of the HA and reliability strategy.
- Triage of customer-reported incidents and problems and routing them to the proper software team, requiring troubleshooting and problem management skills.
- Maintenance and reporting of SLOs and error budget.
- Facilitation of Chaos Engineering activities with multiple engineering teams.
- Developing the SRE culture and sharing best practices across Motorola Solutions’ Emergency Call Management organization.
- On-call support alongside multiple engineering teams for products and services in production. The role centers on Incident Command, keeping the incident process focused and on course, which is essential to meeting regulatory reporting requirements.
- Assist Motorola Solutions’ customer support teams in creating customer-facing communication documents, requiring strong communication skills.
- Facilitation of Failure Mode and Effects Analysis with multiple engineering teams.
The right individual will have a passion for observability, reliability, automation, incident response, and enabling innovation.
Basic Requirements
- BS in Computer Engineering (or equivalent degree)
- 4+ years of professional software development experience
- Excellent communication skills
- Experience with Incident Response in Enterprise Environments
- Experience developing cloud-based applications (Azure/AWS)
- Experience developing REST-based APIs and implementing microservice principles and architectures
- Experience with modern DevOps tooling (including CI/CD pipelines)
- Familiarity with the concepts involved in designing a high-availability architecture
- Familiarity with observability and monitoring
- Familiarity with automated testing
- Creativity and persistence when solving complex problems
- Enthusiasm for learning key technologies, architectures, processes, and best practices
Travel Requirements
None
Relocation Provided
None
Position Type
Experienced
Referral Payment Plan
No